The DNA of AI: A Story of Optimization

Aubrey Goodman
Published in Analytics Vidhya
Aug 7, 2021

Abstract

We used Kerbal Space Program to validate an optimization technique. Specifically, this paper focuses on computation of a stochastic gradient vector approximation best suited to situations with high noise and/or computational cost in the scoring function. This technique, called dipolar crystal gradient synthesis, uses a backpropagation approach borrowed from multilayer perceptrons to inform a multivariate finite difference computation. Dipolar variants are generated along a subset of weighted axes, where the axis weights are adjusted to favor dimensions whose changes contribute most significantly to the gradient. We applied this technique to optimize various parameters in KSP vessels, notably the AI settings and airframe configuration. Variants were evaluated for performance in a combat environment. Combat performance informed the backpropagation process.

Overview

This is going to sound a little weird…

At least, that’s the goal. It’s hard to imagine how anyone could understand the complexities of n-dimensional data structures; harder still to believe anyone could ever hope to navigate such a diverse landscape. Most of us need adorable YouTube videos to explain things like astrophysics using hilarious cartoons. Even with all the clever packaging and skilled explanation, it can be hard to see how n-dimensional things could be connected to anything in our daily lives. As with so much else, it helps to start with something familiar and work toward a more nuanced understanding, which is itself a fitting metaphor for the journey ahead.

Prelude Scenarios

Before we embark, let’s take a moment to review a few scenarios in lower-dimensional environments. This practice can help set the stage for a better appreciation of the complexities later. And we’ll need all the help we can get.

1D

Imagine you have been selected for a radio spot, where you share your favorite song, but there’s a catch. The radio station says your song must be exactly four minutes long, or they reject your submission. You have alphabetized your record collection, which means the song lengths are in effectively random order. There’s another catch: you only have ten minutes to find a track. There’s no hope of checking every record for a matching song before your entry is due. You frantically scan through your albums, reading the label on each until you find one with a value close to the target. You set it aside as the “best so far” candidate and continue, maybe randomly, maybe in order. When you feel like you’re running out of time, you submit your candidate.

2D

Imagine yourself standing on a street corner on a foggy day in a time period before electricity or the internet. You have been hired by the city planner to survey the nearby landscape for a building location with a view. Your results are expected by the end of day. On a clear day, this would be as simple as scanning the horizon for hills or buildings crossing the ground plane, then moving toward an intersection point. However, with limited visibility you are constrained to look only one block in any given direction.

At each intersection, you scan the adjacent intersections for roads leading uphill from your current location. From the available options, you select the steepest path and walk along it to the next intersection. You repeat this process until all paths lead downhill. If you’re lucky, the resulting location has a favorable view to satisfy your employer. If you’re unlucky, you discover a small hill and stop there, declaring victory. This is known as a local maximum.
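
If you prefer to see the idea in code, here is a minimal sketch of the foggy-day search as a hill climb on a grid. The altitude function and starting corner are invented purely for illustration.

    # A hypothetical altitude function: a single smooth hill peaked at (3, 5).
    def altitude(x, y):
        return -(x - 3) ** 2 - (y - 5) ** 2

    def hill_climb(x, y):
        while True:
            # Limited visibility: look only one block in each direction.
            neighbors = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            best = max(neighbors, key=lambda p: altitude(*p))
            if altitude(*best) <= altitude(x, y):
                return x, y  # every adjacent block leads downhill; stop here
            x, y = best      # walk the steepest uphill block and repeat

    print(hill_climb(0, 0))  # ends at (3, 5) on this toy landscape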

3D

You are building a machine to precisely measure the surface geometry of an arbitrary part held in a clamp. Your machine consists of an ultra-sensitive needle with a tiny rounded tip. Let’s say the apparatus can measure very precise deflections of the needle in all three dimensions of Euclidean space. You can move the table in the plane perpendicular to the needle, and you can move the needle toward or away from the part along the needle’s axis. For simplicity, let’s say the part you want to scan is known to be convex, since concave parts are more complicated.

In order to calibrate the machine for optimization, we need to establish the rectangular bounds of the part. Fortunately for you, the machine also has a camera aiming downward toward the part, allowing the machine to determine the planform bounding rectangle, as well as a curved path representing the outer perimeter of the part. Using this, the machine can perform a process similar to the one you followed in the 2D example, stepping in small increments in various directions, attempting to understand the curvature of the part near the needle tip, and moving upward toward the highest point. This is called a hill climb algorithm.

The primary difference in this example is the explicit input of needle height. In the 2D example, the altitude of the location served as the value function, and we adjusted the latitude and longitude as inputs. Here, we use the deflection of the needle as a signal of the local surface gradient, which informs our value function.

4D

You are designing a robot for a battle competition. Your design allows for various parameters of your robot’s control system to be adjusted, but you don’t know what the “best” settings are. Let’s say you can adjust the turning rate, max wheel torque, trigger distance, and weapon power. It seems reasonable to expect different values of these parameters to affect the robot’s battle performance. Maybe your assumptions about its capabilities are wrong also.

Fortunately, you built ten prototypes, so you have enough units to do some combat trials to test various settings before the competition. However, with four dimensions to consider, you would need 81 units (three settings per parameter, 3⁴ combinations) to study all the combinations of some center value plus or minus an offset. Maybe there’s a way to configure a small number of units and glean an optimal set of parameters from that smaller sample. This is where the journey really begins.
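
To see where 81 comes from: each of the four parameters gets three settings (center, minus offset, plus offset), so a full factorial sweep needs three to the fourth power combinations. A quick sketch with made-up center values:

    from itertools import product

    # Hypothetical center values for the four robot parameters, with a ±10% offset each.
    center = {"turn_rate": 50.0, "wheel_torque": 20.0, "trigger_distance": 1.5, "weapon_power": 0.8}
    levels = {name: (value * 0.9, value, value * 1.1) for name, value in center.items()}

    combinations = list(product(*levels.values()))
    print(len(combinations))  # 3 ** 4 == 81 robots for an exhaustive sweep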

Math with Pictures

I, for one, love storytelling. There’s something compelling about connecting to a story emotionally and intellectually, maybe even spiritually. It helps us make sense of otherwise complex things. In this section, we’ll build on the scenarios from the first section to tell stories about the process of optimizing in the situations presented therein. And, of course, we’ll have some pictures to guide us along the way. First, we need to review some terminology, since this gets into the weeds pretty quickly.

Terminology

  • Dipole — a pair of vectors, equal in magnitude and opposite in direction to each other.
  • Crystal — a convex spatial geometric shape composed of vertices connected by edges, represented by a set of vectors starting at the same point, each ending at a vertex.
  • Weighted Centroid — the weighted center of a geometric shape, representing the weighted mean of all the vertices.
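
These terms map directly onto small data structures. The sketch below, with invented helper names, builds a dipole as a pair of opposing offset vectors and computes a weighted centroid as the score-weighted mean of a crystal’s vertices.

    import numpy as np

    def dipole(axis_index, magnitude, dims):
        """A pair of vectors, equal in magnitude and opposite in direction."""
        v = np.zeros(dims)
        v[axis_index] = magnitude
        return v, -v

    def weighted_centroid(vertices, weights):
        """Weighted mean of a crystal's vertices."""
        vertices = np.asarray(vertices, dtype=float)
        weights = np.asarray(weights, dtype=float)
        return (vertices * weights[:, None]).sum(axis=0) / weights.sum()

    # A tiny 2D crystal: one dipole per axis, centered on the origin.
    plus_x, minus_x = dipole(0, 1.0, dims=2)
    plus_y, minus_y = dipole(1, 1.0, dims=2)
    print(weighted_centroid([plus_x, minus_x, plus_y, minus_y], [0.9, 0.1, 0.6, 0.4]))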

Find the Ancient Buried Building (2D)

For the purposes of visualization and simplicity, let’s consider a 2D example. This way, we can draw pretty pictures and make the math more approachable. The concepts are the same regardless of how many dimensions we use. Two will do nicely.

Let’s take an example from real history. Around two thousand years ago, the city of Pompeii was buried in volcanic rock and ash falling from the sky during the eruption of Mt Vesuvius. We now know many of the buildings survived, slowly covered by dirt as nature’s wrath subsided. No humans dared approach the uninhabitable land for many years. Many hundreds of years later, the site was rediscovered when a farmer dug into the top of a hill and found himself looking down into a room of an ancient building. Knowing that hilltops in the area probably conceal ancient ruins, let’s try to recreate that struggle and see how we could improve on it with math.

The simplest case for illustration is a smooth, cone-shaped hill. We’ll say it’s obscured by fog or clouds, so we can’t just use the horizon scanning technique to find higher ground. Depending on our starting point, we may see a different local slope and a different uphill direction to go. The following figure illustrates this:

Figure 1: Uphill directions from various starting points

A Path Forward

Each of the diamonds in Figure 1 represents a starting point with a crystal centered around it. At each center point, we compute a score for the function — in this case altitude — and we compare the vertex scores against the center score to determine a local gradient for this location. This is a technique called multivariate finite difference, closely related to Newton’s finite difference methods for numerical approximation. Since those methods date back centuries, this part is not exactly new. However, we will make great use of this technique in service of finding our optimized solution. Here, we consider the inclusion of backpropagation techniques, borrowed from multilayer perceptrons in neural networks. By measuring the contribution of each axis individually (in the form of dipole pairs) relative to some known reference point, we can further inform the axial selection process. This is a little like buying flowers. If you get mediocre flowers from the same vendor consistently, you probably won’t buy from this vendor in the future unless maybe the prices change in your favor. In order to appreciate this better, we’ll need to consider a different example, and we’ll need to make sure we have enough axes to begin to see the problem more clearly.
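
As a concrete illustration of the multivariate finite difference, here is a central-difference gradient estimate restricted to a chosen subset of axes. The quadratic score function is a stand-in; in this article the real score comes from running combat heats.

    import numpy as np

    def finite_difference_gradient(score, point, step=0.1, axes=None):
        """Central-difference gradient estimate along the chosen axes only."""
        point = np.asarray(point, dtype=float)
        axes = range(len(point)) if axes is None else axes
        gradient = np.zeros_like(point)
        for i in axes:
            offset = np.zeros_like(point)
            offset[i] = step
            # Each axis contributes a dipole pair: point + offset and point - offset.
            gradient[i] = (score(point + offset) - score(point - offset)) / (2 * step)
        return gradient

    # Toy score function standing in for combat performance.
    score = lambda p: -(p[0] - 1) ** 2 - 2 * (p[1] + 0.5) ** 2
    print(finite_difference_gradient(score, [0.0, 0.0], axes=[0, 1]))  # ~[2.0, -2.0]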

The key constraint here is time required to compute the score at each location. With the Pompeii discovery example, we would need to perform some digging experiment at each location site to discover anything meaningful. We certainly can’t dig up the entire county without making a big mess and costing a fortune. Instead, we choose points to make tactical measurements to inform our decision making process. Before we go planning excavation projects, let’s take a big step back and look at the bigger picture for a moment. Let’s use the combat robot metaphor from here on, so we can start putting the pieces together in our minds.

The simple example described earlier involved just four parameters to manage in our combat robot. This was a small subset of all the parameters we would realistically need to adjust in a meaningful combat environment. We may have motion controllers for individual subsystems with their own settings isolated from other subsystems. This typically manifests as decoupled clusters of coupled parameters in the system at large. For our optimizer, small clusters are ideal, and decoupled clusters are beneficial.

Combat Genes

At a macroscopic level, we can fairly easily understand clusters of parameters as functional groupings. One common example of this is weapon manager settings, often manifesting in the form of target priorities. These parameters may have significant impact on the combat performance, while being fully decoupled from other groups of parameters, such as pilot AI settings. This means we can try variants of target priorities without adjusting the pilot settings and have some confidence we don’t need to re-evaluate pilot settings later to compensate.

Now we know there are some parameters which might affect the score in clusters or groups, but we don’t know anything about the groups themselves. We don’t strictly need to know anything about the system in advance to test hypotheses and learn along the way. Still, it would be nice to have some kind of compass to chart a path. We’ll come back to this later. For now, let’s focus on the ecosystem itself. What defines our desired system behavior? What are the contributing factors?

For our purposes, a vessel in Kerbal Space Program (KSP) is a collection of discrete parts and modules, each responsible for defining some behavior of the vehicle. Without any parts carrying control surface modules, the pilot has no way of controlling the vehicle, and we expect combat performance to be very poor. Players design KSP craft in community competitions to satisfy all sorts of diverse goals. As such, we can expect a very rich genetic diversity in the craft players create. Similarly, these craft will all present combat performance commensurate with the creative diversity of the players. This represents an excellent opportunity to study genetic evolution through a digital lens.

All players start with the default values for all parameters assigned to parts and modules. Each player is allowed to cobble together any arbitrary set of parts, and these parts together represent a vessel configuration designed by a player, often for a specific set of competition goals. They may also optionally adjust parameters, such as steering power, evasion time, etc. These parameters, along with a unique configuration of airframe geometry and control surface settings, encode the essence of a unique individual within our ecosystem. Changing the values of any of these parameters may result in some change in the craft performance, or it may not.

Drunken Boat of Progress

At this point, we have a basic idea of how various parameters might contribute collectively to a vessel’s combat performance. We may even be able to make some educated guesses about their clustering and how they interact with each other. Some will be nuanced and sensitive, while others are less impactful. Moreover, there’s no guarantee of continuity when applying the model that results from optimizing one design to any other design. In machine learning terms, our model for deciding which subset of parameters to adjust doesn’t generalize well. As a result, we’re relegated to starting over for each vessel and hoping our model converges quickly. Before we move into more details, let’s take just a moment to reflect on the challenges of defining a model we can use in the general sense.

It may be easy to say any high performance aircraft must have certain qualities in order to succeed against its peers. Things like control surfaces for roll, pitch, and yaw are indeed vital to surviving the competitive environment of air superiority. While it’s plausible to find a solution without one of these components, such a solution would be very far from competitive. Additionally, there are limitless reasons for introducing new features. Maybe the key to victory is one tiny control surface in a strange place, like whiskers on a catfish. Maybe it’s the sum total influence of having several different specific traits at the same time.

With all the myriad configurations of control surfaces and other contributing parameters, our best hope of defining any general model for optimization is to rely on common attributes where this diversity has little effect. In this sense, we must limit our general model to the lowest common denominator across all parameters we wish to adjust. The other parameters may still play a significant role in the optimization process, but it’s important to isolate our weight assignments into two categories — those we can potentially generalize and those we must isolate to a given airframe geometry.

Putting It All Together

The abstract opened with a technical description of the process; here’s where we break it down so we can package it all back together. This process is best described as:

  • Stochastic — rather than compute a dipole pair for every axis in the space, which could be prohibitively expensive, we choose a small number of axes based on axial contribution weights from previous iterations and iterate frequently.
  • Backpropagating — score contributions from each axis are used to increase or decrease the axis weight after each iteration; axes with higher weights are selected over those with lower weights in subsequent iterations.
  • Finite difference — points in the vicinity of a reference point are sampled and used to compute an approximation of the local gradient vector.
  • Non-differentiable — while there may be a smooth function governing the system behavior, it is not known to us; we must instead select points nearby a known point and evaluate their score individually.
  • Gradient ascent — rather than converging on a minimum score, based on some known error metric, there is no ideal score to be achieved, so we seek simply to maximize the value with ever-higher scores.

Some Notes on Scoring

We’ve discussed scores throughout this journey, but not in any concrete detail. The score is a fundamental component of any optimization process. Without a mechanism of feedback, it is impossible to chart a course forward expecting anything other than random shambling. In the context of aircraft combat scores, this analysis uses a fixed metric to compute performance on an absolute scale. This is primarily measured by hit rate and damage rate — the number of hits per second or damage inflicted per second, averaged over the duration of the heat in a free-for-all combat group. This has many implications for the value of scores, as results can vary significantly from heat to heat.
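
As a rough sketch of how such a metric might be computed from heat logs (the record fields and the relative weighting of hits versus damage are assumptions, not the exact formula used in this experiment):

    # Hypothetical per-heat records: hits landed, damage dealt, heat duration in seconds.
    heats = [
        {"hits": 12, "damage": 340.0, "duration": 180.0},
        {"hits": 9,  "damage": 275.0, "duration": 180.0},
        {"hits": 15, "damage": 410.0, "duration": 180.0},
    ]

    def combat_score(heats, damage_weight=0.01):
        """Average of (hits per second + weighted damage per second) across heats."""
        per_heat = [(h["hits"] + damage_weight * h["damage"]) / h["duration"] for h in heats]
        return sum(per_heat) / len(per_heat)

    print(round(combat_score(heats), 4))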

This entire effort is made in service of understanding the interplay between organic genetic mutation processes occurring naturally in the wild and their digital counterparts. We may find the best way to evolve is indeed by combining individuals following the traditional parent/child genetic evolution model. For now, we focus on the gradient synthesis piece.

The Primary Iteration Cycle

With each iteration, the craft is expected to improve, as each variant informs the path forward with its contribution. This section focuses on the steps taken during each cycle. We’ll describe each step in detail after first reviewing the overall outline.

The cycle works like this:

  1. Start with a reference craft.
  2. Select the top X parametric axes to mutate, sorted by axis weight. The axes are stored in a map associated with weight factors.
  3. Generate dipole variants, two for each selected axis.
  4. Select at least one random adversary per dipole pair.
  5. Run N heats of combat including the reference, variants, and adversaries.
  6. Aggregate scores from variants and the reference.
  7. Normalize variant scores relative to the max variant score.
  8. Compute weighted centroid of the crystal formed by the dipoles using the normalized scores from combat.
  9. Generate a new reference using the centroid values.
  10. Adjust axis weights based on relative contribution.
  11. Repeat.
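
To make the outline concrete, here is a runnable toy version of the cycle. Everything in it is a stand-in: the parameter names are invented, a smooth toy function replaces the combat heats, and steps 4 through 6 (adversary injection and multi-heat aggregation) collapse into a single score call. Only the control flow mirrors the list above.

    def toy_score(params):
        # Stand-in for combat score; higher is better. Real scores come from KSP heats.
        return 100.0 - (params["steer_power"] - 12.0) ** 2 - (params["evasion_time"] - 3.0) ** 2

    def one_cycle(reference, weights, n_axes=2, offset=0.10):
        # 2. Select the top-weighted axes to mutate this cycle.
        axes = sorted(weights, key=weights.get, reverse=True)[:n_axes]
        # 3. Dipole variants: one positive and one negative offset per selected axis.
        variants = {(a, s): {**reference, a: reference[a] * (1 + s * offset)}
                    for a in axes for s in (+1, -1)}
        # 5-6. Score the reference and every variant (the real system aggregates 5 heats each).
        ref_score = toy_score(reference)
        scores = {k: toy_score(v) for k, v in variants.items()}
        best = max(scores.values())
        if best <= ref_score:
            return dict(reference), weights  # bad seed: discard variants and retry
        # 7-9. Normalize against the best variant, then take the weighted centroid of the crystal.
        norm = {k: s / best for k, s in scores.items()}
        total = sum(norm.values())
        new_reference = {p: sum(norm[k] * v[p] for k, v in variants.items()) / total
                         for p in reference}
        # 10. Backpropagation: reward axes whose dipole ends differ the most in score.
        for a in axes:
            weights[a] *= 1.0 + abs(norm[(a, +1)] - norm[(a, -1)])
        return new_reference, weights

    reference = {"steer_power": 8.0, "evasion_time": 5.0, "trigger_distance": 1.0}
    weights = {"steer_power": 1.02, "evasion_time": 1.01, "trigger_distance": 1.00}
    for _ in range(50):
        reference, weights = one_cycle(reference, weights)
    print(reference)  # steer_power and evasion_time nudged toward the toy optimum

Note that because every vertex of the crystal participates in the centroid, each step stays within the ±10% envelope of the dipole offsets; the scores themselves determine how far the reference moves.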

Detailed Analysis

Reference Craft

We start our journey with a KSP craft file, which is a collection of parts represented in a tree structure. Each part in the tree has some physical meaning in KSP’s virtual universe. Some store fuel. Others act as lift or control surfaces for the aircraft. Our algorithm analyzes the given file to identify all parts responsible for contributing in some way to the vehicle’s control inputs. These will be our axes used in the optimization process.

If we’re iterating an existing cycle, we already have a weight map to work with. If we’re just starting the process, we need to initialize the weight map. Once we’ve identified all the axes we can adjust in the given craft file, we generate weight factors for each axis. The default value is 1.0, and we add a tiny randomization factor during initialization to reduce design bias.
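
A minimal sketch of that initialization; the axis names are placeholders for whatever adjustable fields the craft file actually exposes.

    import random

    def init_weight_map(axes, jitter=0.01):
        """Default weight of 1.0 per axis, plus a tiny random nudge to reduce design bias."""
        return {axis: 1.0 + random.uniform(-jitter, jitter) for axis in axes}

    weights = init_weight_map(["steerMult", "evasionTime", "defaultAltitude"])  # hypothetical axes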

Axis Selection

Each iteration involves the selection of axes to consider for refinement. This is achieved by sorting a weighted map of axes and selecting some highly ranked axes. The specific number of axes will likely be informed by the cost of computing a score for each variant. For the purposes of this study, we used the best three axes in the evolution process.
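
Continuing the weight-map sketch above, the selection step amounts to a sort over the map; three is the count used in this study.

    # Pick the three highest-weighted axes from the weight map for this iteration.
    selected_axes = sorted(weights, key=weights.get, reverse=True)[:3]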

Dipole Synthesis

Once axes are selected, we simply generate a pair of variants representing our axial dipole. In this study, we use a fixed offset of ±10%, hoping for a balance between fast convergence and maintaining linearity. Future studies may consider experimenting with different values.
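
Here is a sketch of the variant generation, assuming a craft is reduced to a flat dictionary of numeric parameters (the real craft file is a part tree, so the real code has to locate and rewrite the matching fields):

    def dipole_variants(reference, axis, offset=0.10):
        """Return the +10% and -10% variants of the reference along one axis."""
        plus = {**reference, axis: reference[axis] * (1.0 + offset)}
        minus = {**reference, axis: reference[axis] * (1.0 - offset)}
        return plus, minus

    reference = {"steerMult": 12.0, "evasionTime": 3.0}   # hypothetical parameter values
    plus, minus = dipole_variants(reference, "steerMult")
    print(plus["steerMult"], minus["steerMult"])          # 13.2 and 10.8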

Adversary Injection

Each iteration includes a consideration for heterogeneity. To reduce the effects of inbreeding, we inject random adversaries in each cycle. Over a sufficient number of iterations, this is designed to reduce or eliminate divergence or stagnation due to lack of genetic diversity. Adversaries were added specifically to counter this effect, as it is believed to contribute to the low or negligible growth trend of previous experiments.

Data Collection

Due to the relatively high noise factor in the KSP combat simulation environment, we run multiple heats for each iteration cycle. This helps to reduce bias from initial conditions, since starting locations and spawn order are randomized. We run several free-for-all tournament heats and aggregate the scores. For the purposes of this study, we ran 5 heats per iteration. We’re also using a fixed heat duration of 3 minutes. This means we need 15 minutes to produce a score for a single iteration. Heats could be run in parallel to reduce total wall clock time, but this experiment ran everything serially.

An example heat roster might look like this:

  • R1 — Reference seed craft
  • V1 — Positive variant for axis A
  • V2 — Negative variant for axis A
  • A1 — Antagonist chosen randomly
  • V3 — Positive variant for axis B
  • V4 — Negative variant for axis B
  • A2 — Antagonist chosen randomly

Score Aggregation

Scores from all the iteration heats are aggregated together to produce a statistically meaningful result. This result is computed for each variant separately and used to inform the gradient synthesis below. Adversary scores are not included in this consideration, but the reference score is retained.

Normalization

Variant scores are normalized against the maximum score within the set (excluding the reference). If the max variant score is lower than the reference score, the iteration is considered a bad seed. All variants are discarded for bad seeds. For cases where the max score exceeds the reference, we proceed to computing a gradient vector from the dipolar crystal.
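
In code, the normalization and bad-seed check might look like this sketch, with variant names keyed as in the roster above:

    def normalize(variant_scores, reference_score):
        """Normalize against the best variant; return None to flag a bad seed."""
        best = max(variant_scores.values())
        if best <= reference_score:
            return None  # bad seed: discard all variants for this iteration
        return {name: score / best for name, score in variant_scores.items()}

    print(normalize({"V1": 4.2, "V2": 3.1, "V3": 3.9, "V4": 2.8}, reference_score=3.5))
    print(normalize({"V1": 2.0, "V2": 1.8}, reference_score=3.5))  # None (bad seed)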

Gradient Synthesis

With scores for each vertex in the crystal, each representing the cumulative outcome from multiple heats of combat for a given variant, we now have sufficient confidence in the magnitude of contributions made by each axis represented in the crystal. We can compute unit vectors for each axis, representing the axial component of the gradient vector.
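
A minimal sketch of that synthesis: each selected axis contributes a component proportional to the score difference across its dipole, placed on that axis’s unit vector. The 1/(2·offset) scaling is the standard central-difference choice, assumed here rather than taken from the experiment.

    def synthesize_gradient(axes, norm_scores, offset=0.10):
        """Combine per-axis dipole score differences into one gradient vector (sketch)."""
        gradient = {}
        for axis in axes:
            plus, minus = norm_scores[(axis, +1)], norm_scores[(axis, -1)]
            gradient[axis] = (plus - minus) / (2.0 * offset)  # axial component of the gradient
        return gradient

    norm_scores = {("steerMult", +1): 1.00, ("steerMult", -1): 0.82,
                   ("evasionTime", +1): 0.91, ("evasionTime", -1): 0.95}
    print(synthesize_gradient(["steerMult", "evasionTime"], norm_scores))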

Step Forward

Adding up all the contributions of each axis into a resulting gradient vector, we simply add the result to our reference point to generate a new reference craft for the next iteration cycle. Before we can continue to the next iteration, we need to provide critical feedback to the system about how significant the contribution from each axis was relative to the others.
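
Stepping forward is then a small update of the reference parameters along that vector. The step size and the per-parameter scaling below are assumptions for this sketch; the centroid formulation described earlier folds this into the weighted mean instead.

    def step_forward(reference, gradient, step_size=0.05):
        """New reference craft: parameters nudged along the synthesized gradient."""
        return {axis: value + step_size * gradient.get(axis, 0.0) * value
                for axis, value in reference.items()}

    reference = {"steerMult": 12.0, "evasionTime": 3.0, "defaultAltitude": 1500.0}
    gradient = {"steerMult": 0.9, "evasionTime": -0.2}
    print(step_forward(reference, gradient))  # unselected axes are left unchanged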

Backpropagation

We use the scores from each end of the dipole to decide how important this axis is relative to the reference point. After all, we’re surfing n-dimensional curves here. What’s good for the goose might only be good for the gander on Thursdays when it’s raining and the neighbor’s cat is asleep. The feedback we give on Monday might be very different. In the end, this is the mother of all buried leads. It turns out the backpropagation process is the most important part.
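
Here is a sketch of one possible weight update; the exact rule and constants used in this experiment aren’t reproduced here, so the boost, decay, and threshold values below are placeholders. The idea is simply that axes whose dipole ends differ noticeably in score get promoted, and axes with negligible effect get demoted.

    def backpropagate(weights, axes, norm_scores, boost=0.2, decay=0.1, threshold=0.02):
        """Adjust axis weights by how much each dipole actually moved the score (sketch)."""
        for axis in axes:
            contribution = abs(norm_scores[(axis, +1)] - norm_scores[(axis, -1)])
            if contribution > threshold:
                weights[axis] *= 1.0 + boost * contribution  # this axis mattered; favor it
            else:
                weights[axis] *= 1.0 - decay                 # negligible effect; demote it
        return weights

    weights = {"steerMult": 1.01, "evasionTime": 1.00, "defaultAltitude": 0.99}
    norm_scores = {("steerMult", +1): 1.00, ("steerMult", -1): 0.82,
                   ("evasionTime", +1): 0.91, ("evasionTime", -1): 0.90}
    print(backpropagate(weights, ["steerMult", "evasionTime"], norm_scores))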

Results!

In total, this experiment required nearly a week of computation time. We compared the results from five different KSP craft over fifty iterations. In each case, we have two phases — generation and validation. Here, we quickly review the process used to synthesize and evaluate the results.

Methodology

Phase One — Generation

For the first phase, we start with an initial seed craft and run an iteration on it, as described above. This results in an approximate gradient vector result, which is used to generate a new seed for the next iteration. It is possible for an iteration to produce bad results. In this case, we simply downvote the axes selected in the iteration and retry. This continues until we have fifty successful iterations.

Phase Two — Validation

Once we have a group of seeds from our generation phase, we can begin to validate the seeds, comparing them against each other and all the antagonists used in the generation phase. We select groups of six seeds (in order by iteration) and run them against the six antagonists. This allows the scores to be compared against each other without a need for normalization. With the same number of craft in each validation heat, the scoring opportunities are expected to be uniform. We run tournaments of 12-player free-for-all for each group of six seeds. The results of these tournaments are plotted linearly for each craft series.

Additionally, we include a tournament of the pre- and post-evolution craft in a 10-player free-for-all. This tournament runs all the craft together over forty rounds to produce a stable ranking of the players.

Interpreting the Charts

Overlaying all the validation results together in one chart, we see the general improvement trend for the lower ranked craft. This chart shows the scores from all the validation tournaments on the vertical axis for iterations represented on the horizontal axis. If the technique is working, we should see improvements in score as the iteration count increases (up and to the right). We see an inverse relationship in the top ranked craft. As noted below, this is believed to be a consequence of the small number of heats per iteration used throughout the duration of this experiment.

The pre- and post-optimization comparison chart tells a different story, a testament to the noisy system of aircraft combat. We see improvements in all the craft, except the worst performer. Another key consideration here is in the primary actors of the system. We expect things like turning rate to be strong predictors of air combat performance. This experiment intentionally chose a diverse set of aircraft with different armaments and airframes. It is entirely possible fifty iterations is not enough to improve a craft with a poorly performing initial configuration. Similarly, we may need one hundred or more rounds of the final validation tournament to have high confidence in its stability.

Conclusions

The stochastic nature of this technique is key to its success. When the cost of evaluating a high fidelity gradient in all dimensions is too high, there is great benefit to selecting a subset of axes and computing an approximation of the gradient. Combining this sampling approach with backpropagation leads to a highly adaptive solution with less sensitivity to local minima/maxima. As the gradient wanders into new regions of the state space, the algorithm naturally accounts for changes in the sensitivity along various axes. Data received during each iteration cycle informs the best guess for the axes to select in the subsequent iterations. This seems especially true for saddle points, where one axis is optimized and another is not. More study is needed to better understand these nuances.

Future Study

Fairly early in the data collection for this experiment, it became clear the backpropagation algorithm we used is not very stable. It results in a clear improvement trend for some craft, but some of the data are inconsistent. It makes the top two craft worse in seed validation, despite improved scores in tournament validation. This seems likely due to the low number of heats used per iteration. With the inherent noise in the scoring function, we may need significantly more heats to develop a statistically significant gradient approximation.

The specific backpropagation logic used in this experiment leads to some chaos in the weighting, rather than the stable smooth weighting we hoped to achieve. Subsequent papers may address this in more detail. Additionally, there are many fixed constants used for adjusting weights. This represents a strong opportunity for modeling and/or predicting appropriate weights for future iterations.

Special Thanks!

First, a huge thank you to the players who allowed me to use their beautiful creations in my weird science experiment. These are players from the KSP/BDArmory community. They have been a tremendous support through the months of preparation and development culminating in the work you see here.

Also, thanks to my good friends Dan Post and Aaron Bodoh-Creed for agreeing to read my rambling early drafts. Their feedback helped me organize the chaos and present these complex concepts in a hopefully approachable way.

And, as always, thank you for reading! ❤
