Game Theory-based Parameter Tuning for Path Planning of UAVs

Diksha Moolchandani
9 min read · Sep 14, 2021


In this blog post, I discuss my recent research on using a game theory-based framework to tune the parameters of path planning algorithms for drones. This work was presented at VLSI Design 2021.

Let’s start.

1. Introduction

Recently, there has been an upsurge in the global UAV market. Today, drones are used almost everywhere: in the military, agriculture, security, rescue, package delivery, disaster relief, journalism, recreation, construction, and many more fields. As an estimate, the global UAV market, valued at $2.67 billion in 2016, is expected to reach $10.28 billion by 2022.

Figure 1

Despite such progress, the software and computing aspects of drones have not received adequate attention. Consider the path planning algorithms in UAVs (explained in detail in the Background section). Multiple parameters are at play that decide the path length and the hover time of a drone. It has been shown that with the wrong choice of these parameters, the computed routes can be 4X longer and take correspondingly longer to traverse. This leads to unwanted energy consumption, which a battery-constrained device such as a UAV can ill afford. A long hover duration is undesirable for the same reason: while hovering, the drone does no useful work; it simply waits for its compute unit to calculate the next node to traverse.

Thus, it becomes very important to tune these parameters efficiently so that the hover duration is reduced and the paths are as short as possible. Traditional approaches to tuning these parameters involve either manual tuning or solving an optimization problem to get a suitable parameter configuration. In this work, we show that solving an optimization problem with a large search space is not feasible for the real-time operation of these cyber-physical systems. Instead, by solving the optimization problem approximately, we can achieve real-time operation. We use game theory as a tool to solve the optimization problem quickly.

2. Background

We provide a simplified explanation of RRT* here; for a detailed treatment, please read the paper. Figure 2 shows the steps involved in the RRT* path planning algorithm as an animation. RRT* builds the path in the form of a tree. The black edges and grey nodes show the initial tree, the blue triangles show the obstacles in the search space, and the maroon circles show the random samples in the search space.

Figure 2: RRT* Algorithm

The algorithm first samples the environment (dimensions: x * y * z) and chooses a random sample in the direction in which it wants the tree to grow. Subsequently, the node of the existing tree nearest to the random sample is chosen, and the algorithm attempts to create a new node in the direction of the random sample. A maximum distance, the step size (also called the resolution), restricts how far from the nearest node the new node can be placed. Hence, as can be seen in the animation, the new node is placed at a step-size distance from the nearest tree node in the direction of the random sample. The new node and the edge that connects it to the nearest node are then checked against the obstacle avoidance distance: neither should come closer than a minimum distance to the nearby obstacles. As the animation shows, the first new node violates the obstacle avoidance distance, so it is rejected and the algorithm moves on to generate another new node.
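To make this extension step concrete, here is a minimal Python sketch of it: a candidate node is placed at most a step size away from the nearest tree node, in the direction of the random sample, and is rejected if it comes too close to an obstacle. This is an illustration, not the paper's implementation; the names (`step_size`, `min_clearance`) and the toy numbers are assumptions.

```python
import numpy as np

def steer(nearest, sample, step_size):
    """Place a new node at most step_size away from nearest, toward sample."""
    direction = sample - nearest
    dist = np.linalg.norm(direction)
    if dist <= step_size:
        return sample
    return nearest + (step_size / dist) * direction

def violates_clearance(node, obstacles, min_clearance):
    """Reject a node closer than min_clearance to any obstacle center."""
    return any(np.linalg.norm(node - obs) < min_clearance for obs in obstacles)

# Toy usage: one extension step in a 3D search space.
rng = np.random.default_rng(0)
tree = [np.zeros(3)]                      # existing tree nodes
obstacles = [np.array([1.0, 1.0, 0.0])]   # obstacle centers
sample = rng.uniform(0, 5, size=3)        # random sample in the x*y*z space
nearest = min(tree, key=lambda n: np.linalg.norm(n - sample))
new_node = steer(nearest, sample, step_size=0.5)
if not violates_clearance(new_node, obstacles, min_clearance=0.8):
    tree.append(new_node)                 # accept the node; otherwise resample
```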

Thus, the number of samples, the dimensions of the search space, the obstacle avoidance distance, and the step size form the set of tunable parameters that decide the flight statistics of a drone. Apart from these tunable parameters, the obstacle density, i.e., the number of obstacles in the search space, plays a crucial role in the path planning process.

3. Key Insights

Figure 3: Key insight

Figure 3 shows the key insight of our work. As mentioned in the introduction, we solve the optimization problem by converting it into a game-theoretic framework. The basic idea is to treat the variables of the optimization problem as players of a game, and to split the optimization objective and map the pieces onto the payoffs of the different players.

Let us elaborate a bit. The first part, mapping the variables to the players of the game, is straightforward. However, splitting the optimization objective so that it maps onto the payoffs is a complicated problem and requires specialized mathematical tools. A better idea is to perform sensitivity analyses of the flight data with respect to the tunable parameters and derive the payoff equations from them. We follow this approach.

4. Optimization Problem

To formulate the optimization problem, we need to collect large amounts of flight data for different values of the tunable parameters so that we can deduce a master equation for the optimization objective. We perform extensive simulations in the Gazebo simulator to collect this data. Figure 4 shows the steps involved in formulating the optimization problem.

Figure 4: Formulation of the optimization problem

We simulate the drone path planning algorithm in the Gazebo simulator for different virtual worlds and different values of the tunable parameters, and we collect the flight data (hover time and path length) for each of these simulations.
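Schematically, the data collection looks like the loop below. Here `run_gazebo_sim` is a hypothetical stand-in for the actual Gazebo simulation pipeline, and the parameter grids are illustrative values, not the ones used in the paper.

```python
import itertools

def run_gazebo_sim(world, num_samples, clearance, step_size):
    """Hypothetical stand-in: launch a Gazebo run and measure flight data."""
    return 10.0, 50.0  # dummy (hover time, path length); the real pipeline measures these

worlds = ["world1", "world2"]
num_samples_grid = [250, 500, 750, 1000]
clearance_grid = [0.5, 1.0, 1.5, 2.0]    # obstacle avoidance distance
step_size_grid = [0.5, 1.0, 1.5, 2.0]

dataset = []
for world, n, c, s in itertools.product(worlds, num_samples_grid,
                                        clearance_grid, step_size_grid):
    ht, pl = run_gazebo_sim(world, n, c, s)
    dataset.append({"world": world, "num_samples": n, "clearance": c,
                    "step_size": s, "HT": ht, "PL": pl})
```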

Figure 5: Feature Vector and Labels

Figure 5 shows the format of the data points collected from the Gazebo simulations. The left column shows the feature vectors and the right column shows the labels for each data point, i.e., the quantities to be predicted. We then fit ML-based models to these data points. This translates to a multi-output regression problem because we need to predict two quantities simultaneously. Figure 6 shows the root mean square error (RMSE) of different ML models for two different virtual worlds, with the RMSE expressed as a percentage of the true value. We find that an MLP with sigmoid activation achieves the best accuracy, and hence we use it to formulate the optimization problem.
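As an illustration of this regression step, the sketch below fits an MLP with a sigmoid (logistic) activation to synthetic data shaped like the points in Figure 5. The feature ranges and the synthetic HT/PL relationships are assumptions made for the sake of a runnable example, not the paper's dataset.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Features (synthetic stand-ins): num samples, search-space dimension,
# obstacle clearance, step size, obstacle density.
X = rng.uniform([100, 10, 0.2, 0.1, 1], [1000, 50, 2.0, 2.0, 20], size=(500, 5))

# Labels: [hover time, path length]; made-up relationships for illustration.
ht = 0.05 * X[:, 0] / X[:, 3] + 2.0 * X[:, 4] + rng.normal(0, 1, 500)
pl = 1.5 * X[:, 1] - 3.0 * X[:, 2] + 0.8 * X[:, 4] + rng.normal(0, 1, 500)
y = np.column_stack([ht, pl])

# MLPRegressor handles multi-output targets natively;
# activation="logistic" is the sigmoid activation.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), activation="logistic",
                 max_iter=5000, random_state=0),
)
model.fit(X, y)
ht_pred, pl_pred = model.predict(X[:1])[0]  # predicts both quantities at once
```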

Figure 6: Root mean square error for two different virtual worlds. MLP with Sigmoid performs the best.

Objective Function:

minimize Total time = Hover time + Flight time = HT + λ·PL

where HT is the hover time, PL is the path length, and λ is a constant that converts the path length into the flight time.

Constraints:

1. The analytical equations of the trained MLP, relating the tunable parameters to HT and PL

2. Feasibility constraints on the parameters, taken from prior papers and real-world experiments

The final equations are shown in the paper.
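For intuition, here is a sketch of what solving this problem directly can look like, with a hand-written stand-in for the MLP's analytical equations and illustrative values for λ, the obstacle density, and the feasible ranges.

```python
import numpy as np
from scipy.optimize import minimize

LAMBDA = 0.1    # illustrative constant converting path length into flight time
DENSITY = 10.0  # obstacle density of the world: an input, not a variable

def predict_ht_pl(params):
    """Stand-in for the MLP's analytical equations (illustrative only)."""
    n_samples, dim, clearance, step = params
    ht = 0.05 * n_samples / step + 2.0 * DENSITY
    pl = 1.5 * dim - 3.0 * clearance + 0.8 * DENSITY
    return ht, pl

def total_time(params):
    """Objective: Total time = HT + lambda * PL."""
    ht, pl = predict_ht_pl(params)
    return ht + LAMBDA * pl

# Feasibility constraints (illustrative ranges):
# number of samples, search-space dimension, obstacle clearance, step size.
bounds = [(100, 1000), (10, 50), (0.2, 2.0), (0.1, 2.0)]
x0 = np.array([500.0, 30.0, 1.0, 1.0])

result = minimize(total_time, x0, bounds=bounds, method="L-BFGS-B")
print("best parameters:", result.x, "minimum total time:", result.fun)
```

With the real MLP equations embedded as constraints and a large search space, solvers like this become slow and can struggle to converge, which is what motivates the game-theoretic shortcut described next.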

5. Game-theoretic Formulation

Figure 7: Formulation of the game

Figure 7 shows the steps involved in the game-theoretic formulation. We combine the equations from drone physics with the results of the sensitivity analyses to formulate the payoff equations of the players. The formulated game is then solved for a Nash equilibrium using the Gambit solver.

All the tunable parameters become players. The obstacle density is an input and appears with a negative sign in the payoff equations because both the path length and the hover time increase with the obstacle density, as shown in Figure 8. The strategies of the players are the feasible ranges of the respective tunable parameters. The flight data (hover time and path length) are the dependent quantities and enter the payoff formulation; the sign with which they appear in the payoff of each player is obtained from Figure 9.

5.1 Designing the Payoffs

Each player’s payoff consists of two parts: an altruistic objective and a selfish objective. The altruistic objective aims at minimizing the energy wasted due to hover time or a sub-optimal path length. The selfish objective is proportional to the player’s own parameter. Maximizing the payoff thus amounts to minimizing the altruistic objective while maximizing the selfish one. To formulate the altruistic objective, we need to know how the hover time and the path length vary with each tunable parameter, so that the corresponding term can be added to that parameter’s payoff. This requires sensitivity analyses; Figure 8 shows some of the sensitivity analysis graphs.

Figure 8: Sensitivity analyses of hover time and path length with respect to the obstacle density and the step size
Figure 9: The derivations from the sensitivity analyses

Figure 9 shows the conclusions drawn from the sensitivity analyses. Once we know whether HT and PL increase or decrease as the value of a parameter increases, we can decide the sign of the altruistic term in that parameter’s payoff equation. The exact relationship (linear or quadratic) is also known, to some extent, from the obtained graphs. Each payoff equation then consists of the selfish objective, the altruistic objective, and the obstacle-density term described above. The final equations are shown in the paper.
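To make this construction concrete, the sketch below builds a toy two-player version of the game: each tunable parameter is a player whose strategies are a discretized feasible range, each payoff adds a selfish term (the player's own parameter value) to an altruistic term (the negative of HT + λ·PL) and a negative obstacle-density term, and a brute-force search finds pure-strategy Nash equilibria. The payoff shapes are illustrative stand-ins for the paper's equations, and the brute-force search stands in for the Gambit solver.

```python
import itertools
import numpy as np

LAMBDA, DENSITY = 0.1, 10.0  # illustrative weight and obstacle density

# Players = tunable parameters; strategies = discretized feasible ranges.
strategies = {
    "clearance": np.linspace(0.2, 2.0, 4).tolist(),
    "step_size": np.linspace(0.1, 2.0, 4).tolist(),
}
players = list(strategies)

def ht_pl(clearance, step_size):
    """Stand-in for the sensitivity-derived HT/PL relations (illustrative)."""
    ht = 5.0 / step_size + 2.0 * DENSITY
    pl = 20.0 - 3.0 * clearance + step_size ** 2 + 0.8 * DENSITY
    return ht, pl

def payoff(player, profile):
    """Selfish term minus the altruistic term (HT + lambda*PL),
    with the obstacle density entering with a negative sign."""
    ht, pl = ht_pl(**profile)
    return profile[player] - (ht + LAMBDA * pl) - 0.1 * DENSITY

def is_pure_nash(profile):
    # No player can improve its payoff by deviating unilaterally.
    for p in players:
        current = payoff(p, profile)
        if any(payoff(p, {**profile, p: s}) > current + 1e-12
               for s in strategies[p]):
            return False
    return True

profiles = [dict(zip(players, combo))
            for combo in itertools.product(*strategies.values())]
equilibria = [pr for pr in profiles if is_pure_nash(pr)]
print(equilibria)  # here: a single equilibrium at the largest clearance and step size
```

Note that a pure-strategy equilibrium need not exist for every payoff structure, which is one reason to use a dedicated solver such as Gambit.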

6. Results

Prior work on parameter tuning for path planning algorithms takes 30 minutes to 2 hours to converge to a solution, exploring nearly 100 parameter configurations. In comparison, our game-theoretic approach takes 0.01 seconds to explore 256 configurations and arrive at a Nash equilibrium.

We experiment with three virtual worlds (world1, world2, and world3, from left to right) as shown in Figure 10. The obstacle density of world3 is twice that of world2.

Figure 10: Three different virtual worlds

Figure 11 shows the hover time and path lengths obtained using our optimization-based approach and the game theory-based approach.

Figure 11: HT and PL using optimization-based and game theory-based approach

We observe that the game-theoretic solutions are 5–21% better than those found by solving the optimization problem. This is mainly because the search space of the optimization problem had to be reduced due to convergence issues. Moreover, our game-theoretic solver is 10–100X faster than the optimization solver. We also observe that, because the obstacle density of world3 is twice that of world2, the HT and PL values for world3 are higher than those for world2.

7. Conclusion

In this work, we propose to solve a parameter tuning problem quickly using a game-theoretic approach instead of the traditionally used optimization-based approach. We show that game theory is a powerful tool for solving complex non-linear optimization problems, and we present a formal approach for converting a single-objective optimization problem into a game using standard techniques. Our approach paves the way for real-time solutions to a wide variety of optimization problems that arise in cyber-physical systems.

Thank you for reading. That’s all folks! Hope you liked the article :)

For any questions, feel free to read the full paper or direct your questions to my email id: diksha [dot] moolchandani [at] cse [dot] iitd [dot] ac [dot] in.
