Unleashing the potential of AI for teaching drones: making advanced technologies accessible to all

Attila Oláh
Supercharge's Digital Product Guide
10 min read · Feb 9, 2023

Have you ever wanted to tinker with artificial intelligence but felt intimidated by the math and technical aspects? Look no further! In this article, I’ll share my journey of using AI to tame a drone, all while avoiding the complex mathematics. Follow along as I demonstrate how it’s possible to utilize online resources to achieve impressive results with AI and drones.

In order to keep up with the pace of technology, we at Supercharge put extra effort into researching hot topics such as AI and IoT. Our teams get to experiment with AI- and IoT-related technologies to get familiar with these future-proof topics. I decided to do the same myself.

What did my little experiment look like in the end?

Let's rewind a few steps. A friend of mine gave me a helping hand around the house and also brought his new toy, a DJI Mavic drone. I was obsessed with its built-in features: object following, face recognition and all the AI-driven camera techniques.

At the same time (in 2019), our tech teams were being trained in AI technologies by a specialist over several half-day sessions. With these recent experiences in mind, I decided to build my own face-following drone. I also set up a few criteria to make it even more challenging:

  • I do web frontend development on a daily basis, so I want to write JS only. No Python this time. (Rapid prototyping an AI-related app without Python may sound like a stretch, but it is actually doable, as you'll see!)
  • I don't want to understand the AI under the hood, I only want to use AI as a service
  • On top of everything, I aim to finish within 4 hours

Since I had a 4-hour time cap for delivery, I needed to be prepared for any potential issues and wrong tracks. To do this, I did some quick research and found the following sources very useful:

TL;DR: By following the docs and videos above, I understood how to control the drone over WiFi using JavaScript and how to obtain the video stream from it. I found this to be the most critical part, but the community had already solved it, so it was my turn to build upon their work!

So, after spending a couple of hours researching (I hadn't written any code at this point), I put my plan together using a DJI Tello drone:

  • First of all, I need to establish a connection between the drone and my Mac and take over control of the drone (takeoff, move around on all axes, land)
  • Then somehow fire up JS-based face recognition, using my webcam first
  • In order to track the face position, I have to calculate the distance vectors on all three axes (how far the face is from the center of the camera picture)
  • Replace the webcam video source with the drone's camera
  • Control the drone programmatically using the distance vectors to follow the face

Let's have a look at each step! I'll post the corresponding changes from my GitHub repo.

Connecting to and controlling the drone

First, I tried to use Wes Bos' code as a starting point. It soon turned out that I would need finer control over the drone than Wes' code provides if I wanted smooth operation, so I switched to the more complex nodetello library. If you compare the two codebases, you'll find that Wes Bos' code uses the basic movement commands (up, down, left, right, forward, back), which are very limited in terms of precision (20 cm is the smallest step they can take), while nodetello uses a completely different command, the so-called rc a b c d command (referred to as stick in the code). Technical details can be found in the SDK documentation. Okay, but what is the rc command? I assumed it to be the same one used by the official DJI Tello mobile app, where you have fine control over the drone using two sticks on the screen:

Interface of the official Tello app. Can you see those “analogue” sticks? Source: http://tellohq.com/official-dji-ryze-tello-app/

So I copied both the server and client code of nodetello into my project and threw out the functions I didn't need in my short-lived project. In the end, I had a setup where I could connect to the drone over WiFi; I was able to control it like in a video game and also had a video stream in my browser. However, I didn't investigate the camera part at this point; that comes a couple of steps later.
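To make the idea more concrete, here is a minimal sketch of what talking to the drone can look like if you use the official plain-text Tello SDK commands directly over UDP (nodetello handles this under the hood and partly uses a lower-level protocol, so treat this as an illustration of the rc idea rather than the code from my repo):

// Minimal Node.js sketch using the plain-text Tello SDK over UDP.
// Assumes the computer is connected to the Tello's WiFi and the drone
// listens on 192.168.10.1:8889, as described in the SDK documentation.
const dgram = require("dgram");

const TELLO_HOST = "192.168.10.1";
const TELLO_PORT = 8889;
const socket = dgram.createSocket("udp4");

function send(command) {
  socket.send(command, TELLO_PORT, TELLO_HOST, (err) => {
    if (err) console.error("Failed to send:", command, err);
  });
}

// "rc a b c d" -> a: roll, b: pitch, c: throttle, d: yaw, each in -100..100
send("command");                              // enter SDK mode
setTimeout(() => send("takeoff"), 1000);
setTimeout(() => send("rc 0 0 20 0"), 6000);  // gentle climb
setTimeout(() => send("rc 0 0 0 0"), 7000);   // stop moving
setTimeout(() => send("land"), 10000);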

With this, we have Step #1 DONE.

Disclaimer: I recommend Wes Bos' video tutorial for educational purposes. He has done a very good job of demonstrating how any JS developer can make that drone fly.

Fire up face recognition on the webcam

Based on our AI training, Python would have been the most convenient ecosystem for rapidly prototyping AI problems. My Google searches confirmed this: lots of open-source libraries, a supportive community and tons of answered questions around. Python-based solutions also tend to perform better than JS-based ones. But I stuck to my decision to use JS, as I'm not experienced enough in Python to solve this puzzle under such time pressure.

I found face-api.js to be the best solution for me. The library is built upon Google's tensorflow.js and therefore runs offline (which turned out to be a must, because the Tello occupies the WiFi during flights, so there's no internet connection on my Mac while the drone flies). Face-api.js provides both Node.js and browser-based solutions, and I decided to do it in the browser for my experimental project, considering that:

  • I can easily get the webcam working in the browser using the <video> tag
  • nodetello provides the video stream on the client side out of the box
  • there would be less visual aid during development if I chose the server side

Before we continue, let's have a look at the face-api.js features on the face-api.js playground. I decided to use the Tiny Face Detector because I was convinced by its performance (high FPS) to detection confidence ratio. As it turned out later, its precision is far less reliable with the Tello's camera than with my webcam, so I switched to the more resource-heavy but more reliable SSD Mobilenet V1 model.
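To give a rough idea of what the browser setup looks like, here is a minimal sketch assuming the face-api.js model files are served from a local /models folder and the page contains a <video id="webcam" autoplay muted> element (both the folder and the element id are assumptions for this example, not the exact code from my repo):

// Load an offline face-api.js model and run detection on the webcam stream.
import * as faceapi from "face-api.js";

async function start() {
  // SSD Mobilenet V1: heavier than the Tiny Face Detector, but more reliable
  await faceapi.nets.ssdMobilenetv1.loadFromUri("/models");

  const videoEl = document.querySelector("#webcam");
  videoEl.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });

  setInterval(async () => {
    const detection = await faceapi.detectSingleFace(
      videoEl,
      new faceapi.SsdMobilenetv1Options({ minConfidence: 0.5 })
    );
    if (detection) {
      // detection.box holds the bounding box (x, y, width, height)
      console.log("Face at", detection.box);
    }
  }, 100);
}

start();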

See coding progress here.

Math

Face-api.js provides a rich API, so it was really easy to extract the information I wanted: the bounding box of the detected face. It enabled me to draw a rectangle over the camera picture. What did I need that for? Not only does it look super high-tech, it also acts as preventive logging: it helped me debug and understand my own code further down the road.

Once I obtained the bounding box of my face in the camera frame, I had all the information I needed to calculate how far my face was from the center of the frame. By finding the center of the bounding box, it was easy to determine the correction needed to move the face back to the center of the video, which is exactly the information the drone needs to follow a face. Calculating the distance on the X and Y axes is straightforward (it is the distance between the center of the bounding box and the center of the camera frame), but face-api.js provides no information on the Z axis (the distance from the camera to the face). I used the size of the bounding box to roughly estimate the distance between the camera and the face. I won't go into the specifics of the calculations here, as Jabrils has already covered them in detail.
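To illustrate the calculation, here is a small sketch (with made-up helper and variable names, not the exact code from the repo): normalise the offset of the bounding box center from the frame center on the X and Y axes, and use the relative area of the box as a rough stand-in for the Z axis.

// Turn a face bounding box into a normalised "how far from the center" vector.
function relativeDistance(box, frameWidth, frameHeight) {
  const faceCenterX = box.x + box.width / 2;
  const faceCenterY = box.y + box.height / 2;

  // -0.5 .. 0.5 on both axes: 0 means the face is perfectly centered
  const dx = faceCenterX / frameWidth - 0.5;
  const dy = faceCenterY / frameHeight - 0.5;

  // The relative area of the box acts as a rough proxy for the Z axis:
  // a bigger face means the drone is too close, a smaller one means too far.
  const area = (box.width * box.height) / (frameWidth * frameHeight);

  return { vector: [dx, dy], area };
}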

Now I had the numbers that told me how far the face was!

Use the drone’s camera instead of the webcam

As I mentioned earlier, I already had a camera stream provided by nodetello. All I had to do was replace face-api's webcam video input with the canvas that holds the drone's camera frames.

// Grab the canvas that holds the drone's camera frames (skipping the #draw overlay)
const droneCamCanvasEl = document.querySelector("#videoFeed canvas:not(#draw)");
// Run detection either on the webcam <video> element or on the drone's canvas
const detection = await faceapi.detectSingleFace(useWebcam ? videoEl : droneCamCanvasEl);

GitHub link here.

Putting everything together

Let's summarise what I had achieved so far:

  • the ability to control the drone using a keyboard
  • face recognition on the drone’s camera frame
  • numbers indicating the distance of the face from the center of the camera on each axis

The final step is to use this information to tell the drone which direction to move in order to keep the face centered. This is the fun part!

Before I tell you how I managed to make it kind of work, let’s learn flight terminology! We need to understand what throttle, pitch, roll and yaw are.

  • Throttle: controls elevation. All four rotors speed up or slow down together.
  • Pitch: forward/backward movement, similar to a car. When pitching, the two rear or the two front rotors spin faster, tilting the drone forward or backward.
  • Roll: sideways movement. When the left rotors spin faster, the drone moves to the right, and vice versa.
  • Yaw: rotation around the vertical middle axis. When the rotors spinning in one direction run faster than the pair spinning in the other direction, the reaction torque turns the drone around that axis.
Terminus technicus of flight

Let’s start with the easiest parameter: throttle. When the face is above the center of the image, I need to increase the throttle. When it is below, I need to decrease it.

A primitive first implementation looks like this:

function controlThrottle(relativeDistanceVector) {
  // Derive the direction (-1, 0 or 1) from the vertical offset of the face
  const direction =
    relativeDistanceVector[1] > 0 ? -1 : relativeDistanceVector[1] < 0 ? 1 : 0;
  return direction;
}

const stickData = { pitch: 0, roll: 0, throttle: 0, yaw: 0 };
stickData.throttle = controlThrottle(relativeDistanceVector);
sendCmd("stick", stickData);

The above snippet declares a stickData object that feeds the drone's stick command. The default values are 0, so the drone won't move if stickData is left untouched. Knowing the distance vector of the detected face, I can calculate the desired movement direction on the vertical axis.

While this code is theoretically correct, it does not work well in practice. The main issue is that the drone will never reach the perfect position, so the calculated correction will always be a non-zero value. This means the drone will constantly go up and down trying to reach the zero position, but it will never succeed. To solve this, a "safe zone" can be defined where the target is considered close enough to the center. I defined the zone with thresholdMin and thresholdMax values. I also added a weight, called throttleWeight, that I can use to fine-tune the movement. Finally, I added the already existing speed parameter to the calculation, so there's another control parameter that acts globally. The updated controlThrottle method looks like this at this stage:

function controlThrottle(relativeDistanceVector) {
  const distanceYThresholdMin = -0.2;
  const distanceYThresholdMax = 0.1;
  const throttleWeight = 1;

  // Safe zone: the face is close enough to the center, no correction needed
  if (
    relativeDistanceVector[1] >= distanceYThresholdMin &&
    relativeDistanceVector[1] <= distanceYThresholdMax
  ) {
    return 0;
  } else {
    const direction =
      relativeDistanceVector[1] > 0 ? -1 : relativeDistanceVector[1] < 0 ? 1 : 0;
    return throttleWeight * speed * direction;
  }
}

The next parameters are the yaw and the roll. If I only controlled the yaw, the drone would only be able to follow me when I moved around it. However, if I took a few steps to the side, the drone would quickly lose sight of me because it wouldn't be able to see my face from the front. To prevent this, I had to operate the yaw and roll together. It took me some time to find the right balance and not overdo or underdo either movement. It looks like this:

function controlRollAndYaw(relativeDistanceVector) {
  const distanceXThresholdMin = -0.15;
  const distanceXThresholdMax = 0.15;
  const yawWeight = 0.7;
  const rollWeight = 0.3;

  let yaw = 0, roll = 0;

  // Only react when the face is outside the horizontal safe zone
  if (
    relativeDistanceVector[0] < distanceXThresholdMin ||
    relativeDistanceVector[0] > distanceXThresholdMax
  ) {
    const direction =
      relativeDistanceVector[0] < distanceXThresholdMin ? -1 : 1;
    yaw = yawWeight * speed * direction;
    roll = rollWeight * speed * direction;
  }

  return { yaw, roll };
}

The last parameter is the pitch. As I mentioned before, I used the size of the face's bounding box to estimate the distance. After a few dry runs (not turning the propellers on but holding the drone in my hand), I found the magic numbers for this parameter as well, so the drone wouldn't accidentally fly into my face.

function controlPitch(area) {
  const areaThresholdMin = 0.045;
  const areaThresholdMax = 0.055;
  const pitchWeight = 0.5;

  // Safe zone: the face takes up roughly the right portion of the frame
  if (areaThresholdMin <= area && area <= areaThresholdMax) {
    return 0;
  } else {
    // Face too big means too close (back away), too small means too far (move closer)
    return (
      pitchWeight *
      speed *
      (area > areaThresholdMax ? -1 : area < areaThresholdMin ? 1 : 0)
    );
  }
}

All the control-related changes are in one place here.

At this point, the timer went off. Let’s see the code in action:

Having fun in the Supercharge office

The control algorithm is far from perfect and has several limitations. For example, if the camera sees multiple faces, it will pick one of them at random rather than sticking with the same one. Additionally, the person being followed needs to be careful with their movements to stay within the camera's range. Even if the person stands still without moving their head, the drone will keep swinging back and forth as it tries to find the perfect position, which is impossible to achieve. The latter issue could be solved with a correctly parameterised compensator algorithm such as a PID controller, but I ran out of time and was already satisfied with the result.
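For the curious: a PID controller would replace the threshold-plus-constant-speed logic above with a correction based on the error, its integral and its derivative, which is what smooths out the oscillation. A rough, untuned sketch of what that could look like for a single axis (the gains are placeholders, not values I tested on the drone):

// Sketch of a PID controller for one axis (e.g. throttle).
// The gains are placeholders and would need proper tuning on the real drone.
function createPid(kp, ki, kd) {
  let integral = 0;
  let previousError = 0;

  return function update(error, dt) {
    integral += error * dt;
    const derivative = (error - previousError) / dt;
    previousError = error;
    return kp * error + ki * integral + kd * derivative;
  };
}

const throttlePid = createPid(1.2, 0.01, 0.3);

// In the control loop, error would be the face's vertical offset from the center
// and dt the time elapsed since the previous frame (in seconds), e.g.:
// stickData.throttle = throttlePid(-relativeDistanceVector[1], dt);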

By researching and utilizing existing code and libraries, I was able to control the drone and implement face recognition within a 4-hour time frame.

If you haven't checked the code yet, here's my GitHub repo one last time: link.

Conclusion

In summary, this project demonstrates that anyone can utilize AI and IoT technologies without necessarily having a background in the underlying mathematics. However, I recommend a book written by a friend for those who are interested in delving deeper into the math behind AI and machine learning.

It is important to note that this was an experimental project and should not be considered a model for how to approach a real-life project. The project serves as a good example of how rapid prototyping with AI can be achieved using available resources.

Leave a comment, give a clap and start following me if you liked it!
