Building an Android application to control Tello drone flight and perform real-time object detection using YOLOv5
Hello there, drone and machine learning fans! Ever wanted to know how to develop an Android app that can operate a drone while also detecting objects in real time? Congratulations if you have already tried and succeeded! If not, follow along and we’ll build an Android app to operate the Tello drone, access its built-in camera, and execute real-time (or near real-time) object detection with pre-trained YOLOv5 weights.
The Ryze Tech Tello is a fun little drone developed for entertainment and educational purposes, for kids and adults alike. It offers 13 minutes of flight time, 100 meters of range, 720p video, and a vision positioning system. While there is an official app for those who just want to fly, Tello provides plenty of opportunities for anyone interested in tinkering and learning to program in a fun and engaging way. It can be programmed with Scratch, MIT's block-based programming environment for kids, while the SDK gives advanced users the means to unlock new possibilities through software development.
This drone comes in three different versions:
1) Tello: the most basic model, which supports programming with SDK version 1.3
2) Tello EDU: a version built for educational purposes that extends the SDK with more functionality
3) Tello Talent: a version with the ability to integrate external sensors and program a microcontroller
FYI, we will be using the basic Tello drone with SDK 1.3 for this project. Let’s jump right in! The SDK documentation is available here.
Android application development
Although there are various editors and options for developing an Android application, we will use Android Studio with Java for this project. Let’s start by creating a new project in Android Studio (preferably the most recent version). Choose an ‘Empty Activity’ under activities to get the full experience of developing the application from scratch. We’ll call this application ‘demoApplication’. Because I’ll be using API 31 with Android 12 on a Google Pixel 4a phone, I picked API 27: Android 8.1 (Oreo) as my minimum SDK (but feel free to explore lower APIs). After you’ve completed this, click Finish.
Let’s start by making the app’s main page, which contains the user interface, before moving on to the drone control activities. There should be two files in your window, activity_main.xml and MainActivity.java. We’ll be using PyTorch and virtual-joystick-android (https://github.com/controlwear/virtual-joystick-android) for this project, so let’s add those dependencies to the build.gradle file:
implementation 'org.pytorch:pytorch_android_lite:1.10.0'
implementation 'org.pytorch:pytorch_android_torchvision_lite:1.10.0'
implementation 'io.github.controlwear:virtualjoystick:1.10.1'
To the build.gradle (Project: demoApplication) file, add the following lines after commenting out the plugins block (the older buildscript style):
buildscript {
    repositories {
        google()
        mavenCentral()
        jcenter()
    }
    dependencies {
        classpath 'com.android.tools.build:gradle:7.1.1'
    }
}
Since the joystick library is hosted on JCenter, jcenter() needs to be added. If you face errors during synchronization in the next step, add jcenter() to settings.gradle as well. Then click on ‘Sync Project with Gradle Files’ to synchronize the packages with your project.
Virtual-joystick-android (https://github.com/controlwear/virtual-joystick-android) is a fantastic project that makes it simple to integrate a virtual joystick into Android apps. Visit the project homepage for additional information about the project and how to incorporate it into your application. While you’re there, don’t forget to give the repo a star to show your support.
To add a nice design to the main page, let’s download some icons from Google Fonts. I am using the ‘rocket’ icon for the main activity page, which I think looks pretty cool. Once you click on your desired icon, select ‘Android’ in the side menu bar, then download and extract the contents of the folder. Inside the extracted folder’s ‘res/drawable’ directory, select both xml files and copy them into the ‘res/drawable’ directory of your application project. Working with xml icon files gives you additional design options, such as the ability to modify the color of an icon. Repeat this process to download the ‘Flight Takeoff’, ‘Flight Land’, ‘Play Circle’, ‘Videocam’, and ‘Camera’ icons (we will be using these later in the application UI). Let’s edit the ‘activity_main.xml’ file and replace the whole <TextView> block with the following code:
<TextView
    android:id="@+id/welcomeMessage"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:text="Welcome to demoApplication"
    android:textColor="@color/black"
    android:textSize="20sp"
    app:layout_constraintBottom_toTopOf="@+id/introImage"
    app:layout_constraintLeft_toLeftOf="parent"
    app:layout_constraintRight_toRightOf="parent"
    app:layout_constraintTop_toTopOf="parent" />
and add a Floating Action Button (FAB) using the following code.
<com.google.android.material.floatingactionbutton.FloatingActionButton
    android:id="@+id/introImage"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:background="@drawable/outline_rocket_24"
    android:elevation="10dp"
    android:gravity="center"
    app:maxImageSize="76dp"
    app:fabCustomSize="150dp"
    app:backgroundTint="#ffcc00"
    app:srcCompat="@drawable/outline_rocket_24"
    app:layout_constraintBottom_toBottomOf="parent"
    app:layout_constraintLeft_toLeftOf="parent"
    app:layout_constraintRight_toRightOf="parent"
    app:layout_constraintTop_toTopOf="parent" />
Setting the main page background to white makes the app look clean, so let’s set the background color of the constraint layout to “@color/white”. To remove the title bar from the application, let’s add these lines to MainActivity.java inside onCreate() (requestWindowFeature() must come before setContentView()):
requestWindowFeature(Window.FEATURE_NO_TITLE);
if (getSupportActionBar() != null) getSupportActionBar().hide(); // guard against a null action bar
and after setContentView(), add the following:
if (android.os.Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
    getWindow().setNavigationBarColor(Color.parseColor("#000000"));
    getWindow().clearFlags(WindowManager.LayoutParams.FLAG_TRANSLUCENT_STATUS);
    getWindow().addFlags(WindowManager.LayoutParams.FLAG_DRAWS_SYSTEM_BAR_BACKGROUNDS);
    getWindow().setStatusBarColor(Color.parseColor("#000000"));
}
Now let’s create a new activity called ‘droneController.java’, and add the following code to MainActivity.java to connect the main activity page to the drone controller.
droneControlScreen = findViewById(R.id.introImage);
droneControlScreen.setOnClickListener(v -> {
    Intent droneControlScreenIntent = new Intent(MainActivity.this, droneController.class);
    startActivity(droneControlScreenIntent);
});
By now your main activity layout should look something like this:
Now that we have established a connection between the two activities, let’s start building the drone controller activity UI. Since joysticks are more convenient to use when the controller is in landscape, let’s fix the activity to landscape orientation.
To do this, add ‘android:screenOrientation="landscape"’ to the drone controller activity entry in the AndroidManifest.xml file. Next, let’s construct an xml file that will give a drop-shadow effect to all of the buttons and FABs we’ll be adding to this and other layouts. Under the project, go to res/drawable and right-click New > File. I am naming the file ‘round_back.xml’ and adding the following content:
<?xml version="1.0" encoding="utf-8"?>
<layer-list xmlns:android="http://schemas.android.com/apk/res/android">
    <item>
        <shape android:shape="oval">
            <solid android:color="#444444" />
            <padding
                android:bottom="3dp"
                android:left="3dp"
                android:right="3dp"
                android:top="3dp" />
            <corners android:radius="8dp" />
        </shape>
    </item>
</layer-list>
Similarly, to give the TextViews a nice background color with rounded corners, let’s add the following xml file (I am naming it ‘rounded_corner.xml’) to the same directory.
<?xml version="1.0" encoding="utf-8"?>
<shape xmlns:android="http://schemas.android.com/apk/res/android">
    <stroke
        android:width="1dp"
        android:color="#ffcc00" />
    <solid android:color="#ffcc00" />
    <padding
        android:left="1dp"
        android:right="1dp"
        android:bottom="1dp"
        android:top="1dp" />
    <corners android:radius="5dp" />
</shape>
Add two more xml files to indicate the drone connection status (red for not connected, green for connected) by changing the color in the code above to android:color="@android:color/holo_green_light" and android:color="@android:color/holo_red_light" respectively. You could achieve the same effect programmatically in Java if you want to dig deeper.
Now let’s add the joysticks to control the drone.
<io.github.controlwear.virtual.joystick.android.JoystickView
    android:id="@+id/joystickViewLeft"
    android:layout_width="170dp"
    android:layout_height="170dp"
    android:layout_marginBottom="26dp"
    android:layout_marginStart="16dp"
    app:JV_backgroundColor="#20000000"
    app:JV_borderWidth="4dp"
    app:JV_buttonColor="#ffcc00"
    app:JV_buttonSizeRatio="35%"
    app:JV_fixedCenter="false"
    app:layout_constraintBottom_toBottomOf="parent"
    app:layout_constraintEnd_toStartOf="@+id/joystickViewRight"
    app:layout_constraintHorizontal_bias="0.032"
    app:layout_constraintStart_toStartOf="parent" />

<io.github.controlwear.virtual.joystick.android.JoystickView
    android:id="@+id/joystickViewRight"
    android:layout_width="170dp"
    android:layout_height="170dp"
    android:layout_marginEnd="26dp"
    android:layout_marginBottom="26dp"
    app:JV_backgroundColor="#20000000"
    app:JV_borderWidth="4dp"
    app:JV_buttonColor="#ffcc00"
    app:JV_buttonSizeRatio="35%"
    app:JV_fixedCenter="false"
    app:layout_constraintBottom_toBottomOf="parent"
    app:layout_constraintEnd_toEndOf="parent" />
The UI contains a couple of buttons for each function, most of which are self-explanatory (takeoff, land, capture images, video, start live session). In addition, the top right of the screen holds TextViews (with background colors that make them appear like buttons) displaying the drone status (battery, temperature, speed, etc.). Since the layout is a large chunk of code, I am not going to post all of the xml here, but if you want to see all the components of the drone virtual controller, you can find them at https://github.com/jithin8mathew/Tello_object_detection_demo_application/blob/master/app/src/main/res/layout/activity_drone_controller.xml. At this point, if you did everything right, your drone virtual controller should look something like this:
Now that we have completed the GUI part for our application, it’s time to dive into the java code to make it all work!
But first, as we’ll be communicating over Wi-Fi (UDP), let’s add the Internet permission. Add the following line before <application> in your AndroidManifest.xml file:
<uses-permission android:name="android.permission.INTERNET" />
This Tello drone application can be broken down into two components (at least for what we’re doing here):
1) Controlling Tello flight
2) Detecting objects in the live video feed from the Tello camera
Tello SDK 1.3 exposes three distinct ports under its IP address (192.168.10.1) for this: port 8889 for sending commands to and receiving replies from Tello, port 8890 for receiving the Tello state (such as battery, temperature, and so on), and port 11111 for video streaming. To put the drone into SDK mode, the app literally sends the string ‘command’, to which the drone responds ‘ok’ or ‘error’. For more commands, refer to the SDK 1.3 documentation.
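For example, the state pushed on port 8890 is a single string of key:value pairs that can be picked apart with a regex. A minimal sketch (the field layout below follows the SDK documentation, with made-up values; the actual string arrives over UDP):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Example state string as documented in SDK 1.3 (values here are for illustration only)
String state = "pitch:0;roll:0;yaw:0;vgx:0;vgy:0;vgz:0;templ:60;temph:62;"
        + "tof:10;h:0;bat:87;baro:163.61;time:0;agx:0.00;agy:0.00;agz:-999.00;";
Matcher batteryMatcher = Pattern.compile("bat:(\\d+)").matcher(state);
if (batteryMatcher.find()) {
    int batteryPercent = Integer.parseInt(batteryMatcher.group(1)); // 87
}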
Let’s have a look at droneController.java. To begin, we’ll create a set of variables to handle a variety of events, ranging from drone connection status to the live video feed. After initializing all of the FloatingActionButtons (FABs), Switches, joysticks, ImageViews, and TextViews, we’ll use the following code snippet to start the drone SDK mode with a button (FAB) click:
connection = findViewById(R.id.connectToDrone); // a button that establishes SDK mode by sending the 'command' command
connection.setOnClickListener(new View.OnClickListener(){
    public void onClick(View v){
        if (connectionClickCounter % 2 == 1){ // alternate clicks toggle between connect and disconnect, switch-like behavior
            telloConnect("command");
            Toast.makeText(droneController.this,"Drone connected",Toast.LENGTH_SHORT).show();
            connectionFlag = true; // set the connection status to true
        }
        if (connectionClickCounter % 2 == 0){
            telloConnect("disconnect");
            connectionFlag = false;
            Toast.makeText(droneController.this,"Drone disconnected",Toast.LENGTH_SHORT).show();
        }
        connectionClickCounter++;
    }
});
Notice how variables like ‘connectionFlag’ are used to actively monitor and maintain the drone connection status, which is used by other methods in this activity. Similarly, we can use the following code to take off and land:
actionTakeOff = findViewById(R.id.takeoff);
actionTakeOff.setOnClickListener(v -> {
    if (connectionFlag){
        telloConnect("takeoff"); // send takeoff command
    }
});
actionLand = findViewById(R.id.land);
actionLand.setOnClickListener(v -> {
    if (connectionFlag){
        telloConnect("land"); // send land command
    }
});
The virtual joystick outputs two parameters, ‘angle’ and ‘strength’. The angle is the direction in which the user moves the stick, while the strength is how far the stick is moved. Here we divide the 360° circle into four sections; for the left joystick these map to up, down, yaw left, and yaw right. When the joystick outputs these parameters, an array is populated with the values and sent to the drone as an ‘rc’ command (e.g., ‘rc 10 0 45 0’). The array values are reset to 0 after each move.
JoystickView leftjoystick = (JoystickView) findViewById(R.id.joystickViewLeft);
leftjoystick.setOnMoveListener((angle, strength) -> {
    // in the SDK's 'rc a b c d' command, RC[2] is up/down and RC[3] is yaw
    if (angle > 45 && angle <= 135){        // stick pushed up
        RC[2] = strength;
    }
    if (angle > 225 && angle <= 315){       // stick pushed down
        RC[2] = -strength;
    }
    if (angle > 135 && angle <= 225){       // stick pushed left
        RC[3] = -strength;
    }
    if ((angle > 315 && angle <= 359) || (angle > 0 && angle <= 45)){ // stick pushed right
        RC[3] = strength;
    }
    telloConnect("rc "+ RC[0] +" "+ RC[1] +" "+ RC[2] +" "+ RC[3]); // send the command, e.g., 'rc 0 0 32 0'
    Arrays.fill(RC, 0); // reset the array with 0 after every virtual joystick move
});
The Tello state and video stream must be handled in threads that operate independently of the Android UI thread to avoid interruptions. This prevents the app from crashing or lagging while waiting for responses. But it also prevents the program from displaying responses from within the thread. To display the Tello state output (battery, temperature, speed, etc.), the values have to be passed from the background thread to the UI thread through a handler. We do this by declaring and initializing a handler:
private Handler telloStateHandler;
telloStateHandler = new Handler(Looper.getMainLooper()); // bind explicitly to the main looper; the no-argument constructor is deprecated on newer APIs
You’ll see that I use the telloConnect() method quite a bit. telloConnect() manages communication between Tello and the Android app (the drone controller activity). The method starts a thread that runs indefinitely once initialized. A datagram socket (UDP socket) is created, with Tello’s IP address as the destination. The method is written to both send individual Tello commands like ‘land’, ‘takeoff’, or virtual joystick commands and, at the same time, listen for the constantly broadcast Tello state (battery, temperature, barometric pressure, speed, etc.), which is received as a string. For this, multiple threads are used, with infinite loops running at a fixed interval. When the drone communicates its status, the string is parsed into individual values using regex pattern matches. As each packet arrives from the drone, the Java handler declared above is used to show the parsed information in the UI. The full code for this part may be found at https://github.com/jithin8mathew/Tello_object_detection_demo_application.
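The full method is too long to reproduce here, but a condensed sketch of the send/receive half looks roughly like this (the state-listener loop is trimmed out, and unlike this sketch, the real implementation keeps a single socket alive inside a long-running thread):

// requires java.net.InetAddress, java.net.DatagramSocket, java.net.DatagramPacket, java.io.IOException
private void telloConnect(final String command) {
    new Thread(() -> {
        try {
            InetAddress telloAddress = InetAddress.getByName("192.168.10.1"); // Tello's fixed IP
            DatagramSocket socket = new DatagramSocket(8889);                 // command port
            byte[] send = command.getBytes();
            socket.send(new DatagramPacket(send, send.length, telloAddress, 8889));
            byte[] buffer = new byte[1024];
            DatagramPacket response = new DatagramPacket(buffer, buffer.length);
            socket.receive(response); // blocks until Tello replies 'ok' or 'error'
            String reply = new String(response.getData(), 0, response.getLength());
            telloStateHandler.post(() -> Log.d("Tello", reply)); // hand the reply to the UI thread
            socket.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }).start();
}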
H.264 video decoding
Now for the crucial component of this project: decoding the incoming video feed from Tello. I worked on this part for a while, scratching my head over how to decode and show the output on an Android screen. There aren’t many Android projects that accomplish this with the built-in MediaCodec library (android.media.MediaCodec), which is another motive for writing this article. H.264, commonly known as AVC (Advanced Video Coding), is a video compression standard that provides good video quality at a lower bit rate.
For displaying the video after decoding the incoming stream, Android provides a number of options such as ‘SurfaceView’ and ‘SurfaceTexture’ (which captures frames from an image stream as an OpenGL ES texture); these are memory efficient and easy to implement. A SurfaceView can be created and destroyed depending on whether the user chooses to display the video, which is achieved through SurfaceHolder.Callback#surfaceCreated and SurfaceHolder.Callback#surfaceDestroyed. SurfaceView is the ideal solution for users who only want to display the video on the screen and don’t want to conduct any additional post-processing. As previously stated, to increase broadcasting speed, this view does not keep the actual frames in accessible memory. As a result, its biggest disadvantage is that you cannot use the video frames for activities such as object detection or other post-processing.
Next, let’s create a couple of functions to handle the live video frames: first, a function to convert the Image format to Bitmap format, and second, a function to display the bitmaps on the screen.
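The conversion helper can be sketched as follows, assuming the decoder outputs YUV_420_888 images and ignoring the per-plane pixel strides (a production version should account for them):

// requires android.media.Image, android.graphics.YuvImage, android.graphics.ImageFormat,
// android.graphics.Rect, android.graphics.BitmapFactory, java.nio.ByteBuffer, java.io.ByteArrayOutputStream
private Bitmap imageToBitmap(Image image) {
    Image.Plane[] planes = image.getPlanes();
    ByteBuffer yBuf = planes[0].getBuffer();
    ByteBuffer uBuf = planes[1].getBuffer();
    ByteBuffer vBuf = planes[2].getBuffer();
    int ySize = yBuf.remaining(), uSize = uBuf.remaining(), vSize = vBuf.remaining();
    byte[] nv21 = new byte[ySize + uSize + vSize];
    yBuf.get(nv21, 0, ySize);
    vBuf.get(nv21, ySize, vSize);          // NV21 interleaves V before U
    uBuf.get(nv21, ySize + vSize, uSize);
    YuvImage yuv = new YuvImage(nv21, ImageFormat.NV21, image.getWidth(), image.getHeight(), null);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    yuv.compressToJpeg(new Rect(0, 0, image.getWidth(), image.getHeight()), 90, out);
    byte[] jpeg = out.toByteArray();
    return BitmapFactory.decodeByteArray(jpeg, 0, jpeg.length);
}

The display function takes bitmaps off a blocking queue and posts them to the UI thread: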
public class displayBitmap implements Runnable{
    protected BlockingQueue<Bitmap> displayQueue; // decoded frames arrive here from the decoder
    protected Bitmap displayBitmap_;
    public displayBitmap(BlockingQueue<Bitmap> displayQueue_){
        this.displayQueue = displayQueue_;
    }
    @Override
    public void run(){
        while (true){
            try {
                displayBitmap_ = displayQueue.take(); // blocks until a frame is available
                displayQueue.clear();                 // drop any stale frames that piled up
                if (displayBitmap_ != null){
                    runOnUiThread(() -> {
                        bitImageView.setImageBitmap(displayBitmap_);
                        bitImageView.invalidate();
                    });
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
Let’s create a function called videoServer() as a new runnable thread and pass it a bitmap BlockingQueue and the ‘streamon’ command. videoServer() begins by defining the SPS and PPS, hardcoded (as constants) into the function; a better way would be to obtain them dynamically from Tello itself. Following that, we supply the MediaFormat with decoding parameters such as the video format, SPS, PPS, dimensions, color format, frame rate, and so on, which are crucial for decoding the stream. After this, we initialize the MediaCodec and pass it the MediaFormat, with the surface parameter set to null because we are not going to use a SurfaceView. Tello sends each frame broken down into multiple NAL (Network Abstraction Layer) units broadcast as byte arrays (since Tello uses IEEE 802.11 frames, a video frame must be fragmented and sent piece by piece). To process a single frame, we need to collect the NAL units until one complete frame is received, at which point we transfer the byte array to the MediaCodec to decode it into an image.
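Put together, the decoder setup looks roughly like this (a sketch: the SPS/PPS arrays below are truncated placeholders standing in for the values hardcoded in the project, and 960x720 is Tello's advertised stream resolution):

// Placeholder SPS/PPS NAL units; the real byte values are the ones hardcoded in the project
byte[] sps = {0, 0, 0, 1, 0x67 /* ...remaining SPS bytes... */};
byte[] pps = {0, 0, 0, 1, 0x68 /* ...remaining PPS bytes... */};
MediaFormat format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, 960, 720);
format.setByteBuffer("csd-0", ByteBuffer.wrap(sps)); // codec-specific data 0: SPS
format.setByteBuffer("csd-1", ByteBuffer.wrap(pps)); // codec-specific data 1: PPS
// createDecoderByType() throws IOException, so wrap this in try/catch in real code
MediaCodec m_codec = MediaCodec.createDecoderByType(MediaFormat.MIMETYPE_VIDEO_AVC);
m_codec.configure(format, null, null, 0); // surface is null: we pull the decoded Image ourselves
m_codec.start();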
A single H.264 frame is made up of three primary components:
1. SPS (sequence parameter set)
2. PPS (picture parameter set)
3. Frame data: keyframe (IDR) and non-IDR (non-instantaneous decoding refresh) slices
Each segment of video data sent by the drone is a byte array of 1460 bytes (a complete unit usually spans more than 3 such packets), followed by a byte array of less than 1460 bytes (which indicates the end of the sequence, i.e., one complete unit). Like the telloConnect() function, we use a datagram socket bound to port 11111, receive packets, and append each packet’s data (byte array) to another byte array until a full NAL unit is received. Once a full NAL unit is received (indicated by a packet length < 1460), the data is copied to a byte buffer and delivered to the MediaCodec using:
m_codec.queueInputBuffer(inputIndex, 0, data.length, presentationTimeUs, 0);
MediaCodec will process the frames and output the image (basically decoding, or putting together, the fragments to yield an H.264 frame). We can obtain the output and store it in Image format using:
m_codec.getOutputImage(outputIndex);
which can be converted to Bitmap format.
Finally, the bitmap is transferred to a BlockingQueue for other functions (like displayBitmap()) to pick up and display or perform object detection on.
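A condensed sketch of the receive-and-assemble loop described above (output-buffer draining and error handling are omitted; 'streaming' is an assumed flag toggled by the streamon/streamoff buttons):

DatagramSocket videoSocket = new DatagramSocket(11111); // Tello streams video to this port
byte[] frame = new byte[0];
long presentationTimeUs = 0;
while (streaming) {
    byte[] buf = new byte[2048];
    DatagramPacket packet = new DatagramPacket(buf, buf.length);
    videoSocket.receive(packet);
    // append this fragment to the NAL unit under construction
    byte[] merged = new byte[frame.length + packet.getLength()];
    System.arraycopy(frame, 0, merged, 0, frame.length);
    System.arraycopy(buf, 0, merged, frame.length, packet.getLength());
    frame = merged;
    if (packet.getLength() < 1460) { // a short packet marks the end of the unit
        int inputIndex = m_codec.dequeueInputBuffer(-1);
        if (inputIndex >= 0) {
            ByteBuffer inputBuffer = m_codec.getInputBuffer(inputIndex);
            inputBuffer.put(frame);
            m_codec.queueInputBuffer(inputIndex, 0, frame.length, presentationTimeUs, 0);
            presentationTimeUs += 33333; // assume ~30 fps for the timestamp increment
        }
        frame = new byte[0]; // start collecting the next unit
    }
}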
Object detection using YOLOv5
YOLOv5 (You Only Look Once), created and maintained by Glenn Jocher of Ultralytics, is a family of object detection architectures ranging from nano to extra-large (don’t forget to star the repo). Due to their ease of deployment, usability, and, most importantly, their speed and accuracy, we will be using pretrained YOLOv5 weights for this project. If you are interested in training your own model for deployment, I recommend you follow this link (I would recommend training the models on Google Colab to avoid errors during deployment, especially on mobile devices).
PyTorch has an android-demo-app for object detection hosted in a GitHub repo. Once again, don’t forget to star the repo. To keep things simple, we will follow its code and instructions to set up object detection in our Android project, borrowing some code from PyTorch’s object detection demo.
To store and plot the results of objects detected in each frame, we need to create a Java class similar to the demo’s ResultView.java (let’s call ours DetectionResult.java); we will also reuse the demo’s code for processing our results.
To load the YOLOv5 small architecture into memory, we will use Pytorch’s Module:
org.pytorch.Module;
and finally, let’s declare a threshold for Non-Maximum Suppression (NMS) used in post-processing the detected objects (results). I went with the default value of 30%; feel free to change it to meet your requirements.
private float rtThreshold = 0.30f;
The object detection model (YOLOv5 small) and its corresponding classes must be saved in the Android app before they can be loaded. To do this, we’ll create an assets folder in the main application directory, as seen in the figure below:
Download a pretrained YOLOv5 small model from here and classes.txt from here, and copy them into the assets folder (right-click on app > Open in > Explorer (Windows) / Finder (Mac)).
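Loading them at runtime can then look like this (a sketch: assetFilePath() is the small helper from PyTorch's demo app that copies an asset to internal storage and returns its absolute path, and the file names are assumptions; use whatever you copied into assets):

// org.pytorch.Module holds the loaded network; LiteModuleLoader ships with the pytorch_android_lite dependency
private Module mModule;

// inside onCreate(), for example:
try {
    mModule = LiteModuleLoader.load(assetFilePath(getApplicationContext(), "yolov5s.torchscript.ptl"));
    BufferedReader br = new BufferedReader(new InputStreamReader(getAssets().open("classes.txt")));
    List<String> classes = new ArrayList<>();
    String line;
    while ((line = br.readLine()) != null) {
        classes.add(line); // one class label per line
    }
    br.close();
} catch (IOException e) {
    e.printStackTrace();
}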
To display the detected bounding boxes, let’s add a custom view, backed by the DetectionResult.java class, to the activity_drone_controller.xml file:
<com.example.demoapplication.DetectionResult
    android:id="@+id/DetectionResultView"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:elevation="12dp"
    app:layout_constraintEnd_toEndOf="parent"
    app:layout_constraintHorizontal_bias="1.0"
    app:layout_constraintStart_toStartOf="parent"
    app:layout_constraintTop_toTopOf="parent" />
Let’s create an object detection thread that will perform, as the name implies, object detection.
public class objectDetectionThread implements Runnable{
    private Bitmap threadBM;
    private volatile ArrayList results; // most recent detections, read by other threads via getValue()
    protected BlockingQueue<Bitmap> threadFrame;
    public objectDetectionThread(BlockingQueue<Bitmap> consumerQueue){
        this.threadFrame = consumerQueue;
    }
    @Override
    @WorkerThread
    public void run(){
        while (true){
            try {
                threadBM = threadFrame.take(); // blocks until the decoder delivers a frame
                threadFrame.clear();           // drop stale frames so we always detect on the latest one
                analyseImage(threadBM);
                Thread.sleep(250);             // throttle detection to roughly 4 frames per second
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    public ArrayList getValue(){
        return results;
    }
}
The object detection part of the code works by loading the pretrained weights into memory. The bitmap obtained from the blocking queue is converted to a tensor and passed through the network to perform classification, localization, and detection. After the network’s output is received, post-processing filters out low-confidence detections and applies non-maximum suppression, which compares the Intersection over Union (IoU) of overlapping boxes against the threshold defined earlier. The array of bounding box coordinates produced by post-processing is handed to DetectionResult.java, which plots them on a custom view over the Android screen.
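As a sketch of what analyseImage() might look like (names follow PyTorch's demo code: PrePostProcessor and its outputsToNMSPredictions() are the helpers borrowed above, and the view-scale handling is simplified to identity here):

private void analyseImage(Bitmap bitmap) {
    // YOLOv5 small expects a 640x640 input (mInputWidth/mInputHeight in the demo's PrePostProcessor)
    Bitmap resized = Bitmap.createScaledBitmap(bitmap,
            PrePostProcessor.mInputWidth, PrePostProcessor.mInputHeight, true);
    // convert to a float tensor without mean/std normalization, as YOLOv5 expects 0..1 inputs
    Tensor inputTensor = TensorImageUtils.bitmapToFloat32Tensor(resized,
            PrePostProcessor.NO_MEAN_RGB, PrePostProcessor.NO_STD_RGB);
    // the TorchScript model returns a tuple; element 0 holds the raw predictions
    Tensor outputTensor = mModule.forward(IValue.from(inputTensor)).toTuple()[0].toTensor();
    float[] outputs = outputTensor.getDataAsFloatArray();
    // scale factors to map detections back onto the original bitmap
    float scaleX = (float) bitmap.getWidth() / PrePostProcessor.mInputWidth;
    float scaleY = (float) bitmap.getHeight() / PrePostProcessor.mInputHeight;
    // confidence filtering plus non-maximum suppression, as in the demo app
    results = PrePostProcessor.outputsToNMSPredictions(outputs, scaleX, scaleY, 1f, 1f, 0f, 0f);
}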
By now your app should be up and running. Just like this:
This covers most of the topics I wanted to address in this post. A complete version of this application is available at https://github.com/jithin8mathew/Tello_object_detection_demo_application, which you can fork or clone and install directly on your phone, or continue to develop. If you run into any problems during development or installation, please raise an issue on the project’s GitHub page. Finally, if you liked this project and found it interesting, please star it so that it can reach more people. Thank you, and I hope you have a good time with it!