How to Easily Do Object Detection on Drone Imagery Using Deep Learning

This article is a comprehensive overview of using deep-learning-based object detection methods on aerial imagery captured by drones. We also present a real-world use of drones to monitor the construction progress of a housing project in Africa.

#update1: We just launched Nanonets Drone APIs!

Did you know drones and their associated services are set to be a $50 billion industry by 2023? Today, drones are used in domains such as agriculture, construction, and public safety and security, to name a few, while rapidly being adopted by others. With deep-learning-based computer vision now powering these drones, industry experts predict unprecedented use in previously unimaginable applications.

We explore some of these applications, along with the challenges in automating drone-based monitoring through deep learning.

Finally, a case study is presented for automating the remote inspection of construction projects in Africa using the Nanonets machine learning framework.


Section 1: Aerial Imagery — a brief background

Man has always been fascinated with a view of the world from the top: building watchtowers, raising high fort walls, capturing the highest mountain peaks. To capture a glimpse and share it with the world, people went to great lengths to defy gravity, enlisting the help of ladders, tall buildings, kites, balloons, planes, and rockets.

Images of San Francisco taken from a kite in 1906, via the Library of Congress

Today, even the general public has access to drones that can fly as high as 2 km. These drones carry high-resolution cameras capable of acquiring quality images suitable for various kinds of analysis.

Aerial image of an agricultural field

Section 2: Drones & their industrial applications

With easier access to drones, we’re seeing a lot of interest and activity from photographers and hobbyists, who are using them for creative projects such as capturing inequality in South Africa or breathtaking views of New York which might make Woody Allen proud.

While all of this seems pretty neat, the meat of the supposed $50 billion drone industry lies in its industrial applications.

We explore some here:

Energy: Inspection of solar farms

Routine inspection and maintenance of solar farms is a herculean task; traditional manual inspection supports an inspection frequency of only once every three months. Because of the hostile environment, solar panels may develop defects, and broken solar panel units reduce power output efficiency.

Left: Original thermal image of the solar panels. Right: Defect localisation and classification from Intel’s automated system

Agriculture: Early plant disease detection

Researchers at Imperial College London are mounting multi-spectral cameras on drones, using special filters to capture reflected light from selected regions of the electromagnetic spectrum. Stressed plants typically display a ‘spectral signature’ that distinguishes them from healthy plants.

Spectral image of plant leaves showing pathogen and nutrient stress

Public Safety: Shark detection

Analysis of an overhead view of a large mass of land or water can yield a vast amount of information relevant to security and public safety. One example is spotting sharks in the water off the coast of Australia: the Australia-based Westpac Group has developed a deep-learning-based object detection system to detect sharks in the water.

There are various other applications of aerial imagery, such as civil engineering (routine bridge inspections, power line surveillance and traffic surveying), oil and gas (onshore and offshore inspection of oil and gas platforms and drilling rigs), public safety (motor vehicle accidents, nuclear accidents, structural fires, ship collisions, plane and train crashes) and security (traffic surveillance, border surveillance, coastal surveillance, controlling hostile demonstrations and rioting).

Section 3: Acquiring & processing industrial-grade drone images

To comprehensively capture terrain & landscapes, the process of acquiring aerial images can be summarised in two steps.

  1. Photogrammetry: During a UAV flight, several images are taken at regular intervals to ensure that consecutive images overlap. This is critical so that measurements between objects present in the images can be made. Broadly, this process is known as photogrammetry. For the imagery to be used for data analysis and mapmaking, relevant metadata is required for image stitching; this metadata is inserted automatically by a microcomputer onboard the UAV.
  2. Image stitching: Once data acquisition is complete, the second step is to amalgamate the individual aerial images into a useful map, typically using a specialised form of photogrammetry called Structure-from-Motion (SfM). SfM software stitches together images of the same scene taken from different angles by comparing, matching and measuring angles between objects within each image. During this step, the images may also be geo-referenced to attach location information to each image.
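The geo-referencing idea in step 2 can be sketched in a few lines. This is a simplified illustration, not production SfM code: it assumes a nadir (straight-down), north-aligned image with a known ground sampling distance (GSD), and the function name is ours.

```python
# Simplified geo-referencing sketch: map a pixel in an aerial image to an
# offset (in metres) from the image's GPS centre, using the ground sampling
# distance (GSD). Real SfM pipelines solve full camera poses instead.

def pixel_to_ground_offset(px, py, width, height, gsd_m_per_px):
    """Return the (east_m, north_m) offset of pixel (px, py) from the centre.

    Assumes a nadir, north-up image; px grows east and py grows south,
    so the northward offset is negated.
    """
    east_m = (px - width / 2) * gsd_m_per_px
    north_m = -(py - height / 2) * gsd_m_per_px
    return east_m, north_m

# Example: in a 4000 x 3000 image at 5 cm/px, the top-left pixel is
# 100 m west and 75 m north of the image centre.
print(pixel_to_ground_offset(0, 0, 4000, 3000, 0.05))  # (-100.0, 75.0)
```

Combined with the drone's GPS position for the image centre, this lets each detection later be placed on the stitched map.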

After image stitching, the generated map can be used for various kinds of analysis for the applications mentioned above.

Section 4: Artificial Intelligence meets Drones

High-resolution aerial imagery is increasingly available at the global scale and contains an abundance of information about features of interest that could be correlated with maintenance, land development, disease control, defect localisation, surveillance, etc. Unfortunately, such data are highly unstructured and thus challenging to extract meaningful insights from at scale, even with intensive manual analysis.

For example, classification of urban land use is typically based on surveys performed by trained professionals. As such, this task is labor-intensive, infrequent, slow and costly. As a result, such data is mostly available in developed countries and big cities that have the resources and the vision necessary to collect and curate it.

Another motivation for automating the analysis of aerial imagery stems from the urgency of predicting changes in the region of interest. For example, crowd counting and crowd-behaviour analysis are frequently performed during large public gatherings such as concerts, football matches and protests. Traditionally, a human analyses the images streamed from a CCTV camera directly to the command centre. As you may imagine, this approach has several problems, such as human latency or error in detecting an event, and the lack of sufficient views from standard static CCTV cameras.

Below are some of the commonly occurring challenges when using aerial imagery.

Challenges and constraints to automating the use of aerial imagery

There are several challenges to overcome when automating the analysis of drone imagery. The following lists a few of them, each with a prospective solution:

  1. Flat and small views of objects: Current computer vision algorithms and datasets are designed and evaluated in lab settings using human-centric photographs taken horizontally at a close distance to the object. In UAV imagery taken vertically, the objects of interest are relatively small, have fewer features, and mostly appear flat and rectangular. For example, the image of a building taken from a UAV only shows the roof, whereas a terrestrial image of the same building shows features such as doors, windows and walls.
  2. Difficulty in labelling data: Following from the above point, even if we could acquire a large number of images, we would still need to label them. This is a manual task, and one that needs precision and accuracy, as “garbage in leads to garbage out”. There is no magical solution to labelling other than doing it by hand. At Nanonets, we provide annotators on demand who can label the data for you.
  3. Large image sizes: Drone images are large, in most cases exceeding 3000 × 3000 px in resolution. This adds to the computational complexity of processing such images. To circumvent this, we apply pre-processing methods that make aerial imagery ready for the model training phase, cropping images at different resolutions, angles and poses so that training is invariant to these changes.
  4. Object overlap: One of the problems with splitting up images is that the same object might occur in two separate crops, which leads to double detection and errors in counting objects. In addition, objects that are very close to each other might have overlapping bounding boxes at detection time. One way to overcome this problem is to upsample and use a sliding window to look for small, densely packed objects.
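The tiling and duplicate-merging ideas in points 3 and 4 can be sketched as follows. The helper names and tile sizes here are illustrative, not part of any particular library; duplicates across overlapping crops are merged with standard non-maximum suppression (NMS).

```python
# Sketch of two pre-processing ideas for large drone images:
# (1) tile the image into overlapping crops, and (2) merge duplicate
# detections from adjacent crops with non-maximum suppression (NMS).

def tile_image(width, height, tile=1024, overlap=128):
    """Yield (x, y, w, h) crop windows covering the image with overlap."""
    step = tile - overlap
    for y in range(0, height, step):
        for x in range(0, width, step):
            yield x, y, min(tile, width - x), min(tile, height - y)

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Keep only the highest-scoring box among any group overlapping > thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

After running detection on each crop, boxes are mapped back to full-image coordinates and passed through `nms`, so an object straddling two crops is counted once.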

Section 5: Nanonets Case Study: Automating remote inspection of construction projects in Africa

Pragmatic Master, a South African robotics-as-a-service company, collaborated with Nanonets to automate the remote monitoring of progress on a housing construction project in Africa.

These projects are generally prone to delays and pilferage due to misreporting, which can potentially be solved by flying a drone frequently to map and document the site.

We aim to detect the following infrastructure to capture the construction progress of a house in its various stages:

  1. foundation (start)
  2. wallplate (in-progress)
  3. roof (partially complete)
  4. apron (finishing touches)
  5. geyser (ready to move in)

Pragmatic Master chose Nanonets as its deep learning provider because of its easy-to-use web platform and plug-and-play APIs.

The end-to-end process of using the Nanonets API is as simple as four steps.

End-to-end flow of the Nanonets API

1. Upload images: Images acquired from the drones can be uploaded directly to our upload landing page. For the current case study, we had a total of 1442 images of a construction site taken at low altitudes. An example of the uploaded images is given below.

2. Labelling of images: Labelling images is probably the hardest and most time-consuming step in any supervised machine learning pipeline, but at Nanonets we have this covered for you. We have in-house experts with multiple years of experience working with aerial images, who will annotate your images with high precision and accuracy to aid model training. For the Pragmatic Master use-case, we labelled the following objects, listed with their total counts across all images:

  1. Roof: 2299
  2. Geyser: 6556
  3. Wallplate: 1043
  4. Apron: 8730

Example labelled image of geysers

3. Model training: At Nanonets we employ the principle of transfer learning when training on your images: we re-train a model that has already been pre-trained on a large number of aerial images. This helps the model easily identify micro patterns such as edges, lines and contours in your images and focus on the more specific macro patterns such as houses, trees, humans and cars. Transfer learning also gives a boost in terms of training time, as the model does not need to be trained for a large number of iterations to give good performance.

Our proprietary deep learning software smartly selects the best model and optimises its hyper-parameters for your use-case. This involves searching through multiple models and their hyper-parameter spaces using advanced search algorithms.

The hardest objects to detect are the smallest ones, due to their low resolution. Our model training strategy is optimised to detect very small objects such as geysers and aprons, which span only a few pixels.

A complete house detected

The average precision (AP) we obtain for each class:
Roof: 95.1%
Geyser: 88%
Wallplate: 92%
Apron: 81%

Note: Adding more images can increase the mean average precision. Our API also supports detecting multiple object classes in the same image, such as roofs and aprons in one image.

4. Test & Integrate: Once the model is trained, you can integrate the Nanonets API directly into your system, or we can provide a Docker image with the trained model and inference code. Docker images scale easily and provide a fault-tolerant inference system.
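An API integration along these lines might look like the sketch below. The base URL, endpoint path and response format here are placeholders, not the documented Nanonets API; consult the official API reference for the real endpoints.

```python
# Hypothetical integration sketch: POST an image to a trained model's
# prediction endpoint over HTTP. URL and paths are placeholders only.
import requests

BASE_URL = "https://example.com/api/v2/ObjectDetection"  # placeholder

def prediction_url(model_id, base_url=BASE_URL):
    """Build the (hypothetical) prediction endpoint URL for a model."""
    return f"{base_url}/Model/{model_id}/LabelFile/"

def predict(image_path, model_id, api_key):
    """Send one image for inference and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        resp = requests.post(prediction_url(model_id),
                             auth=(api_key, ""),
                             files={"file": f})
    resp.raise_for_status()
    return resp.json()
```

The same loop can be run over every crop of a stitched map, with the returned boxes mapped back to map coordinates.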

As a final step, the predicted images are stitched back together to create a view of the entire landscape, using the GIS data associated with every image.

Predicted images stitched together to create a view of the entire landscape

Section 6: Data Privacy

Customer trust is our top priority. We are committed to providing you ownership and control over your content at all times. We provide two plans for using our service:

  1. Developer: The images that you upload for your use-case may be used by us to pre-train our models that we can further use for our other applications.
  2. Enterprise: Your data is yours! We will never use your data for pre-training any of our models.

For both plans, we use highly sophisticated data privacy and security protocols in collaboration with Amazon Web Services, our cloud partner. Your dataset is anonymised and goes through minimal human intervention during the pre-processing and training process. All our human labellers have signed non-disclosure agreements (NDAs) to protect your data from falling into the wrong hands. As we believe in the philosophy of “Your data is yours!”, you can ask us to delete your data from our servers at any stage.

About Nanonets

NanoNets is a web service that makes it easy to use deep learning. You can build a model with your own data to achieve high accuracy and use our APIs to integrate it into your application.

For further details, visit: https://nanonets.com/drone


Pragmatic Master is a South African robotics-as-a-service company that provides camera-mounted drones to acquire images of construction, farming and mining sites. These images are analysed to track progress, identify challenges, eliminate inefficiencies and provide an overall aerial view of the site.