Drones at War and Computer Vision

Anton Maltsev
9 min read · Dec 18, 2023


I usually write articles about Computer Vision and Machine Learning for professionals. But sometimes I make more popular articles or videos. This is one of them.
In this article, I will give an overview of the Computer Vision tasks being solved in modern military drones. The last two years of the war in Ukraine and the war in Gaza have shown a lot of progress in this area. I want to list the tasks I have seen and explain the basic ideas behind them.

Image generated by Dalle-3

DISCLAIMER. I have over 15 years of Computer Vision experience. However, I have never been involved in any drone project. Moreover, for the last two years, I have been refusing all drone-related consultations, regardless of where the request comes from.

Nevertheless, I have a good understanding of which algorithms are out there now. Drone algorithms are essentially the same as those used in driverless cars, robots, phones, security, and medical systems.

What did I take into account when writing this article?

  1. Drone video footage. For example, here (a, b).
  2. Job postings circulating in ML communities (first of all, in Russian-speaking channels)
  3. Public startups (primarily in Europe and the US)
  4. Consultation requests that land in my inbox
  5. My own performance tests of different hardware, plus an understanding of the algorithms.

In this article, I will cover the machine learning algorithms used in drones and speculate a little about the next steps.

Why do you need Computer Vision in drones? It’s used to control drones in situations where:

  1. No signal from GPS
  2. A human can’t control it (no signal/lack of reaction speed, etc.)

Tracking a selected object

These algorithms have been around for a long time and work very well. As recently as five years ago, there were already plenty of drones with a "Follow me" function.

Image generated by Dalle-3

The function itself is straightforward. It worked well even before the era of neural networks. For example, here is a state-of-the-art tracker from 2011.

General operation logic (a code sketch follows the list):

  1. A target point is selected on a frame.
  2. The point is located on each following frame.
  3. The missile/drone is rotated to the point.
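Here is a minimal sketch of that loop in Python, using OpenCV's KCF tracker (the same tracker family discussed below). The video path is a placeholder, and I assume the opencv-contrib-python build, where cv2.TrackerKCF_create is available; a real system would feed the pixel offset into the autopilot instead of drawing a rectangle.

```python
import cv2

cap = cv2.VideoCapture("flight.mp4")  # hypothetical input video
ok, frame = cap.read()

# 1. A target point is selected on a frame.
bbox = cv2.selectROI("select target", frame)
tracker = cv2.TrackerKCF_create()
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # 2. The point is located on each following frame.
    found, bbox = tracker.update(frame)
    if found:
        x, y, w, h = map(int, bbox)
        # 3. The offset from the frame center is what a flight
        #    controller would use to rotate toward the point.
        dx = (x + w / 2) - frame.shape[1] / 2
        dy = (y + h / 2) - frame.shape[0] / 2
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```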

This feature already works in a lot of places. I have seen at least something similar in Lancet and Switchblade videos, but it seems to be even more common.
What are the limitations of such algorithms? About three years ago, I wrote an article about tracking problems. Here are some examples from it (the KCF tracker: not the best, but not the most problematic either):

  1. The object slowly disappears (example from my original article)
  2. Fast shape changes (my original article)
  3. Jumping to similar objects (my original article)
  4. Sudden changes in brightness
  5. The object goes out of the frame

These restrictions set the criteria for the applicability of the algorithm. It cannot be used on drones where the object can quickly move out of the frame, i.e., drones flying at high angular speeds (FPV drones). Very often, at the moment of impact, the drone does not see the target it hits (at least in public videos). If the object goes outside the frame, it will most likely be lost.

On the other hand, in situations where these problems do not arise (the target is much slower than the drone and stays visible), the algorithm increases the drone's effectiveness:

  1. Drones have video-control lag, which means the operator does not see the last meters / does not have time to react.
  2. Human decision-making lag is usually ~200 ms.
  3. There is often poor communication near the ground.
  4. There are often jammers near military equipment.

For tasks like "point at the target from 300 meters away, then track it yourself," tracking is ideal.

Navigation

Image generated by Dalle-3

What is the navigation task? It means the drone can fly to a point by itself. This is necessary because the GPS connection is often unstable. The task splits in two:

  1. Local navigation: building a local map and flying along it (tens of kilometers)
  2. Global navigation: flying over an existing map

Local navigation

The problem of local navigation is already addressed in some commercially produced drones: definitely the Lancet, and probably the Switchblade. Neural networks are not needed for such navigation. For example, in 2017, when the Lancets began to be developed, the job postings said nothing about neural networks. Only C++, OpenCV, and CUDA.

In 2014–2015, I developed 3D scanners using the same Jetson. The task there is very similar, and it was solved without neural networks.

A lot could have changed since then. But I think the main idea of the algorithms remains the same: as the drone flies, it scans the space around it, and within this space it can fly to any specified point. Perhaps this space is initialized from satellite images (for example, I came across such a mention here), but I wouldn't be sure about that. The problem can be solved by constructing an internal map. Moreover, this is explicitly stated on the official website (it seems the site does not open from all countries; I had to turn on a VPN).
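To give a flavor of what "scanning the space" means in practice, here is a toy visual-odometry step with OpenCV: estimating camera motion between two frames from matched ORB features. The camera intrinsics here are made up, and a real mapping pipeline adds depth estimation, map building, loop closure, and sensor fusion on top of this idea.

```python
import cv2
import numpy as np

# Assumed pinhole camera intrinsics (focal length, principal point).
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])

def relative_motion(prev_gray, cur_gray):
    """Estimate rotation R and unit-scale translation t between frames."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # The essential matrix encodes the relative rotation/translation.
    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```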

Very often, Lancet strikes are filmed from other Lancets. I hypothesize that the space each Lancet builds is synchronized with other Lancets in the vicinity, and that target designation can be mutual.

This navigation makes it possible to fly to designated targets without GPS, and to receive designations from other drones.

Global navigation

If you want your drone to fly somewhere far away, you need the ability to load a flight map. Without neural networks, this problem is solved much worse:

  1. Often, the environment looks different from a satellite than from the ground (different lighting, different point of view)
  2. When flying at low altitude, it is often difficult to see anything at all

Apparently, these tasks are now being actively worked on by both sides. Among the startups visible on the surface, I saw Bavovna and Asio, plus 2–3 more keeping a lower profile.
But neither Russia nor Ukraine has unambiguously solved this problem yet. Russia uses Iranian drones for long-range strikes, which do not have this functionality. Ukrainian drones have clearly been brought down by GPS jammers (there is a lot of video).
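For intuition, here is the naive, pre-neural-network approach to map-based localization: slide the current downward-looking frame over a satellite tile and pick the best correlation. The file names are placeholders. This is precisely the method that breaks under points 1 and 2 above, which is why current startups train neural networks for the matching instead.

```python
import cv2

# Naive map matching: find where the current nadir frame best fits
# inside a pre-loaded satellite tile (file names are hypothetical).
sat_map = cv2.imread("satellite_tile.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("drone_frame.png", cv2.IMREAD_GRAYSCALE)

scores = cv2.matchTemplate(sat_map, frame, cv2.TM_CCOEFF_NORMED)
_, best, _, (x, y) = cv2.minMaxLoc(scores)
print(f"best match at ({x}, {y}) with score {best:.2f}")
# Under different lighting or viewpoint, the score drops sharply;
# that is the core weakness learned features are meant to fix.
```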

Object detection

Image generated by Dalle-3

When someone talks about AI in drones, the first thing they mention is how "the drone can find and destroy enemy targets by itself." But in reality, I believe this is almost non-existent, or works badly. Why?

  1. The recognition accuracy of modern neural networks, especially those running on edge computers, is worse than a human's. Neural networks don't do magic.
  2. A neural network, just like a human, cannot detect hidden objects.
  3. A neural network won't be able to tell the difference between a decoy and real equipment. A human, flying over it from multiple angles, can.
  4. It is not clear how a neural network would distinguish destroyed tanks from operational ones.

On top of this, there was little data to train on at the beginning of the war. I think there is plenty now.

You need large, complex, and dynamic logic to make this work. Can it be simplified? In my opinion, there are several possibilities:

  1. If it is an attack drone, it can respond only to "moving targets." This solves the problem of distinguishing them from decoys, destroyed vehicles, etc. (see the sketch after this list).
  2. If it is a reconnaissance drone, it can give the operator hints when something new appears that the operator missed.
  3. It seems to me that, in theory, the detection part could also help with automatic fire adjustment.
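A minimal sketch of idea 1, assuming classic background subtraction stands in for the motion gate (a real system would pair this with a trained detector rather than rely on contours alone):

```python
import cv2

# Gate on motion: static decoys and destroyed vehicles produce no
# foreground, so only moving objects survive the filter.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=32)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

def moving_regions(frame, min_area=400):
    """Return bounding boxes of sufficiently large moving blobs."""
    mask = subtractor.apply(frame)
    # Remove speckle noise before extracting contours.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) > min_area]
```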

I have seen the first option a few times in analysis videos from CIT.
I haven't heard about the second and third, but they seem to be a matter of the near future.

The detection networks themselves are not a big problem. But we'll get to that in the hardware section: performance could be an issue.

Detection from a different point

Judging by recent discussions, it is now more popular not to detect from drones, but to detect the drones themselves. This is being done to build automatic anti-drone guns. It is a much simpler task: if it flies and doesn't look like a bird, it's a drone. The problem is more about mechanics than about CV.

What other algorithms did I miss?

I haven’t noticed anything obvious anymore. Any ideas in the chat?
But it seems that now, with the development of LLM models, more algorithms will “summarize” primary detections and do analytics and control based on them. Maybe drone swarm management, perhaps a high-level system to analyze what’s happening on the battlefield.
Maybe drones do not make direct “detections” but collect analytics from dozens of frames.

What about Hardware?

Since I am writing about algorithms, I will touch only on the inference side. Edge devices for ML are being actively developed now. Many chips are super efficient for neural networks, made by American and European companies, Chinese companies, and Israeli ones. You can find many reviews of such boards on my channel and on Medium.
I should probably highlight the Jetsons, which have been found in the Russian Lancet. These are very convenient chips for working with 3D: few CPU-only boards can compare with a GPU in computing power. At the same time, Jetson is also suitable for neural networks.
Jetson is quite a unique board, and because of this, it has become super mainstream. It is used massively for license plate recognition, robotics, and autopilots.

Image generated by Dalle-3

But for drones, where there is no complex 3D modeling and no very complex neural networks, the simplest NPU chips or powerful CPUs are enough. First of all, this works for detection and tracking tasks.
Doing full navigation on a CPU or NPU is most likely complicated: CPUs are quite slow, and NPUs are a poor fit for complex networks.

The price of simple NPU/CPU boards is usually a few hundred dollars. For simple tasks, you can get by with $50. They are sold en masse on Amazon/AliExpress.

What tasks are still a poor fit for such boards?

  1. Global navigation. It requires big, complex networks and a lot of computation.
  2. 3D navigation. Jetsons are good for this; Intel-based platforms are OK.
  3. Detection on high-resolution frames. This is possible, but only at low FPS.

Conclusions

The field is developing very fast, and the number of startups is growing daily. The demand is obvious: judging by drone videos, most drones today fly under direct human control. As sad as it is, I expect wider adoption of control-assistance systems in the coming years. The systems will become more deadly.

I think this will primarily mean tracking and navigation algorithms. In addition, I expect improvements in video capture: more professional cameras, the use of thermal cameras, etc.

This article is a “popular science” one. Usually, I write more technical articles. You can find them:

  1. Here, on Medium
  2. LinkedIn
  3. YouTube
  4. Telegram (in Russian)
