The birth of Picterra’s “Smart Focus” tool

Picterra recently released its Smart Focus tool, which lets users annotate satellite/aerial/drone imagery much faster. In this post, we’ll dive into the details to see how it was built.

This is the story of how a good UI can unleash the full potential of a Machine Learning model.

What is Smart Focus?

Imagine you have a drone image of a city and you want to annotate (draw polygons around) individual buildings. You can do this in GIS software, but the process will take you a long time because you’ll have to zoom out, find a building, zoom in, draw the polygon around it, pan to the next building, and so on.

What if you had a tool that did all of this navigation for you? This is the idea behind our Smart Focus tool. After you’ve drawn polygons around the first few buildings, the view automatically moves to other buildings in the image and lets you annotate them. This enables users to annotate their images faster and more precisely. And it is not limited to buildings: it works for any object of interest!

The video below shows an example of our smart focus tool.

Picterra’s Smart Focus tool demonstration in our platform tutorial video

At the beginning there were two ideas…

At Picterra, our vision of machine learning revolves around two core ideas:

  • Build Deep Learning models tailored to the user’s needs
  • Put the user ‘in the training loop’ by exploiting interactivity

Deep Learning models typically require a large number of labels to train. To work around that, we are exploring ways to let our users build powerful models with only a few annotations.

… then came a grid…

We wanted to experiment with interactive, in-browser machine learning on an image. Our first prototype consisted of a grid over the image, whose features are extracted once and then made available to the browser. Each cell in the grid contains a ‘feature vector’ that characterizes the content of the cell. Those features are extracted using a Deep Learning model trained for satellite/aerial/drone image classification. The idea is that features learnt on a fixed set of classes will still be helpful to discriminate the user’s classes of interest.
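To make the per-cell setup concrete, here is a minimal sketch of the grid feature extraction step. The function names and the toy extractor are our own for illustration: the real system uses a deep network trained on satellite/aerial/drone imagery, not the simple channel statistics used here as a stand-in.

```python
import numpy as np

def grid_features(image, cell_size, extractor):
    """Split an image (H, W, C) into a grid and compute one feature
    vector per cell using `extractor` (a function cell -> 1-D array)."""
    h, w = image.shape[:2]
    rows, cols = h // cell_size, w // cell_size
    features = []
    for r in range(rows):
        for c in range(cols):
            cell = image[r * cell_size:(r + 1) * cell_size,
                         c * cell_size:(c + 1) * cell_size]
            features.append(extractor(cell))
    return np.stack(features).reshape(rows, cols, -1)

# Stand-in extractor: per-channel mean and standard deviation.
# A real feature extractor would be a pre-trained deep network.
def toy_extractor(cell):
    return np.concatenate([cell.mean(axis=(0, 1)), cell.std(axis=(0, 1))])

image = np.random.rand(3600, 3600, 3).astype(np.float32)
feats = grid_features(image, cell_size=300, extractor=toy_extractor)
print(feats.shape)  # (12, 12, 6): a 12x12 grid, 6 features per cell
```

The extraction happens once, server-side, and only the small `(rows, cols, d)` feature array needs to be shipped to the browser, which is what keeps the interactive part cheap.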

Clicking on the image picks the underlying cell and assigns it to a class. In the image below, this corresponds to the upper part, where you can see the small yellow/red squares that were labelled by hand.

Once you have clicked on a few squares, you can classify the whole grid and get what is shown in the lower part of the image. This classification runs in the browser, so you can quickly reclassify as you add more annotations to improve the results.
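The post doesn’t spell out which classifier runs in the browser, but a nearest-centroid rule over the pre-computed feature vectors is one simple choice that is cheap enough to re-run on every click. A sketch, with hypothetical names:

```python
import numpy as np

def classify_grid(features, labeled):
    """Classify every grid cell by nearest class centroid.

    features: (rows, cols, d) array of per-cell feature vectors.
    labeled:  dict mapping (row, col) -> class id for hand-labeled cells.
    Returns a (rows, cols) array of predicted class ids.
    """
    classes = sorted(set(labeled.values()))
    # One centroid per class, averaged over that class's labeled cells.
    centroids = np.stack([
        np.mean([features[rc] for rc, k in labeled.items() if k == cls], axis=0)
        for cls in classes
    ])
    flat = features.reshape(-1, features.shape[-1])
    # Distance of every cell to every centroid; pick the closest.
    dists = np.linalg.norm(flat[:, None, :] - centroids[None, :, :], axis=-1)
    pred = np.array(classes)[dists.argmin(axis=1)]
    return pred.reshape(features.shape[:2])

# Synthetic example: the right half of the grid "looks" different.
feats = np.zeros((4, 4, 2))
feats[:, 2:] = 1.0
pred = classify_grid(feats, {(0, 0): 0, (0, 3): 1})
# pred[:, :2] is all class 0, pred[:, 2:] is all class 1
```

Because the features are fixed and only the handful of labeled cells changes, each reclassification is a few matrix operations over a small array, which is why the feedback loop feels instant.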

… but the grid wasn’t very pretty …

Although its classification accuracy is not really impressive, this prototype is rewarding to play with. Since updating the classification is fun and quick, you end up annotating a lot more than you would if you had to wait 20 minutes for the model to update. That was a key takeaway: if we make the interaction rewarding for users, they will annotate more. Gamification has long been used to make tedious tasks more fun, so this is not a new idea, but it was still an important lesson.

Make the annotation process rewarding and users will annotate more, which in turn produces a better model, further rewarding the user.

Another lesson from this prototype is that the grid is somewhat ugly. The grid structure was necessary because we compute features per grid cell, and we cannot make the grid much finer without running into browser performance issues (in this example, the image is a 3600x3600 .tif of about 30MB). But since the grid doesn’t exactly align with what the user is interested in (e.g. houses), the classification results fall short of what one would expect.

… so it disappeared in the shadows …

So the grid is still useful to guarantee good performance in the browser, but what if we didn’t display it? And since one of the problems with the grid is that it is not aligned with the objects of interest, what if we instead used it to intelligently suggest areas of interest that the user can annotate?

… and gave birth to the “Smart Focus” tool!

And this is how the “Smart Focus” tool was born. Instead of displaying the classification as an end result, we use it to move the view to an area that contains the user’s class of interest and let the user annotate manually, which reduces the time spent annotating. It also leads to higher annotation quality, because the view is zoomed at the right level, pushing users to be more precise in their annotations.
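The “move the view” step can be sketched as picking the not-yet-visited grid cell where the classifier is most confident about the target class, then converting that cell back to pixel coordinates for the viewport. This is our own illustrative sketch, not Picterra’s actual selection logic:

```python
import numpy as np

def next_focus(scores, target_class, visited, cell_size):
    """Pick the unvisited grid cell with the highest score for the target
    class and return a pixel bounding box to zoom the view to.

    scores:  (rows, cols, n_classes) per-cell class scores.
    visited: set of (row, col) cells already shown to the user.
    Returns (x_min, y_min, x_max, y_max), or None if all cells are visited.
    """
    best, best_score = None, -np.inf
    rows, cols = scores.shape[:2]
    for r in range(rows):
        for c in range(cols):
            if (r, c) in visited:
                continue
            s = scores[r, c, target_class]
            if s > best_score:
                best, best_score = (r, c), s
    if best is None:
        return None
    visited.add(best)
    r, c = best
    # Grid coordinates -> pixel bounding box for the viewport.
    return (c * cell_size, r * cell_size,
            (c + 1) * cell_size, (r + 1) * cell_size)

scores = np.zeros((3, 3, 2))
scores[1, 2, 1] = 0.9          # cell (row 1, col 2) likely contains the target
visited = set()
print(next_focus(scores, target_class=1, visited=visited, cell_size=300))
# (600, 300, 900, 600)
```

Keeping a `visited` set means each call jumps to a fresh suggestion, so repeated calls walk the user through the image one candidate area at a time.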

Below is a screenshot of the “Smart Focus” tool zooming into an area that contains houses. This is based on very few (<10) user annotations of houses, and still the tool is able to find areas containing other houses quite consistently. Another side effect of not showing the grid is that even if the grid cells don’t align perfectly with the houses, this is not visible to the user, so the subjective feeling when looking at the proposed area is: “Oh yeah, this model is right, there are houses here.”

This is how our Smart Focus tool currently looks

The future

This “Smart Focus” tool is the first in a long series of tools we will develop to let users extract analytics from remote sensing imagery. Our end goal is to train fully custom models from a low number of user annotations, but getting there is an incremental process.

In addition, we recently released pre-trained models for common classes of interest like buildings and vehicles. Stay tuned for more, and if you want to test our platform, register now!