Picterra & Machine Teaching

The role of human guidance vs big data in Machine Learning

Roger Fong
Picterra
9 min read · May 13, 2020


Machine Learning in Academia & Industry

Machine Learning is all the rage these days, and with the massive influx of earth observation imagery from all sorts of sensors and platforms, the geospatial domain seems like a perfect fit for it. However, as hyped up as machine learning is, we shouldn’t forget that it is still a relatively new technology as far as its use in real world applications goes.

The efficacy of machine learning has indeed been proven in Academia, but in Academia it’s all about producing results that outperform pre-existing research, and the easiest way to do that is to test new research on the same datasets that were used by said pre-existing research. This means there is not actually that much proof that these solutions generalise at a global scale; assessment in real world applications needs to be done on a case by case basis. Nevertheless, in order to cover as many cases as possible, these datasets are oftentimes very large and require a tremendous amount of annotation effort.

Examples of some computer vision datasets used in Academia; those are some pretty big numbers

Many machine learning solution providers and platforms try to take a route that mirrors the discoveries of academia: use big datasets to train cutting edge models, try to make them work on as many use cases as possible, then sell them to customers as a one click solution that works everywhere, so that after the training is done there is no more effort needed as far as the model is concerned. This is sexy and all, but it simply does not work that well.

For most geospatial use cases, there is not a single machine learning model that can generalise at a global scale, that can just work anywhere, across any type of imagery, any type of sensor, any resolution. As large as the datasets being used are, they come nowhere near being large enough for this kind of approach. For example, let’s take the popular problem of “building detection”. Aside from variations in resolution and sensor type, there is even the question of what type of building! Buildings come in all shapes and sizes: American style, old town European style, informal settlements and slums, factories, huts, etc. There are thousands of types of “buildings”, if not more. What’s more, each of these buildings can also appear in different types of backgrounds, geographies and contexts, and that has to be taken into account as well. A universal “building” detection model would have to be trained on copious quantities of every single one of these variations. Who has this kind of dataset? No one.

If you want a one-click wonder solution, you can use as big a dataset as you can find, but it probably won’t be big enough.

If these global solutions existed, the world would already be a very different place and a lot of jobs would have already been replaced by zero-effort solutions (like driving). Big datasets are important, but with the technology being where it is right now, they’re being used the wrong way (unless perhaps you’re Facebook or Google, with seemingly infinite resources to throw at the problem). That doesn’t mean machine learning isn’t useful. It is still an extremely powerful tool that opens a lot of doors, but it has to be used in a realistic manner. People often have a very black and white view of machine learning. They think: it’s AI, it should be able to do everything on its own, like the Terminator.

Machine learning is not Artificial Intelligence. AI is just a marketing term; machine learning is a tool, and like most tools it works best when used properly.

In particular, as I said, we are not in a technological state where your machine learning model can just work on anything you throw at it, that is, not without some kind of external guidance.

That guidance can be provided by you. But you have to provide it iteratively, because teaching is an iterative process. You don’t throw every single textbook in your library at a student all at once, you do it bit by bit. The neat thing, though, is that you don’t have to provide as much of this guidance as you might think; that is a perk of both the nature of machine learning and the nature of use cases in the geospatial domain.

By combining your guidance with the flexibility and automate-ability of machine learning in a step-by-step, iterative way, you can do much more, much faster than either you or the machine learning model could ever have done separately.

A machine learns, a human teaches

With these “one click wonder” solutions, the data used to train the model will never have global coverage, and the model can only ever be as good as the data. So what data was used exactly in these solutions? Well, if you do not know, then you have no control over it, and that’s a problem. The data used to train the model is what defines the problem and the scenario that the model is trying to solve. But if you are not the one providing the data, then you are not the one defining the scenario. Someone else is defining a scenario for you (whether it’s a consulting agency or someone in academia who knows nothing about your use case) and then you just have to hope that the scenario is “close enough” to what you need. If you provide your own data, then you are in control of the scenario, which gives you interactivity in your solution and translates to more transparency and likely better results.

This is what makes Picterra different from any other Machine Learning platform. You are in control of your own problem, your own solution, your own scenario.

So you might ask: don’t I need a lot of data to define my scenario? That really depends on your problem, and in most cases the answer is no. This is another major point of confusion with Machine Learning.

Your model doesn’t have to work on everything, it just has to work for your use case, and your use case alone.

Most people in the real world are not trying to solve the same problems as researchers in academia. They have a specific region or type of region where they are trying to detect something; they don’t actually need detection at global scale. So what they need is not a model that pretends to be trained on everything possible, but one trained on their own regions of interest. Once you shrink the scope of the scenario, the dataset required for a model to work becomes smaller, much much smaller. It’s a question of quality over quantity. Pure machine learning researchers don’t know the exact application their research could be used for, since oftentimes their research is not domain specific, so they can’t go for quality. They can only use as much data as they can to cover as many cases as possible in order to produce a convincing result. A real client with a real project just needs a small, high quality dataset that relates to just what they need. And the people best placed to define that dataset are the clients themselves. We have quite a few use cases on the Picterra platform that have worked well with as few as 10 annotations!

This one click wonder approach fails to take advantage of one of the huge potentials of machine learning. Machine learning can take something as simple as some images and some annotations (or just drawings) on those images and create a solution that can be automated efficiently and at scale. But with a model that has already been trained for you, without your own guidance, this potential is wasted, because as simple as those images and annotations are, they might not be the images and annotations you need.
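To make “automated efficiently and at scale” a little more concrete, here is a minimal, hypothetical sketch of what running a trained detector over a large area can look like: split a big GeoTIFF into tiles and apply the model to each one. This is only an illustration, not Picterra’s actual pipeline; the `run_detector` callable and the file name are placeholders for whatever model and imagery you have.

```python
import rasterio
from rasterio.windows import Window

TILE = 512  # tile size in pixels

def detect_over_raster(path, run_detector):
    """Apply a trained detector tile by tile over a large GeoTIFF.

    `run_detector` is a placeholder for your trained model: it takes a
    (bands, height, width) array and returns detections for that tile.
    """
    results = []
    with rasterio.open(path) as src:
        for row in range(0, src.height, TILE):
            for col in range(0, src.width, TILE):
                window = Window(col, row,
                                min(TILE, src.width - col),
                                min(TILE, src.height - row))
                patch = src.read(window=window)            # pixels for this tile
                transform = src.window_transform(window)   # georeferencing of this tile
                results.append((run_detector(patch), transform))
    return results

# Example usage (hypothetical file and model):
# detections = detect_over_raster("area_of_interest.tif", run_detector=my_model)
```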

So how does this all translate to your time and money? You can end up spending tens of thousands of dollars hiring a machine learning contractor to spend months on your project, working on data that may not even match your scenario. Or you can use the Picterra platform and get the job done yourself on a time scale of days (or even hours, depending on your use case) rather than months. And I guarantee you the platform will be cheaper for you than a consulting agency.

But doesn’t big data still help somehow?

Of course, academia has still proved one thing: more data helps. How does this relate to what we just said about the importance of guiding the model with a small number of annotations? You should think of it this way. It’s not that having more data enables a model to solve your specific problem. What it does is give the model some background knowledge related to your problem, so that when you are teaching it, it’s not starting from nothing. This means that you can teach it even faster.

Let’s take the example of trying to detect cargo trucks in satellite imagery. In order for a machine learning model to learn what a cargo truck looks like, it needs to know things like:

  • it is a rectangular-ish object
  • it has wheels
  • it is usually on a road
  • it is larger than other rectangular wheeled things on roads

Well, vehicles in general have wheels, are rectangular-ish and are usually on roads. What if your detector has already learned the first three things? It doesn’t know what a cargo truck is yet, but it knows what vehicles are. Then the main thing it has to learn is that cargo trucks are the things it already knows how to identify, except larger. Because it already roughly understands what vehicles are, you have to do less work in teaching it what a cargo truck is!

This is the motivation behind a feature that we are working on called pre-trained base models. Base models are models that have already been trained on a large quantity of data, categorized into groups of objects, like vehicles. As I said before, pretrained detectors trained on data that isn’t yours should not be used on their own to solve your problem. For this reason, the platform does not let you run detections with these base models on their own. There is no point in allowing you to do that, as they have not been customised to your needs yet, and as I’ve previously said, they probably don’t do well at global scale anyway; they just have a rough idea of what “vehicles” look like.

Some pre-trained base models available on the Picterra platform

However, you can take these base models and train your own customised detectors on top of them, which allows you to create a detector for your own customised object or scenario with fewer annotations and better performance than if you were to start without the base model. Continuing the student analogy, if you’re trying to teach a student algebra, it’ll be easier if they already understand arithmetic.
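As a rough illustration of the principle (not Picterra’s actual code), here is a minimal PyTorch sketch of training on top of a pretrained network: the pretrained weights stand in for the base model’s “background knowledge”, the frozen layers keep that knowledge intact, and only a small new head is trained on your handful of annotations. It uses image classification (cargo truck vs. other vehicle) as a stand-in for detection to keep it short, and `few_shot_loader` is a placeholder for a DataLoader over your own small annotated dataset.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained network standing in for a "base model" (recent torchvision API).
base = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the layers that already encode general visual knowledge.
for param in base.parameters():
    param.requires_grad = False

# New head trained from scratch: cargo truck vs. other vehicle.
base.fc = nn.Linear(base.fc.in_features, 2)

optimizer = torch.optim.Adam(base.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def fine_tune(few_shot_loader, epochs=5):
    """Train only the new head on a small, targeted set of annotations."""
    base.train()
    for _ in range(epochs):
        for images, labels in few_shot_loader:
            optimizer.zero_grad()
            loss = criterion(base(images), labels)
            loss.backward()
            optimizer.step()
```

Because only the small head is being learned, a handful of annotations can be enough, which is exactly the point of building on a base model rather than starting from nothing.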

At Picterra we’ll be working hard to continually improve and add more of these pretrained models to the platform, and one day, if these base detectors are powerful enough, maybe you’ll need such minimal guidance that a single annotation of an object will be all it takes to get your detector working! (But that’s just dreaming for now.) Until then, these base detectors remain a powerful tool in the Picterra toolbox, one that lets you take advantage of the benefits of big data while still being the teacher of your machine.

www.picterra.ch
