Challenges and Opportunities of Big Satellite Data: Crop Classification and Mapping
By Nataliia Kussul
We live in a rapidly changing world. One of the most important drivers of our improving quality of life is space technology and the big data produced by space systems. It opens the door to new opportunities and gives us an entirely new vision of our lives. And the use of artificial intelligence for big data processing allows us not only to see, but to understand what is happening and even to predict it. I am happy to be caught up in this tornado.
Since 2014, large amounts of “free and open” high-resolution Sentinel-1 and Sentinel-2 data, as well as Landsat-8 images, have become available, and it has become feasible to develop many cloud services for fast access to satellite data and their derived products at high and medium spatial resolution. The first challenge was land use/land cover classification and mapping. We want not only to know, but to see what the planet we live on looks like. While “static” land cover types are more or less easy to recognize, even in a single image, agricultural land use monitoring is not a trivial task. Thanks to the availability of different data sources (satellites as well as drones), it has become possible to monitor vegetation state and health using various vegetation indices. This could be enough for small and medium farmers, who know very well which crops are growing in a particular field. But things are different for big traders and for companies providing services to farmers. For example, agricultural insurance companies and fertilizer providers need to know which crops are grown in their region of interest and how much area each of these crops occupies.
Inspired by big data opportunities, the main players in the agricultural market require not only the data themselves or some indices, but “added value” products that present information in their own ontology. Crop classification and crop-specific mapping based on high-resolution satellite imagery is one of the hottest issues nowadays. While at the end of the vegetation season it is feasible to deliver crop-specific maps with more or less acceptable accuracy (70–80%), early-season crop classification remains a challenge. Let’s consider the main challenges along the way and some possible ways to overcome them.
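Accuracy figures like these are usually reported as overall accuracy on an independent validation set: the share of validation samples whose predicted crop matches the in-situ label. Here is a minimal sketch, assuming a confusion matrix with reference classes in rows and predicted classes in columns. The crop names and counts are hypothetical and do not come from any real map.

```python
def overall_accuracy(confusion):
    """Fraction of validation samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def producer_accuracy(confusion, k):
    """Share of reference samples of class k that were also predicted as k."""
    return confusion[k][k] / sum(confusion[k])

# Hypothetical validation results: rows = reference (in-situ), columns = predicted.
classes = ["wheat", "maize", "sunflower"]
confusion = [
    [80, 15, 5],   # reference wheat
    [10, 70, 20],  # reference maize
    [5, 25, 70],   # reference sunflower
]

print(f"overall accuracy: {overall_accuracy(confusion):.2f}")
for k, name in enumerate(classes):
    print(f"{name}: producer's accuracy {producer_accuracy(confusion, k):.2f}")
```

With these made-up counts, 220 of 300 samples lie on the diagonal, so overall accuracy is about 0.73. The per-class (producer's) accuracy shows which crops drag the map down, something a single overall number hides.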
Challenges
Crop classification and mapping is the most challenging task among land use/land cover classification problems. To some extent it can be done if the diversity of crops is not very wide (in monocrop regions). An important precondition for successful classification is a good enough set of training and validation data. In-situ data are the most crucial factor for classification accuracy. Modellers know that even for a perfect model, it is garbage in, garbage out: incorrect in-situ data lead to incorrect classification. This is why intelligent methods of in-situ data checking are crucial for crop classification over really big areas.



Figure 1 shows a potato field sample accidentally placed in a forest; as a result, part of the forest is classified as potato fields across the region of interest. Even a few incorrect samples can cause noticeable problems in the classification results.
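One simple way to catch samples like the potato field in the forest is to compare each sample’s NDVI time series with the typical profile of its class. The sketch below is not the method used for Figure 1. It assumes that every in-situ sample comes with NDVI values for the same set of dates, uses the per-class median as the reference profile, and flags samples whose mean absolute deviation from it exceeds a hypothetical threshold of 0.2. All values are invented for illustration.

```python
from statistics import median

def class_profiles(samples):
    """Per-class median NDVI profile; samples are (label, [ndvi per date]) pairs."""
    by_class = {}
    for label, series in samples:
        by_class.setdefault(label, []).append(series)
    return {
        label: [median(values) for values in zip(*series_list)]
        for label, series_list in by_class.items()
    }

def flag_suspicious(samples, threshold=0.2):
    """Indices of samples whose mean absolute deviation from their class profile exceeds threshold."""
    profiles = class_profiles(samples)
    flagged = []
    for i, (label, series) in enumerate(samples):
        profile = profiles[label]
        deviation = sum(abs(a - b) for a, b in zip(series, profile)) / len(series)
        if deviation > threshold:
            flagged.append(i)
    return flagged

# Invented NDVI values at five dates across the season.
samples = [
    ("potato", [0.15, 0.25, 0.70, 0.80, 0.40]),
    ("potato", [0.18, 0.30, 0.75, 0.78, 0.35]),
    ("potato", [0.12, 0.22, 0.68, 0.82, 0.45]),
    ("potato", [0.60, 0.80, 0.85, 0.85, 0.80]),  # invented mislabel: behaves like forest
    ("forest", [0.55, 0.82, 0.88, 0.87, 0.82]),
    ("forest", [0.62, 0.80, 0.86, 0.86, 0.79]),
]
print(flag_suspicious(samples))
```

Only the sample at index 3 is flagged: its invented early-season NDVI is high like a forest canopy, while the potato profile starts from bare soil. The median is used instead of the mean so that a few wrong samples cannot pull the class profile toward themselves. Flagged samples are candidates for expert review rather than automatic deletion.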
Clouds are the sword of Damocles for optical satellite data, especially in northern countries with a rainy climate. They make the use of time series much more complicated. Even now, when Sentinel delivers images of every location on Earth every 6 days, we cannot guarantee even one cloud-free acquisition per month. Such imagery is especially difficult to get when it is needed most: in spring. For example, within the SPOT-5 Take 5 Experiment it was really tricky to find cloud-free images of the Kiev region for April–May 2015, even at fairly high spatial resolution. So SAR (synthetic aperture radar) data, which can be acquired through clouds, can be a great solution for collecting complete time series over the region of interest.
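To see why a 6-day revisit does not guarantee a monthly cloud-free image, it helps to measure the gap between usable acquisitions. This sketch assumes a fixed 6-day revisit over one spring window and a made-up per-acquisition cloud flag (True means the scene is unusable); real revisit patterns and cloud masks are more complex.

```python
from datetime import date, timedelta

def acquisition_dates(start, end, revisit_days=6):
    """Nominal acquisition dates for a fixed revisit interval (a simplification)."""
    dates, current = [], start
    while current <= end:
        dates.append(current)
        current += timedelta(days=revisit_days)
    return dates

def longest_clear_gap(dates, cloudy):
    """Longest interval in days between consecutive cloud-free acquisitions."""
    clear = [d for d, is_cloudy in zip(dates, cloudy) if not is_cloudy]
    if len(clear) < 2:
        return None  # fewer than two usable images: no gap to measure
    return max((later - earlier).days for earlier, later in zip(clear, clear[1:]))

# Hypothetical spring window and invented cloud flags (True = unusable scene).
dates = acquisition_dates(date(2017, 4, 1), date(2017, 5, 31))
cloudy = [False, True, True, True, True, True, False, True, True, False, True]
print(f"{len(dates)} acquisitions, {cloudy.count(False)} clear, "
      f"longest gap {longest_clear_gap(dates, cloudy)} days")
```

With these invented flags only three of eleven acquisitions are clear, and the longest gap between them is 36 days, longer than a month, right in the season when imagery is needed most.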

Multitemporal imagery allows us not just to take a snapshot, but to observe the “process” of plant growth in its dynamics. That is why it is an incredible data source for agricultural monitoring and for crop classification and mapping. At the same time, it requires far more computational resources and, even more dramatically, far more time for data transfer over the internet (downloading).
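A back-of-the-envelope estimate shows how quickly multitemporal stacks grow. The sketch assumes uncompressed data, the four Sentinel-2 bands at 10 m resolution, 2 bytes per value and 30 acquisition dates per season. The 30,000 km² region is hypothetical, chosen only to contrast with a 20 by 20 km test site.

```python
def raw_volume_bytes(area_km2, pixel_m, bands, dates, bytes_per_value=2):
    """Uncompressed size of a multitemporal image stack (all inputs are assumptions)."""
    pixels = area_km2 * 1_000_000 / (pixel_m * pixel_m)
    return pixels * bands * dates * bytes_per_value

# 20 x 20 km test site vs. a hypothetical 30,000 km2 region:
# 4 bands at 10 m, 30 dates, 2 bytes (16-bit integers) per value.
for name, area in (("test site", 20 * 20), ("region", 30_000)):
    gb = raw_volume_bytes(area, pixel_m=10, bands=4, dates=30) / 1e9
    print(f"{name}: {gb:,.2f} GB")
```

The test site comes to 0.96 GB and the region to 72 GB, before adding the remaining bands, SAR data or several seasons.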

Figure 3 shows the dynamics of the Normalized Difference Vegetation Index (NDVI). This parameter is also known as “vigor”. Brown areas indicate open soil, light green indicates sparse or small vegetation, and dark green indicates mature, dense vegetation.
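NDVI is computed per pixel from the near-infrared and red reflectances as (NIR − Red) / (NIR + Red), which gives values between −1 and 1. The sketch below computes it for one pixel over a season and maps each value to the three categories of Figure 3. The thresholds 0.2 and 0.5 are illustrative assumptions, not the ones behind the figure, and the reflectance values (scaled to integers) are invented.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    if nir + red == 0:
        return None  # undefined, e.g. for no-data pixels
    return (nir - red) / (nir + red)

def vigor_class(value, soil_max=0.2, dense_min=0.5):
    """Map NDVI to the categories of Figure 3 (hypothetical thresholds)."""
    if value is None:
        return "no data"
    if value < soil_max:
        return "open soil"
    if value < dense_min:
        return "sparse or small vegetation"
    return "mature, dense vegetation"

# Invented (NIR, Red) reflectances for one pixel, scaled by 10000, over a season.
season = [(2500, 2000), (2800, 1200), (4500, 500), (3000, 1800)]
for nir, red in season:
    value = ndvi(nir, red)
    print(f"NDVI {value:.2f}: {vigor_class(value)}")
```

The per-date sequence (soil, sparse, dense, then sparse again late in the season) is the kind of dynamics a multitemporal classifier can exploit, and a single image cannot show.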
Small-scale benchmarks (applications for restricted areas) and methods tested on tiny test sites (20 by 20 km) are in many cases not scalable. What works for a small test site does not work for a large area. If you do not work with big satellite data, you will not believe me. Trusted machine learning techniques, such as the Bayes classifier, maximum likelihood methods, and kernel algorithms like the Support Vector Machine, which outperform others on small test sites, do not work for large-scale problems, for instance at region level in Ukraine (corresponding to the NUTS 2 administrative level in Europe), due to current limitations of cloud platforms. Figure 4 compares supported and unsupported scales for some of the machine learning algorithms available in Google Earth Engine (GEE): for the Kiev region some of the classifiers do not work, but they can still be used for classification of smaller administrative units.
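Apart from platform limits, there is also a general reason why kernel methods such as SVM are hard to scale: the kernel (Gram) matrix over n training samples has n² entries. The sketch below assumes a naive implementation that stores this matrix densely as 64-bit floats. Practical solvers such as LIBSVM cache only parts of it, but their training time still grows much faster than linearly with n.

```python
def gram_matrix_gib(n_samples, bytes_per_entry=8):
    """Memory (GiB) for a dense n x n kernel matrix, assuming float64 entries."""
    return n_samples * n_samples * bytes_per_entry / 2**30

# Training-set sizes grow quickly when moving from a test site to a whole region.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} samples -> {gram_matrix_gib(n):,.1f} GiB")
```

That is roughly 0.7 GiB, 75 GiB and 7,450 GiB: manageable for a test site, out of reach for a naive approach at region level.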

How to deal with all this?
Crop classification. Crop classification stands on three pillars: data, model (classifier) and experts. Since traditional classifiers fail when dealing with big satellite data, people are turning to neural network models and deep learning techniques, which are extremely dependent on the quality of reference data (in-situ data, in the case of crop classification). One possible source of in-situ data is crowdsourcing. But here we face a typical data science problem: we must make sure the data are correct. So we should clean the data and keep them consistent. A typical example of inconsistent data is illustrated in the pictures below.
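For crowdsourced labels, a basic consistency step is to normalize spelling, group reports by field, and accept a crop label only when contributors largely agree. This is a minimal sketch, assuming each report is a hypothetical (field_id, crop) pair and using majority voting with an arbitrary 75% agreement threshold; fields below the threshold are set aside for expert checking.

```python
from collections import Counter, defaultdict

def consolidate_labels(reports, min_agreement=0.75):
    """Group (field_id, crop) reports, keep majority labels, and collect conflicts."""
    by_field = defaultdict(list)
    for field_id, crop in reports:
        by_field[field_id].append(crop.strip().lower())  # normalize spelling
    accepted, conflicts = {}, {}
    for field_id, crops in by_field.items():
        crop, votes = Counter(crops).most_common(1)[0]
        if votes / len(crops) >= min_agreement:
            accepted[field_id] = crop
        else:
            conflicts[field_id] = dict(Counter(crops))  # send to an expert
    return accepted, conflicts

# Hypothetical crowdsourced reports.
reports = [
    ("F1", "wheat"), ("F1", "Wheat "), ("F1", "wheat"),
    ("F2", "maize"), ("F2", "sunflower"),
    ("F3", "soybean"), ("F3", "soybean"), ("F3", "soybean"), ("F3", "maize"),
]
accepted, conflicts = consolidate_labels(reports)
print("accepted:", accepted)
print("needs review:", conflicts)
```

F1 and F3 are accepted despite the inconsistent capitalization and one dissenting vote, while F2, split between maize and sunflower, goes to an expert. Normalizing labels before voting matters: without it, “Wheat ” and “wheat” would count as different crops.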


Clouds. The silver bullet here is radar data. But that is another story… I’ll tell you about it next time.
Multitemporal imagery processing requires significant computational resources and wide bandwidth for data downloading. So it can be done efficiently only in a cloud environment, such as Google Earth Engine or Amazon Web Services, where you do not waste time on data downloading and the processing can be efficiently parallelized. The European Space Agency is also moving in a similar direction: it envisages the development of cloud infrastructure for the operations of the Copernicus Data and Information Access Services (DIAS). This cloud-based infrastructure should become operational in early 2018.
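Much of this processing parallelizes well because a large region can be cut into tiles that are processed independently and then combined. The sketch below shows only this split, map and combine pattern. process_tile is a hypothetical placeholder for real work such as classifying a tile, and Python threads stand in for the workers that platforms like Google Earth Engine manage through their own APIs, which are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

def make_tiles(width_km, height_km, tile_km):
    """Split a rectangular region into (x, y, w, h) tiles; edge tiles may be smaller."""
    tiles = []
    for y in range(0, height_km, tile_km):
        for x in range(0, width_km, tile_km):
            tiles.append((x, y, min(tile_km, width_km - x), min(tile_km, height_km - y)))
    return tiles

def process_tile(tile):
    """Hypothetical placeholder for per-tile work; here it only returns the tile area in km2."""
    _, _, w, h = tile
    return w * h

def process_region(width_km, height_km, tile_km=20, workers=4):
    """Process all tiles in parallel and combine the per-tile results."""
    tiles = make_tiles(width_km, height_km, tile_km)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_tile, tiles))
    return len(tiles), sum(results)

print(process_region(150, 100))
```

A 150 by 100 km region cut into 20 km tiles gives 40 tiles whose areas add back up to 15,000 km². For CPU-heavy work in plain Python, a process pool or a distributed platform would replace the thread pool.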
Cloud platforms are also the solution for scaling up small-area applications and methods. They make it possible to evaluate the efficiency of known methods and to implement parallelization in the most effective way. A nice example of such a solution is LandViewer, which provides on-the-fly analysis and downloading of satellite data for the area of interest. LandViewer is well suited for quick data access as well as for validating products at a glance.

So, things are changing very fast, and what was a problem yesterday becomes an opportunity tomorrow. I hope that in the near future crop classification and mapping will be done for large areas at high resolution. The fusion of satellite technologies with high-performance computing and artificial intelligence makes our dreams and challenges feasible.
