Explore Corine Land Cover dataset

Jan Tschada
Geospatial Intelligence
May 19, 2023

The Corine Land Cover dataset is a valuable resource for anyone interested in spatial data science. This dataset provides a comprehensive classification of land use and land cover across Europe, which can be used for a wide range of applications, from environmental monitoring to urban planning. In this blog post, we will explore what the Corine Land Cover dataset is, how to access it, and how it can be used for spatial data science.

What is the Corine Land Cover dataset?

The Corine Land Cover dataset is a comprehensive classification of land use and land cover in Europe. It was created using satellite imagery and provides detailed information on various land cover features, including forests, grasslands, wetlands, urban areas, and water bodies. The dataset is produced by the European Environment Agency and is updated every six years, providing a valuable resource for long-term monitoring of changes in land use and land cover across Europe.

Accessing the Corine Land Cover dataset

The Corine Land Cover dataset is freely available from the European Environment Agency’s website. It is provided as raster data in GeoTIFF format at 100 m resolution and as vector data, both of which can be imported into common GIS software. The data is organized into tiles, each covering a specific region of Europe, making it easy to download and work with specific areas of interest.

Using the Corine Land Cover dataset for spatial data science

Because it describes land use and land cover consistently across Europe, the dataset supports a wide range of spatial data science tasks. Here are a few examples:

1. Environmental monitoring
The Corine Land Cover dataset can be used to monitor changes in land use and land cover over time, providing valuable insights into the impact of human activities on the environment. For example, the dataset can be used to monitor changes in forest cover, wetlands, and other important habitats, helping to identify areas that require conservation efforts.

2. Urban planning
The Corine Land Cover dataset can be used to inform urban planning decisions, providing valuable insights into the distribution of urban areas and their impact on surrounding ecosystems. For example, the dataset can be used to identify areas of high urban density, helping to inform decisions on the placement of new development and the preservation of green spaces.

3. Natural resource management
The Corine Land Cover dataset can be used to inform natural resource management decisions, providing valuable insights into the distribution of natural resources across Europe. For example, the dataset can be used to identify areas of high agricultural activity, helping to inform decisions on the allocation of agricultural subsidies and the regulation of farming practices.
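
The environmental monitoring use case above boils down to comparing the class of each cell between two reference years. The following is a minimal sketch of that idea, not a production workflow: the two 3x3 grids are made-up data, and `count_transitions` is a hypothetical helper name. Only the class codes (312, 112, 512) are taken from the Corine scheme used in this post.

```python
from collections import Counter

# Made-up 3x3 land cover grids for the same area in two reference years.
# 312 = coniferous forest, 112 = discontinuous urban fabric, 512 = water bodies.
grid_2012 = [
    [312, 312, 312],
    [312, 312, 112],
    [512, 312, 112],
]
grid_2018 = [
    [312, 312, 112],
    [312, 112, 112],
    [512, 312, 112],
]


def count_transitions(before, after):
    """Counts (old_code, new_code) pairs for every cell whose class changed."""
    transitions = Counter()
    for row_before, row_after in zip(before, after):
        for old_code, new_code in zip(row_before, row_after):
            if old_code != new_code:
                transitions[(old_code, new_code)] += 1
    return transitions


changes = count_transitions(grid_2012, grid_2018)
for (old_code, new_code), cell_count in changes.items():
    print(f"{old_code} -> {new_code}: {cell_count} cell(s)")
```

With real data, the grids would come from two Corine raster releases clipped to the same extent, and each changed cell would correspond to one raster cell of land.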

Spatial data science with the Corine Land Cover dataset

To perform spatial data science with the Corine Land Cover dataset, there are a few key steps to follow:

1. Data preparation
The Corine Land Cover dataset is provided as a set of raster files, which can be imported into any GIS software. Before analysis, it is important to preprocess the data, including converting the raster data to a vector format and cleaning the data to remove any errors or inconsistencies.

2. Data analysis
Once the data is prepared, a wide range of spatial data science techniques can be applied to the data, including clustering, regression analysis, and machine learning algorithms. For example, clustering can be used to identify areas of high urban density, while regression analysis can be used to identify the factors that are driving changes in forest cover.

3. Visualization and communication
Finally, it is important to visualize and communicate the results of the analysis in a clear and accessible way. This could include creating interactive maps, data visualizations, and reports that summarize the key findings.
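
For the visualization step, a common starting point is a GeoJSON file, which most web mapping tools can load directly. The sketch below converts classified locations into a GeoJSON feature collection using only the standard library. The two records are illustrative sample data, and `to_geojson` is a hypothetical helper name, not part of any service described here.

```python
import json

# Illustrative records; the class codes and names follow the Corine scheme.
classified = [
    {"name": "Cospudener See", "lat": 51.269, "lon": 12.335,
     "class": 512, "category": "Water bodies"},
    {"name": "Nordstrand Norderney", "lat": 53.723, "lon": 7.266,
     "class": 423, "category": "Intertidal flats"},
]


def to_geojson(records):
    """Builds a GeoJSON FeatureCollection of points from classified records."""
    features = []
    for record in records:
        # Everything except the coordinates becomes a feature property.
        properties = {key: value for key, value in record.items()
                      if key not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude first, then latitude.
            "geometry": {"type": "Point",
                         "coordinates": [record["lon"], record["lat"]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


geojson_text = json.dumps(to_geojson(classified), indent=2, ensure_ascii=False)
print(geojson_text)
```

Note the coordinate order: GeoJSON expects longitude before latitude, which is the reverse of how the locations are usually written.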

The Geospatial Land Cover API Service

For easy access, we created a dedicated API offering the land cover classifications for locations in Europe. The service classifies locations into 44 well-known land cover categories of the Corine database.
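
The 44 categories are organized hierarchically in the Corine nomenclature: the first digit of a three-digit class code identifies one of five top-level groups. A small sketch of how one might roll detailed classes up to those groups follows. It assumes the service returns three-digit codes like the ones shown in this post, and `level1_group` is a hypothetical helper name.

```python
# Top-level (level 1) groups of the Corine nomenclature, keyed by the first
# digit of the three-digit class code.
LEVEL1_GROUPS = {
    1: "Artificial surfaces",
    2: "Agricultural areas",
    3: "Forest and semi-natural areas",
    4: "Wetlands",
    5: "Water bodies",
}


def level1_group(class_code):
    """Returns the top-level group name for a three-digit Corine class code."""
    first_digit = class_code // 100
    if first_digit not in LEVEL1_GROUPS:
        raise ValueError(f"Unexpected Corine class code: {class_code}")
    return LEVEL1_GROUPS[first_digit]


for code in (121, 312, 423, 512):
    print(code, "->", level1_group(code))
```

Grouping like this is handy when 44 classes are too fine-grained, for example when a model only needs to know whether a location is built-up, agricultural, or natural.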

An array of latitude and longitude coordinate pairs represents the locations. By design, the service has a limit of 100 coordinate pairs for each request made. The result is a two-dimensional matrix containing the unique category ID and name for every coordinate pair.
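
Since the service rejects more than 100 coordinate pairs per request, it can help to validate the input before sending anything. The following sketch builds the request payload and checks the limit up front. The `lat` and `lon` keys are the ones used by the request in this post; the function name `build_payload` and the range checks are our own assumptions, not documented service behavior.

```python
MAX_LOCATIONS = 100  # per-request limit stated for the service


def build_payload(latitudes, longitudes):
    """Validates the coordinate lists and returns the request payload."""
    if len(latitudes) != len(longitudes):
        raise ValueError("latitudes and longitudes must have the same length")
    if len(latitudes) > MAX_LOCATIONS:
        raise ValueError(f"at most {MAX_LOCATIONS} coordinate pairs per request")
    for lat, lon in zip(latitudes, longitudes):
        # Plain WGS84 range check; it does not test whether a point is in Europe.
        if not -90 <= lat <= 90 or not -180 <= lon <= 180:
            raise ValueError(f"coordinate out of range: ({lat}, {lon})")
    return {"lat": list(latitudes), "lon": list(longitudes)}


print(build_payload([51.822, 51.612], [12.184, 12.637]))
```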

Let us assume we put our dedicated API key into an environment variable. We just need to create a utility function wrapping the request parameters like service endpoint, header and payload. The function receives two arrays representing the locations as coordinates.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0
import requests
import os
import pandas as pd


def classify_landcover(latitudes, longitudes):
    """
    Classifies the specified coordinates using the 44 well-known land cover categories.
    Returns a two-dimensional matrix containing the unique category IDs and names.
    [
        [121, 312, 312, 112, 512, 112, 423],
        ['Industrial or commercial units',
         'Coniferous forest',
         'Coniferous forest',
         'Discontinous urban fabric',
         'Water bodies',
         'Discontinous urban fabric',
         'Intertidal flats']
    ]
    """
    url = "https://geolandcover.p.rapidapi.com/classify"
    payload = {
        'lat': latitudes,
        'lon': longitudes
    }
    api_key = os.environ.get('x_rapidapi_key')
    headers = {
        'content-type': 'application/json',
        'X-RapidAPI-Key': api_key,
        'X-RapidAPI-Host': 'geolandcover.p.rapidapi.com'
    }
    result = requests.post(url, json=payload, headers=headers)
    result.raise_for_status()
    return result.json()
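
If the environment variable is missing, `os.environ.get` returns `None` and the request fails later with a less obvious authorization error. As an optional hardening step, one could read the key through a small helper that fails early. This is only a sketch; `read_api_key` is a hypothetical name, and the variable name matches the one used above.

```python
import os


def read_api_key(variable_name="x_rapidapi_key"):
    """Reads the API key from the environment and fails early if it is missing."""
    api_key = os.environ.get(variable_name)
    if not api_key:
        raise RuntimeError(
            f"Environment variable {variable_name!r} is not set; "
            "export your RapidAPI key before calling the service."
        )
    return api_key
```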

For a simple smoke test, we define some named locations and convert those into a structured data frame.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0
named_locations = [
    {
        'name': 'Junkers Dessau',
        'lat': 51.822,
        'lon': 12.184
    },
    {
        'name': 'Restricted Area Bad Düben',
        'lat': 51.612,
        'lon': 12.637
    },
    {
        'name': 'Presseler Teich',
        'lat': 51.579,
        'lon': 12.733
    },
    {
        'name': 'Urban Dahlen',
        'lat': 51.362,
        'lon': 12.996
    },
    {
        'name': 'Cospudener See',
        'lat': 51.269,
        'lon': 12.335
    },
    {
        'name': 'Venusberg',
        'lat': 50.707,
        'lon': 7.093
    },
    {
        'name': 'Nordstrand Norderney',
        'lat': 53.723,
        'lon': 7.266
    }
]

named_locations_df = pd.DataFrame.from_dict(named_locations)

Figure: Named locations

We now classify the named locations using the land cover service endpoint. For every location, the service returns the land cover ID as an integer and the corresponding category name, so we can easily extend our data frame with two columns named class and category. We copy the values of the latitude and longitude columns into two lists and pass them to our utility function, which wraps the service request.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0
latitudes = named_locations_df['lat'].values.tolist()
longitudes = named_locations_df['lon'].values.tolist()
classifications = classify_landcover(latitudes, longitudes)
named_locations_df['class'] = classifications[0]
named_locations_df['category'] = classifications[1]

Figure: Classified named locations
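
Because the result is a matrix with one row of IDs and one row of names, it is easy to reshape or summarize without calling the service again. The sketch below works on the sample response from the docstring of `classify_landcover` (category spelling kept exactly as in that sample), pairs each ID with its name, and counts how often each category occurs.

```python
from collections import Counter

# Sample response copied from the docstring of classify_landcover.
classifications = [
    [121, 312, 312, 112, 512, 112, 423],
    ['Industrial or commercial units',
     'Coniferous forest',
     'Coniferous forest',
     'Discontinous urban fabric',
     'Water bodies',
     'Discontinous urban fabric',
     'Intertidal flats'],
]

class_ids, category_names = classifications

# One (id, name) tuple per classified location, in request order.
rows = list(zip(class_ids, category_names))

# Frequency of each category among the classified locations.
category_counts = Counter(category_names)

for category, location_count in category_counts.most_common():
    print(f"{location_count} x {category}")
```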

Every location is now classified using one of the 44 well-known land cover categories. The third named location is a small lake surrounded by forest. The classification uses a minimum mapping unit of 25 ha. If we inspect the named location with a measurement tool, we see that the lake has an area of only about 2.8 ha. That is well below the minimum mapping unit, which would explain why the location takes the class of the surrounding forest (Coniferous forest in the sample output above) rather than Water bodies.
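
The arithmetic behind the minimum mapping unit is simple but worth spelling out. The sketch below compares the measured lake area with the 25 ha threshold; `is_mappable` is a hypothetical helper name, and the 2.8 ha value is the measurement mentioned above.

```python
SQUARE_METERS_PER_HECTARE = 10_000
MINIMUM_MAPPING_UNIT_HA = 25


def is_mappable(area_ha, minimum_mapping_unit_ha=MINIMUM_MAPPING_UNIT_HA):
    """True if a feature is large enough to be mapped as its own class."""
    return area_ha >= minimum_mapping_unit_ha


lake_area_ha = 2.8
mmu_m2 = MINIMUM_MAPPING_UNIT_HA * SQUARE_METERS_PER_HECTARE
lake_share_of_mmu = lake_area_ha / MINIMUM_MAPPING_UNIT_HA

print(f"Minimum mapping unit: {mmu_m2} square meters")
print(f"Lake reaches {lake_share_of_mmu:.0%} of the minimum mapping unit")
print(f"Mapped as its own class: {is_mappable(lake_area_ha)}")
```

The lake covers only roughly a tenth of the area a feature needs to appear as a separate class.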

Figure: Presseler Teich, © OpenStreetMap contributors

With the help of this service, we can enrich our spatial data science workflows with land cover classifications of named locations in Europe out of the box. So far, we have only shown how to enrich locations, not what kind of intelligence value this offers to domain-specific models.

Conclusion

The Corine Land Cover dataset is a valuable resource for anyone interested in spatial data science. It provides detailed information on land use and land cover. Data scientists have access to an enormous volume of location-enabled data, but they often lack the time, the domain-specific GIS skills, or the professional GIS tools needed to prepare raster and vector datasets for their data science workflows.

The introduced geospatial land cover API service represents a proof-of-concept implementation. Currently, it covers only European countries, and each classification request has a maximum limit of 100 locations. If you want to classify over 100 locations, you need to chunk your requests.

# author: Jan Tschada
# SPDX-License-Identifier: Apache-2.0


def classify_landcover_chunked(latitudes, longitudes):
    """
    Classifies the specified coordinates using the 44 well-known land cover categories in chunks of 100 coordinate pairs.
    Use this approach when your coordinates exceed the maximum limit of 100 locations.
    Returns a two-dimensional matrix containing the unique category IDs and names.
    """
    chunksize = 100
    # If the lists differ in length, only the common prefix is classified.
    coordinates_count = min(len(latitudes), len(longitudes))
    result = [
        [],
        []
    ]
    for start_index in range(0, coordinates_count, chunksize):
        end_index = min(start_index + chunksize, coordinates_count)
        current_result = classify_landcover(latitudes[start_index:end_index], longitudes[start_index:end_index])
        result[0].extend(current_result[0])
        result[1].extend(current_result[1])

    return result
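
To check the chunking logic without spending API calls, one option is to pass the classifier in as a parameter and substitute an offline stand-in during testing. The following is a sketch of that design, not the author's implementation: `classify_in_chunks` and `fake_classify` are hypothetical names, and the fake always answers with the same class.

```python
def classify_in_chunks(latitudes, longitudes, classify, chunksize=100):
    """Calls classify on slices of at most chunksize pairs and merges the results."""
    if len(latitudes) != len(longitudes):
        raise ValueError("latitudes and longitudes must have the same length")
    result = [[], []]
    for start_index in range(0, len(latitudes), chunksize):
        end_index = start_index + chunksize
        class_ids, category_names = classify(
            latitudes[start_index:end_index],
            longitudes[start_index:end_index],
        )
        result[0].extend(class_ids)
        result[1].extend(category_names)
    return result


# Offline stand-in for the real service call: records the size of every chunk
# it receives and labels each location as a water body.
seen_chunk_sizes = []


def fake_classify(chunk_latitudes, chunk_longitudes):
    seen_chunk_sizes.append(len(chunk_latitudes))
    return [[512] * len(chunk_latitudes), ['Water bodies'] * len(chunk_latitudes)]


# 250 synthetic coordinate pairs.
latitudes = [51.0 + index * 0.001 for index in range(250)]
longitudes = [12.0 + index * 0.001 for index in range(250)]

combined = classify_in_chunks(latitudes, longitudes, fake_classify)
print(seen_chunk_sizes)
print(len(combined[0]), len(combined[1]))
```

In production, the `classify` argument would simply be the `classify_landcover` function defined earlier.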

For the best performance, the service aggregates the land cover categories of the Corine database into a spatial grid, so that locations are classified against a minimum bounding geospatial region.
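
To illustrate why a grid makes lookups fast, here is a deliberately simplified sketch: each coordinate is snapped to a grid cell, and the cell index is used as a dictionary key. This is not the service's actual grid. The cell size, the index scheme, and the names `cell_index` and `lookup_class` are all assumptions made for illustration.

```python
import math

CELL_SIZE_DEGREES = 0.01  # hypothetical cell size, chosen only for this example


def cell_index(lat, lon, cell_size=CELL_SIZE_DEGREES):
    """Snaps a coordinate to the (row, column) index of its grid cell."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


# A tiny pre-aggregated grid: one dominant class code per cell.
grid = {
    cell_index(51.269, 12.335): 512,  # water bodies
    cell_index(51.579, 12.733): 312,  # coniferous forest
}


def lookup_class(lat, lon):
    """Returns the class code of the cell containing the point, or None."""
    return grid.get(cell_index(lat, lon))


print(lookup_class(51.269, 12.335))
print(lookup_class(51.2695, 12.3351))  # different point, same cell
print(lookup_class(0.0, 0.0))          # outside the grid
```

A dictionary lookup takes constant time per location, which is the main appeal of aggregating the data once instead of querying the raster for every request.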

We created the spatial grid using the full, open, and freely accessible Corine Land Cover data. This data is produced with funding from the European Union. The detailed licensing conditions can be found here.
© European Union, Copernicus Land Monitoring Service 2018, European Environment Agency (EEA)

Any feedback is welcome. We are going to publish some guides on how to integrate this service into the data preparation, data analysis, and visualization and communication steps of common spatial data science workflows.

References:

[1] geolandcover API
Classify locations into well-known land cover categories.

[2] Corine Land Cover data
Corine Land Cover products are available in both raster (100 m resolution) and vector (ESRI and SQLite geodatabase) formats.
