Pre-processing the DEM of France (RGE ALTI® 5m) for implementation into Earth Engine

Guillaume Attard
9 min readJun 6, 2022

--

Authors: Guillaume Attard & Julien Bardonnet

france-dem-large

Introduction

A Digital Elevation Model (DEM) is a description of the ground elevation. DEMs are used in many fields such as hydrology, agriculture, and natural risks analysis. Since few years, some high resolution DEMs have become open and freely available. Some of them are available on the Earth Engine data catalog, but only few are available at a very high resolution. And when it comes to analyse the human and financial consequences of natural disasters such as floods and sea level rise, this high resolution is needed.

Hence, the aim of this article is to detail the implementation steps of the high resolution France DEM (RGE ALTI® 5m — IGN) into Earth Engine. In the following, the procedure is applied on the southern part of Corsica (Department 2A) but can be easily transposed to the all country.

The structure of the article reads as follow:

  • the dataset (RGE ALTI® 5m) is described,
  • a procedure to automate the download of the DEM of multiple departments of France is developed from the FTP server,
  • the DEM asc files are turned into geoTIFF files and merged into one,
  • then, geoTIFF files are uploaded as new assets in Earth Engine.

The final result has been published on Awsome Google Earth Engine Community Datasets.

Run me first

The following librairies are required:

import os
import shutil

from ftplib import FTP

import py7zr

import pandas as pd
import numpy as np

import rasterio
from rasterio.merge import merge
from rasterio.plot import show

from osgeo import gdal, osr

import pprint
from tqdm.notebook import tqdm

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
%matplotlib inline
# Prepare data folder
DATAPATH = 'data'
os.makedirs(DATAPATH, exist_ok=True)

Description of the RGE ALTI® 5m dataset

The RGE ALTI® 5m dataset describes the ground elevation of France with a spatial resolution of 5x5m. It is produced by the National Institute of Geographic and Forest Information (IGN — https://www.ign.fr/) which is a public administrative establishment placed under the joint authority of the Ministries in charge of ecology and forestry. The full dataset description is available here.

The accuracy of the DEM varies according to areas and issues: it is increased to 20 cm in floodplain areas or coastal areas. The RGE ALTI® is updated from surveys obtained by airborne LIDAR or by correlation of aerial images. The dataset can be downloaded at a 1x1m resolution or at a 5x5m resolution.

This dataset is published under the open licence Etalab.

Dataset accessibility — automate download and extraction

The RGE ALTI® 5m dataset is split in multiple 7-Zip files, and all individual files are available here. These files are stored on a FTP server ftp3.ign.fr with a public username/password.

Then, a procedure is given below to get the info and download files stored on the FTP associated with the dataset:

# Define requested info to access the FTP.
URL = "ftp3.ign.fr"
USERNAME = "RGE_ALTI_ext"
PASSWORD = "Thae5eerohsei8ve"

# Create a small helper class to manipulate our ftp connection
# without leaving the connection open between calls
class FTPHelper:
def __init__(self, url, username, password):
self.url = url
self.username = username
self.password = password

def list_files(self):
"""
Connects to ftp server and returns a list of file names stored on it.
"""
with FTP(self.url) as ftp:
ftp.login(self.username, self.password)
filenames = ftp.nlst()
return filenames


def download_file(self, filename, local_folder, verbose=True):
"""
Connects to ftp server and downloads remote `filename` to local `local_folder`.

Disable verbose to avoid printing the progress bar.
"""
# Create the directory to store the downloaded file.
os.makedirs(local_folder, exist_ok = True)
# Define the output filename.
output_filepath = os.path.join(local_folder, filename)
# Download the file (big files so ze show a progress bar)
with FTP(self.url) as ftp:
ftp.login(self.username, self.password)
filesize = ftp.size(filename)
with open(output_filepath, 'wb') as f:
with tqdm(total=filesize,
unit='B', unit_scale=True, unit_divisor=1024,
disable=not verbose) as pbar:
pbar.set_description(f"Downloading {filename}")
def callback_(data):
l = len(data)
pbar.update(l)
f.write(data)
ftp.retrbinary('RETR ' + filename, callback_)
print(f"{filename} is downloaded.")
return output_filepath

# Instantiate helper
ftp_h = FTPHelper(URL, USERNAME, PASSWORD)

The FTPHelper.list_files() function is used to store all files info into a pandas dataframe.

# Create a pandas.df to store the list of available files.
df = pd.DataFrame(data=ftp_h.list_files(), columns=["name"])

# Rearrange the content of the pandas df.
df = pd.concat([df, df["name"].str.split("_", expand=True)], axis=1)
df = df.rename(columns = {
0:"dataset", 1:"version", 2:"res", 3:"filetype", 4:"crs",
5:"dep", 6:"date"})

# Loc the dataset of interest RGE Alti 5m.
df = df.loc[(df.dataset == "RGEALTI") & (df.res == "5M")]

# Clean the date and set the name as index.
df['date'] = df['date'].apply(lambda x:x[:-3])
df = df.set_index("name")

df.head()
png

In the following we define a department of interest corresponding to the southern part of Corsica (France), and we print the associated file:

deps_of_interest = ['D02A']

# Loc lines of interest in the pandas.df.
dfi = df.loc[df.dep.isin(deps_of_interest)]
dfi
png

The file of interest is downloaded using our previously define function FTPHelper.download_file(). Please note that the server can be very slow.

filename = dfi.index[0]

# Define the name (global variable) of the temp directory to store zip files.
TEMP_ZIP = os.path.join(DATAPATH, "temp7z")

# Comment after running to no download again.
ftp_h.download_file(filename, TEMP_ZIP)

Then we need to extract the content of the downloaded file. A procedure is defined accordingly:

# Define the name (global variable) of the temp directory to store extracted files.
EXTRACTION_PATH = os.path.join(DATAPATH, "RGE_ALTI_5m")

def extract_rge5m(filename):
"""
Extracts the zip DEM.
"""
zipfile_path = os.path.join(TEMP_ZIP, filename)
os.makedirs(EXTRACTION_PATH, exist_ok = True)
with py7zr.SevenZipFile(zipfile_path, mode='r') as archive:
archive.extractall(path=EXTRACTION_PATH) #WIN: check the path lenght in case of error FileNotFound
print(f"{zipfile_path} is extracted to {EXTRACTION_PATH}.")

The function is applied to our filename of interest.

extract_rge5m(filename)

Now the unzip folder should look like this:

RGE_ALTI_5m
└───RGEALTI_2-0_5M_ASC_LAMB93-IGN78C_D02A_2020-04-16
└───RGEALTI
│ LISEZ-MOI.txt
└───1_DONNEES_LIVRAISON_2021-06-00061
│ │ RGEALTI_MNT_5M_ASC_LAMB93_IGN78C_D02A.md5
│ └───RGEALTI_MNT_5M_ASC_LAMB93_IGN78C_D02A
│ │ RGEALTI_FXX_1155_6135_MNT_LAMB93_IGN78C.asc
│ │ RGEALTI_FXX_1155_6140_MNT_LAMB93_IGN78C.asc
│ │ RGEALTI_FXX_1155_6145_MNT_LAMB93_IGN78C.asc
│ │ ...
└───2_METADONNEES_LIVRAISON_2021-06-00061
└───3_SUPPLEMENTS_LIVRAISON_2021-06-00061

The folder/subfolder names depend on the acquisition period of the dataset, and on the department of interest. Hence, a procedure is given to go straight to the path where the asc files of the DEM are located. We store all asc files into a list. We finally print the 3 first asc paths of the list.

def get_path_asc_paths(filename):
"""
This funciton returns a list of paths of asc files.
"""
for root, dirs, files in os.walk(EXTRACTION_PATH, filename[:-3]):
if "1_DONNEES_LIVRAISON" in root:
asc_paths_list = sorted([os.path.join(root, name) for name in files if name.endswith(".asc")])

return asc_paths_list

asc_paths_list = get_path_asc_paths(filename)
pprint.pprint(asc_paths_list[:3])

Let’s have a look at a DEM piece (.asc file) located in this folder. The 10th element is selected and the header of the file (6 first lines) is printed.

# Select one asc file in the DEM directory.
sample_path = asc_paths_list[10]

# Open and print the 10 first lines.
with open(sample_path) as file:
content = file.readlines()
header = content[:6]
data = content[6:]

pprint.pprint(header)

As we can see, the header of the file gives us:

  • ncols the number of columns,
  • nrows the number of rows,
  • xllcorner the x location (in epsg 2154) of the lower left corner of the DEM,
  • yllcorner the y location (in epsg 2154) of the lower left corner of the DEM,
  • cellsize the cell size in meters,
  • NODATA_value the nodata value.

On the other hand, the rest of the content represents the elevation data. The 1000 first characters of the data content is printed as follow:

pprint.pprint(data[0][:1000])

We define a function to store these data into a dictionary. For further manipulations, it will be convenient to define a class associated with RGE pieces. Hence, we also define such a class in which the DEM data is loaded using the np.loadtxt() function.

def get_header_asc(filepath):
"""
This function reads the header of an asc file and returns
the data into a dictionnary
"""
file = open(filepath)
content = file.readlines()[:6]
content = [item.split() for item in content]
return dict(content)

class RGEitem():
"""
This class is used to handle RGE items.
"""
def __init__(self, filepath):
self.filename = os.path.basename(filepath)
self.dir = os.path.dirname(filepath)
self.data = np.loadtxt(filepath, skiprows=6)
self.header = get_header_asc(filepath)
self.ncols = int(self.header['ncols'])
self.nrows = int(self.header['nrows'])
self.xllc = float(self.header['xllcorner'])
self.yllc = float(self.header['yllcorner'])
self.res = float(self.header['cellsize'])
self.zmin = float(self.data.min())
self.zmax = float(self.data.max())
self.novalue = -99999.

Let’s have a look at our sample item:

file = RGEitem(sample_path)

plt.rcParams["figure.figsize"] = (7,7)
plt.imshow(file.data, vmin=0, vmax=300)
plt.show()

print("The max elevation is", file.zmax,"m.")
png
The max elevation is 619.01 m.

Turn .asc files into geoTIFF files

Now that we see how to download and access these data, we can transform each .asc file into a geoTIFF using the osgeo library. We do this as follow:

def asc_to_tif(file, output_raster_dir, epsg):
"""
Transforms an .asc file into a geoTIFF.

Params:
-------
file: an RGEitem
output_raster_dir (str): path to the directory where the tif will be saved.
epsg (int): projection system.

Returns:
--------
output_rasterpath (str): name of the output geoTIFF
"""
xmin = file.xllc
ymax = file.yllc + file.nrows * file.res
geotransform = (xmin, file.res, 0, ymax, 0, -file.res)

output_rasterpath = os.path.join(output_raster_dir, file.filename[:-4] + ".tif")

# Open the file
output_raster = gdal.GetDriverByName('GTiff').Create(output_rasterpath, file.ncols, file.nrows, 1, gdal.GDT_Float32)
# Specify the coordinates.
output_raster.SetGeoTransform(geotransform)
# Establish the coordinate encoding.
srs = osr.SpatialReference()
# Specify the projection.
srs.ImportFromEPSG(epsg)
# Export the coordinate system to the file.
output_raster.SetProjection(srs.ExportToWkt())
# Writes the array.
output_raster.GetRasterBand(1).WriteArray(file.data)
# Set nodata value.
output_raster.GetRasterBand(1).SetNoDataValue(file.novalue)
output_raster.FlushCache()
return output_rasterpath

We apply the function to get the geoTIFF associated with our sample and we open it using rasterio.

# Generate the tif.
SAMPLEPATH = os.path.join(DATAPATH, "sample")
os.makedirs(SAMPLEPATH, exist_ok = True)
tiffile = asc_to_tif(file, SAMPLEPATH, 2154)

with rasterio.open(tiffile) as img:
show(img)
plt.show()
png

Note that x and y axes are now in line with the coordinate reference system we are using (epsg:2154).

Merge geoTIFF files into a mosaic

We can now define a procedure to trnasform each .asc file into .tif and create a mosaic with all these files following the procedure described by Abdishakur.

def create_rge_mosaic(asc_paths_list, result_path, mosaic_name, crs):
"""
Creates a mosaic associated to multiple asc files.

Params:
-------
asc_paths_list (list): list of asc files paths.
mosaic_path (str): path of the output mosaic.
crs (int): coordinate reference system (ex. 2154 for EPSG:2154).
"""

# Create tmp dir to save intermediate tifs.
tmpdir = os.path.join(DATAPATH, 'local_tifs')
os.makedirs(tmpdir, exist_ok = True)

output_raster_paths_list = [asc_to_tif(RGEitem(ascpath), tmpdir, crs) for ascpath in asc_paths_list]

# Safely load rasters and create the mosaic.
from contextlib import ExitStack
with ExitStack() as stack:
raster_to_mosaic_list = [stack.enter_context(rasterio.open(path))
for path in output_raster_paths_list
]
mosaic, output = merge(raster_to_mosaic_list)
output_meta = raster_to_mosaic_list[0].meta.copy()

output_meta.update({
"driver": "GTiff",
"height": mosaic.shape[1],
"width": mosaic.shape[2],
"transform": output,
})

# Save the result.
os.makedirs(result_path, exist_ok=True)
mosaic_path = os.path.join(result_path, mosaic_name)
with rasterio.open(mosaic_path, "w", **output_meta) as m:
m.write(mosaic)

# Purge the tmp dir.
shutil.rmtree(tmpdir)

return mosaic_path

The function is applied to create the mosaic of the southern part of Corsica (department 2A). The procedure an take few minutes and the final resulting file is about 1.4 GB:

result_path = os.path.join(DATAPATH, "result")
mosaic_name = "mymosaic2A_2022.tif"

mosaic_path = create_rge_mosaic(asc_paths_list, result_path, mosaic_name, 2154)

The result can be displayed as follows. On the following figure, the initial sample area is delineated with a red rectangle.

fig, ax = plt.subplots(1, figsize=(12, 12))

# Indicate the location of the sample tiff with a red rectangle.
x1, y1 = file.xllc, file.yllc
dx, dy = file.ncols*file.res, file.nrows*file.res
ax.add_patch(Rectangle((x1, y1), dx, dy, edgecolor='red', fill=False))

# PLot the mosaic of the 2A departement.
with rasterio.open(mosaic_path) as img:
show(img, interpolation='none', ax=ax)

plt.title('Elevation of the southern part of Corsica (2A)')
plt.show()
png

Group mosaics and upload in Earth Engine as new assets

The procedure described in this notebook has been applied for all departments of France (Continental). Then, the resulting geoTIFFs have been uploaded into Google Earth Engine as new assets. The procedure to manage an assets in Earth Engine is described here.

The final result has been published on Awsome Google Earth Engine Community Datasets.

AGCD - Samapriya Roy

The Earth Engine snippet reads as follows:

var rge_alti5 = ee.Image("projects/sat-io/open-datasets/IGN_RGE_Alti_5m");

Sample Code: https://code.earthengine.google.com/b9054b6d5c4d7de1a4af39ff307e73d7

Acknowledgements

Thanks to Samapriya Roy for final merging of the dataset into Earth Engine, for publication on the AGCD, and for the production of the GIF used before intro.

Note

You can find this notebook and many others on my GitHub geoscience-notebooks repository or on my website.

--

--

Guillaume Attard

CEO & cofounder @Ageoce. I am a geoscientist fascinated by new technologies and geodata science. I am a Google Developer Expert for Earth Engine. ageoce.com