Towards urban flood susceptibility mapping using machine and deep learning models (part 2)

Omar Seleem
Published in Hydroinformatics · Dec 13, 2022

In the last article, we introduced urban pluvial flooding and the application of data-driven models to map it. This series of articles summarizes and explains (with Python code) the paper “Towards urban flood susceptibility mapping using data-driven models in Berlin, Germany”, published in Geomatics, Natural Hazards and Risk. The complete Jupyter Notebook, sample data for flooded and non-flooded locations, and the predictive features used in the paper are available here.

Flood susceptibility maps show the likelihood that a specific location will experience flooding based on its physical characteristics, such as topography, infrastructure and hydrometeorological conditions. Data-driven models approach this as a binary classification problem, i.e., a location is either flooded or not flooded. A balanced dataset with an equal number of flooded and non-flooded locations is needed to train, validate and test the models. Flooded locations are obtained from flood inventories, which record historical flood locations (see Fig. 1), while non-flooded locations are generated randomly in areas without a flooding history. The data-driven models then learn the relationship between the characteristics of these locations and flood occurrence.
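As a minimal, self-contained sketch of this sampling step (the coordinates and the `min_dist` threshold here are made up; in the paper the flooded points come from the Berlin flood inventory), non-flooded points can be drawn randomly while enforcing a minimum distance from every reported flooded location:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical study-area bounding box (xmin, ymin, xmax, ymax) in metres
xmin, ymin, xmax, ymax = 0.0, 0.0, 10_000.0, 10_000.0

# Hypothetical flooded locations (in practice, read from the flood inventory)
flooded = rng.uniform([xmin, ymin], [xmax, ymax], size=(100, 2))

def sample_non_flooded(n, flooded_xy, min_dist=200.0):
    """Draw n random points at least min_dist metres from any flooded point."""
    samples = []
    while len(samples) < n:
        p = rng.uniform([xmin, ymin], [xmax, ymax])
        # keep the candidate only if it is far enough from every flooded point
        if np.min(np.linalg.norm(flooded_xy - p, axis=1)) >= min_dist:
            samples.append(p)
    return np.array(samples)

# balanced dataset: as many non-flooded as flooded points
non_flooded = sample_non_flooded(len(flooded), flooded)
```

The distance buffer is one simple way to avoid sampling "non-flooded" points right next to reported flood locations; the appropriate threshold depends on the raster resolution and study area.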

Fig 1. Spatial distribution of 4333 reported flooded locations between 2005–2017 and administrative districts in Berlin, Germany (locations obtained from Berliner Wasserbetriebe).

Urban pluvial flooding is expected to occur in low-lying areas or within topographic depressions where runoff accumulates. Therefore, topographic features such as altitude, slope, curvature and the topographic wetness index (TWI) potentially indicate an increased flood hazard. Furthermore, urban pluvial flooding occurs when the capacity of the stormwater drainage system is exceeded; excess runoff travels through the road network, converting it into a preferential flow path. The distances from a location to the nearest road, gully and channel are therefore considered important for mapping flood susceptibility. Finally, urban pluvial flooding is caused by intense rainfall events, so rainfall should also be included as a predictive feature. We used the maximum daily rainfall depth and the frequency of extreme rainfall events to represent rainfall. Fig 2 shows the considered predictive features. These predictive features can be calculated in ArcMap, QGIS or Python.

The paper compared a convolutional neural network (CNN) (a raster-based model) with traditional machine learning algorithms such as random forest (RF), support vector machine (SVM) and artificial neural network (ANN) (point-based models). Our hypothesis was that the CNN would outperform the other models.

Fig 2. Spatial distribution of flood predictive features used to develop the models.

This article explains how to prepare the data for the point-based models. We can read the shapefile containing the flooded and non-flooded points in Python using GeoPandas. Then we need to add an empty column for each predictive feature.

# Import the packages used
import geopandas as gpd # used to read the shapefile
import rasterio as rio # used to read the raster (.tif) files
from rasterio.plot import show # used to make plots using rasterio
import matplotlib.pyplot as plt # to make plots using matplotlib

# Read your point shapefile (flooded and non-flooded locations)
points = gpd.read_file('Points.shp')

# Add a column for each predictive feature, initialized to zero;
# the values are extracted from the rasters below.
points['DEM'] = 0
points['Slope'] = 0
points['Aspect'] = 0
points['Curvature'] = 0
points['TWI'] = 0
points['DTDrainage'] = 0
points['DTRoad'] = 0
points['DTRiver'] = 0
points['CN'] = 0
points['AP'] = 0 # max daily precipitation
points['FP'] = 0 # frequency of extreme precipitation events

Then, we need to open the predictive features (raster images) using rasterio or GDAL and read them as NumPy arrays so we can work with them in Python.

#The predictive features are in raster format so we use rasterio package to 
#read them and convert them to numpy array

DEM_raster=rio.open('DEM.tif')
DEM_arr=DEM_raster.read(1)

Slope_raster=rio.open('Slope.tif')
Slope_arr=Slope_raster.read(1)

Aspect_raster=rio.open('Aspect.tif')
Aspect_arr=Aspect_raster.read(1)

Curvature_raster=rio.open('Curvature.tif')
Curvature_arr=Curvature_raster.read(1)

TWI_raster=rio.open('TWI.tif')
TWI_arr=TWI_raster.read(1)

DTRoad_raster=rio.open('DTRoad.tif')
DTRoad_arr=DTRoad_raster.read(1)

DTRiver_raster=rio.open('DTRiver.tif')
DTRiver_arr=DTRiver_raster.read(1)

DTDrainage_raster=rio.open('DTDrainage.tif')
DTDrainage_arr=DTDrainage_raster.read(1)

CN_raster=rio.open('CN.tif')
CN_arr=CN_raster.read(1)

AP_raster=rio.open('AP.tif')
AP_arr=AP_raster.read(1)

FP_raster=rio.open('FP.tif')
FP_arr=FP_raster.read(1)

# show the points and the DEM raster on a matplotlib plot
fig, ax = plt.subplots(figsize=(12,12))
points.plot(ax=ax, color='orangered')
show(DEM_raster, ax=ax)

Now we have read the points as a GeoDataFrame using GeoPandas, opened the predictive features using rasterio and read them as NumPy arrays. Next, we need to extract the predictive feature values at the flooded and non-flooded points.

# Extract the raster values at the point locations
for index, row in points.iterrows(): # iterate over the points in the shapefile
    longitude = row['geometry'].x # x coordinate of the point
    latitude = row['geometry'].y  # y coordinate of the point

    # the pixel (row, column) corresponding to (longitude, latitude)
    rowIndex, colIndex = DEM_raster.index(longitude, latitude)

    # extract the raster values at the point location
    # (points.loc[index, col] avoids pandas' chained-assignment warning)
    points.loc[index, 'DEM'] = DEM_arr[rowIndex, colIndex]
    points.loc[index, 'Slope'] = Slope_arr[rowIndex, colIndex]
    points.loc[index, 'Aspect'] = Aspect_arr[rowIndex, colIndex]
    points.loc[index, 'Curvature'] = Curvature_arr[rowIndex, colIndex]
    points.loc[index, 'DTRoad'] = DTRoad_arr[rowIndex, colIndex]
    points.loc[index, 'DTRiver'] = DTRiver_arr[rowIndex, colIndex]
    points.loc[index, 'DTDrainage'] = DTDrainage_arr[rowIndex, colIndex]
    points.loc[index, 'TWI'] = TWI_arr[rowIndex, colIndex]
    points.loc[index, 'CN'] = CN_arr[rowIndex, colIndex]
    points.loc[index, 'AP'] = AP_arr[rowIndex, colIndex]
    points.loc[index, 'FP'] = FP_arr[rowIndex, colIndex]

points.head() # inspect the calculated fields

# Save the points file
points.to_file('points_data.shp') # save as a shapefile
# or
points.to_pickle('points_data.pkl') # save as a pickle
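For many points, looping row by row can be slow; rasterio's `DatasetReader.sample()` accepts an iterable of (x, y) pairs and yields the pixel values directly, e.g. `points['DEM'] = [v[0] for v in DEM_raster.sample(zip(points.geometry.x, points.geometry.y))]`. It also helps to see what `DEM_raster.index(longitude, latitude)` does under the hood: for a north-up raster it inverts the affine geotransform. A self-contained re-implementation (square pixels, no rotation; the origin and pixel size below are made up for illustration):

```python
def coord_to_rowcol(x, y, x_origin, y_origin, pixel_size):
    """Return (row, col) of the pixel containing coordinate (x, y).

    x_origin, y_origin: coordinates of the raster's top-left corner.
    """
    col = int((x - x_origin) // pixel_size)
    row = int((y_origin - y) // pixel_size)  # rows grow downwards
    return row, col

# Example: a raster whose top-left corner is at (380000, 5830000)
# with 10 m pixels (hypothetical numbers)
print(coord_to_rowcol(380025.0, 5829985.0, 380000.0, 5830000.0, 10.0))  # → (1, 2)
```

This is also a useful sanity check: if the extracted values look wrong, the point coordinates are usually in a different CRS than the rasters and must be reprojected first (e.g. `points = points.to_crs(DEM_raster.crs)`).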

Now we have a point shapefile that marks the flooded and non-flooded locations and stores the predictive feature values at each point. In the next article, we will explain how to implement the point-based models to map urban pluvial flood susceptibility.
