Machine learning-based time-series prediction for rainfall-runoff modeling

Kadir Polat
Apr 6, 2022

Written by

& (March 2022)

This blog post introduces machine learning methods for rainfall-runoff modeling as an alternative to deterministic approaches. We present the findings of a research course at Bochum University of Applied Sciences, which covered use cases of artificial intelligence in hydrological science and built on the study by Kratzert et al.[1]

Compared to deterministic modeling, this approach does not require in-depth knowledge of hydrological and runoff processes, because it relies solely on observational data. More precisely, observed data over a given time span are used to predict runoff: the relationship between rainfall and discharge within a catchment area is learned by training a model on historical hydro-meteorological time series. We used two supervised learning procedures for the modeling. Specifically, we compared a standard long short-term memory (LSTM) network with a convolutional neural network combined with an LSTM network (CNN-LSTM). With the combined CNN-LSTM network we aimed for more precise feature extraction than the standard LSTM network provides, which should ideally allow it to outperform the standard one.

To evaluate our experimental setups, we used two hydro-meteorological data sources: the one-dimensional CAMELS-US dataset[2] for training and evaluating the standard LSTM network, and the two-dimensional Daymet US dataset[3] for training and evaluating the combined CNN-LSTM network. To allow a meaningful comparison of the two network architectures, we selected a small subset of CAMELS-US catchments from the diverse climate zones of the US mainland as our study area. The streamflow of each catchment was then predicted once with the standard LSTM and once with the CNN-LSTM. Eureka, California was chosen as an example of a humid climate, whereas Seattle, Washington and Mesa, Arizona were selected for a moderate and an arid climate, respectively. All locations are marked in the following US map.
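The difference between the two input types can be illustrated with array shapes. The following is a minimal sketch on random numbers, not the actual CAMELS-US or Daymet data: the grid size, the catchment mask, and the idea of deriving the lumped series as a simple basin average of the gridded one are assumptions made for illustration only.

```python
import numpy as np

n_days, n_vars = 30, 5   # 30-day input window, 5 forcing variables
height, width = 8, 8     # hypothetical grid size covering one catchment

rng = np.random.default_rng(42)

# Spatially distributed input: one small raster per day and variable
gridded = rng.random((n_days, height, width, n_vars))

# Hypothetical catchment mask: True for grid cells inside the basin
mask = np.zeros((height, width), dtype=bool)
mask[2:6, 1:7] = True

# Lumped input: average over all cells inside the basin,
# leaving one value per day and variable
lumped = gridded[:, mask, :].mean(axis=1)

print(gridded.shape)  # (30, 8, 8, 5)
print(lumped.shape)   # (30, 5)
```

A standard LSTM would consume the lumped array directly, while a CNN-LSTM would first pass each daily raster through convolutional layers before the sequence reaches the LSTM.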

Before the modeling itself, the two datasets had to undergo several preprocessing steps to ensure their comparability. First, the datasets were normalized to a range between 0 and 1 (min-max scaling), since the meteorological parameters are measured on different scales. Afterwards, the training samples were shuffled to prevent dependencies, e.g., due to seasonal discharge behavior. However, the order of the days within each sample must not be shuffled, so that the time dependencies under examination are retained. Each city’s dataset covered the years from 1980 to 1990. The first eight years (80%) were used as training data, whereas the last two years were used for evaluation and testing. The models were trained on the preceding 30 days of rainfall, minimum and maximum temperature, shortwave radiation, and vapor pressure, with the aim of predicting the discharge on day 31 (30+1).
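The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from our repository: the helper names `min_max_scale` and `make_windows` are hypothetical, fitting the scaling bounds on the training period only is an assumption, and only the input variables are scaled here.

```python
import numpy as np

def min_max_scale(x, lo=None, hi=None):
    """Scale each column of x to [0, 1]; lo/hi default to the column min/max of x."""
    lo = x.min(axis=0) if lo is None else lo
    hi = x.max(axis=0) if hi is None else hi
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (x - lo) / span, lo, hi

def make_windows(features, target, lookback=30):
    """Pair `lookback` consecutive days of features with the target on the following day."""
    n = len(features) - lookback
    X = np.stack([features[i:i + lookback] for i in range(n)])
    y = target[lookback:lookback + n]
    return X, y

# Synthetic stand-in data: 10 years of daily values, 5 input variables
rng = np.random.default_rng(0)
n_days, n_features = 3650, 5
forcings = rng.random((n_days, n_features)) * 50.0
discharge = rng.random(n_days) * 10.0

# Chronological split: first 80% for training, remainder for evaluation
split = int(0.8 * n_days)
train_f, lo, hi = min_max_scale(forcings[:split])
test_f, _, _ = min_max_scale(forcings[split:], lo, hi)  # reuse training bounds (assumption)

X_train, y_train = make_windows(train_f, discharge[:split])
X_test, y_test = make_windows(test_f, discharge[split:])

# Shuffle whole samples only; the 30 days inside each window keep their order
order = rng.permutation(len(X_train))
X_train, y_train = X_train[order], y_train[order]

print(X_train.shape, y_train.shape)  # (2890, 30, 5) (2890,)
print(X_test.shape, y_test.shape)    # (700, 30, 5) (700,)
```

Shuffling happens after windowing, so each sample still contains 30 consecutive days in their original order.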

To compare the results, we calculated the Nash–Sutcliffe model efficiency coefficient (NSE).
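For reference, the NSE is commonly defined as shown here. The symbols are our own notation for this post: Q_obs,t is the observed discharge on day t, Q_sim,t the predicted discharge, and the bar denotes the mean of the observations over the T days of the evaluation period.

```latex
\[
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left( Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t} \right)^{2}}
                        {\sum_{t=1}^{T} \left( Q_{\mathrm{obs},t} - \overline{Q}_{\mathrm{obs}} \right)^{2}}
\]
```

A value of 1 indicates a perfect match, a value of 0 means the model is no better than always predicting the observed mean, and negative values mean it is worse than that.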

Figure 1: Nash–Sutcliffe model efficiency coefficient [4]
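In code, the coefficient can be computed in a few lines. The function below is a minimal NumPy sketch of the standard NSE definition with toy values; the function name `nse` is our own choice here, and this is not necessarily how the metric is implemented in our repository.

```python
import numpy as np

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus squared error relative to the variance of the observations."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance

obs = np.array([1.0, 2.0, 3.0, 4.0])
print(nse(obs, obs))                            # 1.0: perfect prediction
print(nse(obs, np.full_like(obs, obs.mean())))  # 0.0: always predicting the observed mean
```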

Results

The results illustrate a distinctly higher accuracy of the combined CNN-LSTM network compared to the standard LSTM network: for two of the three basins considered, the CNN-LSTM achieved significantly better outcomes. Moreover, both network architectures obtained appropriate and representative outcomes for the respective basins. Seattle in particular performed very well for both architectures, both in terms of the NSE and in the comparison of real and predicted streamflow, as depicted in Figures 2 and 3.

Figure 2: NSE of Seattle
Figure 3: Comparison of real and predicted streamflow

Conclusion

The presented experimental study project was intended as a proof of concept for the application of spatially distributed meteorological raster datasets to rainfall-runoff modeling. To this end, we compared a standard LSTM network, trained on lumped time series data, with a combined CNN-LSTM network, trained on spatially distributed time series data. Based on the NSE criterion, significant differences were observed for two of the three basins. Both models are generally suitable for runoff modeling; however, the CNN-LSTM model showed higher precision in the selected analysis. Although this model requires more time and computing capacity, its architecture is a promising approach for achieving higher quality and precision in machine learning-based rainfall-runoff modeling. Moreover, there is potential for improvement for each of the basins: the network architectures were chosen uniformly for all three basins in this trial and could be tuned individually.

See our results and reproduce them by downloading our code from the following GitHub repositories:

https://github.com/KadPol/Niederschlagsimulierung

https://github.com/HTR018/Niederschlagsimulierung

[1] See: European Geosciences Union: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. https://hess.copernicus.org/articles/22/6005/2018/ (Last accessed: 12/18/2021).

[2] See: NCAR Research Applications Laboratory: CAMELS: Catchment Attributes and Meteorology for Large-sample Studies. Dataset download. https://ral.ucar.edu/solutions/products/camels (Last accessed: 12/18/2021).

[3] See: ORNL DAAC: Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 4. https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1840 (Last accessed: 12/18/2021).

[4] See: Hydrology and Earth System Sciences: Addor, Nans, Newman, Andrew J., Mizukami, Naoki and Clark, Martyn P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies. https://hess.copernicus.org/articles/21/5293/2017/hess-21-5293-2017.pdf (Last accessed: 08/25/2021).
