Introducing LakePy: Accessing Lake Water-Level Data Through a Python API

Published in ESIP, Jan 4, 2021 (5 min read)

Instantly download, plot, and analyze decades of lake water-level history with this new open-source package.

Lake water-level data are crucial to water resource management and to continued scientific research into limnological questions. For the past decade, and increasingly so in the last few years, these data have been dutifully warehoused across federal, state, local, academic, and private databases. The issue is access: the data often sit behind registration walls, are stored in text files on an old server, or are so sparsely documented that retrieval becomes an extended process.

LakePy and the back-end database it interfaces with, the Global Lake Level Database (GLLD), serve to remedy this problem. The GLLD is a dedicated AWS RDS instance with an associated API (built on AWS API Gateway). The GLLD combines three external databases (for now!):

Theia’s HydroWeb
G-REALM
USGS

These three databases together host 2000+ individual lakes with associated water level data extending from a few years to more than a century. In addition to many large lakes scattered across the globe, the GLLD contains every USGS-monitored lake in the United States. Furthermore, GLLD lake levels are updated weekly, giving end-users nearly real-time access to the data.

LakePy is a Pythonic wrapper for the Global Lake Level Database API. It serves both historical lake-level data and the associated metadata, and it provides simple plotting functions. In the next section, we will step through a tutorial of the package. If you would like to skip right to the documentation, you can find it here.

Step-by-Step Tutorial

This tutorial assumes you have installed Python 3.7 or higher with pip. To install LakePy, open the terminal/command prompt and enter:

pip install lakepy

After installation, move to your favorite IDE (I use PyCharm Professional, which is free for students!). Import LakePy as shown below, and we are ready to start searching for available lakes. For this tutorial, we will be using Lake Mead in Nevada, USA.

import lakepy as lk
mead = lk.search(name='mead')

When you search using the name parameter, you will often (but not always!) get multiple results. If more than one lake matches, a table of the returned records is printed. (Note: you can specify markdown=True in the search function to return Markdown-formatted tables.)

   id_No    source                            lake_name
0    114  hydroweb                                 Mead
1   1519      usgs  MEAD LAKE WEST BAY NEAR WILLARD, WI

Two records were returned, meaning the function requires greater refinement of search parameters in order to choose a lake to download. We can do this one of two ways:

Specify a source. In this case, we would pass source='hydroweb', as the other result (Mead Lake) is seemingly in Wisconsin!
Specify the GLLD ID number (id_No). This is the preferred method of access! Many lakes are named similarly, meaning the highest level of specificity one can provide is ideal when querying the GLLD.
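
The same refinement can be scripted. The sketch below is only an illustration: it assumes the printed table has been copied by hand into a list of dictionaries, and the names results and pick_id are hypothetical helpers, not part of LakePy. It picks the id_No for a given source and fails loudly when the match is not unique.

```python
# Rows copied by hand from the table printed by lk.search(name='mead').
results = [
    {"id_No": 114, "source": "hydroweb", "lake_name": "Mead"},
    {"id_No": 1519, "source": "usgs", "lake_name": "MEAD LAKE WEST BAY NEAR WILLARD, WI"},
]


def pick_id(rows, source):
    """Return the single id_No whose source matches, or raise if ambiguous."""
    matches = [row["id_No"] for row in rows if row["source"] == source]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {source!r}, found {len(matches)}")
    return matches[0]


mead_id = pick_id(results, "hydroweb")
print(mead_id)

# The id can then be handed to LakePy (needs the package and network access):
# import lakepy as lk
# mead = lk.search(id_No=mead_id)
```

Raising on zero or several matches keeps a script from silently downloading the wrong lake when names collide.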

We can now specify the id_No for the desired lake (114):

mead = lk.search(id_No = 114)

If successful, lake metadata will be printed and the variable “mead” will be created.

Lake Mead metadata from Theia’s HydroWeb Database

To confirm the process worked correctly, we can print the data type of variable “mead”. We can see it is indeed a LakePy Lake object.

In[0]: print(type(mead))
Out[0]: lakepy.main.Lake

The variable “mead” is now an object of the Lake class. This just means all the necessary data and functions will stem from this object (our variable, “mead”). If you are familiar with Pandas, DataFrames work in the same fashion.

The associated attributes of an object of class Lake are:

name
country
continent (currently not supported for HydroWeb)
source
original_id
id_No
observation_period
latitude (currently not supported for G-REALM)
longitude (currently not supported for G-REALM)
misc_data
metadata
data
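
Since some attributes are not supported for every source, it can help to read them defensively. This is a minimal sketch, assuming an unsupported attribute is either missing or empty. The summarize helper and the stand-in object are hypothetical and are used here only so the example runs without LakePy installed.

```python
from types import SimpleNamespace

# Scalar attribute names taken from the list above.
LAKE_ATTRIBUTES = (
    "name", "country", "continent", "source", "original_id",
    "id_No", "observation_period", "latitude", "longitude",
)


def summarize(lake, fields=LAKE_ATTRIBUTES):
    """Collect attributes into a dict, using None where one is absent."""
    return {field: getattr(lake, field, None) for field in fields}


# Stand-in for a Lake object; only values shown in this tutorial are filled in.
stand_in = SimpleNamespace(
    name="Mead",
    source="hydroweb",
    id_No=114,
    observation_period="2000-06-14 10:22 -- 2014-12-29 00:21",
)

summary = summarize(stand_in)
for field, value in summary.items():
    print(f"{field:>20}: {value}")
```

With a real Lake object, summarize(mead) would be called the same way. Attributes the source does not provide come back as None in this sketch.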

We can use these to gain more information about Lake Mead, like the range of accessible data.

In[0]: print(mead.observation_period)
Out[0]: 2000-06-14 10:22 -- 2014-12-29 00:21
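
If the start and end of the record are needed as dates, the printed string can be parsed with the standard library. This sketch assumes observation_period is a string in exactly the format printed above. That format is inferred from this one example, and parse_period is a hypothetical helper.

```python
from datetime import datetime

# String as printed above; the format is assumed from that single example.
observation_period = "2000-06-14 10:22 -- 2014-12-29 00:21"


def parse_period(period, fmt="%Y-%m-%d %H:%M"):
    """Split 'start -- end' and parse both ends into datetime objects."""
    start_text, end_text = (part.strip() for part in period.split(" -- "))
    return datetime.strptime(start_text, fmt), datetime.strptime(end_text, fmt)


start, end = parse_period(observation_period)
span = end - start
print(f"{start:%Y-%m-%d} to {end:%Y-%m-%d}: about {span.days / 365.25:.1f} years")
```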

The “metadata” attribute returns all associated lake metadata as a Pandas DataFrame, the same table that was printed earlier.

The most important attribute is, of course, “data”. This attribute is a Pandas DataFrame with four columns: id_No, date, lake_name, and water_level.

Historical water level data for Lake Mead, NV, USA.

Because mead.data returns a Pandas DataFrame, the end-user has immediate access to familiar methods and attributes. Simple statistics, like the time-series median, are easy to generate. (In this example, I converted the data to a Pandas time series for ease of use.)

In[1]:
ts = mead.data.filter(['date', 'water_level']).set_index('date')
ts.median()
Out[1]: 
water_level    342.26
dtype: float64
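
The same pattern extends to grouped summaries. The sketch below uses a small synthetic frame with the four columns described above. The water levels are made up, and the explicit pd.to_datetime call is an assumption in case dates arrive as strings. It computes per-year statistics.

```python
import pandas as pd

# Synthetic stand-in for mead.data with the same four columns; values are made up.
data = pd.DataFrame(
    {
        "id_No": [114] * 6,
        "date": ["2012-01-10", "2012-07-02", "2013-02-15",
                 "2013-08-20", "2014-03-05", "2014-09-11"],
        "lake_name": ["Mead"] * 6,
        "water_level": [345.0, 343.0, 341.0, 339.0, 336.0, 334.0],
    }
)

# Parse dates explicitly in case they arrive as strings, then index by date.
data["date"] = pd.to_datetime(data["date"])
levels = data.set_index("date")["water_level"]

# Group by calendar year and summarize each year.
annual = levels.groupby(levels.index.year).agg(["mean", "min", "max"])
print(annual)
```

With real data, mead.data would take the place of the synthetic frame.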

With a little Pandas know-how we can even compute rolling means.

In[2]: ts.rolling(5).mean().tail()
Out[2]: 
            water_level
date                   
2014-06-15      335.536
2014-06-19      334.304
2014-09-11      332.954
2014-12-05      332.158
2014-12-29      331.968
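
Note that rolling(5) averages five consecutive observations, however far apart in time they are, and the dates above are spaced anywhere from days to months apart. As a sketch with made-up values, the example below contrasts a count-based window with a time-based one. The time-based window requires a sorted datetime index, which is assumed here.

```python
import pandas as pd

# Made-up levels on irregular dates, standing in for the Lake Mead series.
demo = pd.DataFrame(
    {"water_level": [340.0, 338.0, 336.0, 334.0, 332.0]},
    index=pd.to_datetime(
        ["2014-01-01", "2014-01-05", "2014-06-15", "2014-06-19", "2014-12-29"]
    ),
)
demo.index.name = "date"

# Count-based: mean of the last 3 observations, regardless of spacing.
by_count = demo.rolling(3).mean()

# Time-based: mean of every observation in the 90 days up to each date.
by_time = demo.rolling("90D").mean()

print(by_count.join(by_time, lsuffix="_last3", rsuffix="_90d"))
```

Which window is appropriate depends on the question. A count-based window smooths over a fixed number of satellite passes or gauge readings, while a time-based window smooths over a fixed span of the calendar.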

There are two native plotting methods for Lake objects. The first is plot_timeseries(). This method can render a time-series plot using Plotly (default), Seaborn, or Matplotlib. The axis instance can be returned for more customization by setting the parameter show to False.

mead.plot_timeseries()

Lake Mead time-series rendered in browser using Plotly.

The second built-in method for plotting is plot_mapview(). This method utilizes GeoPandas and Contextily to plot a general overview map.

mead.plot_mapview()

Additional Contextily providers and zoom levels can be provided.

import contextily as ctx
mead.plot_mapview(provider = ctx.providers.Esri.WorldImagery, zoom = 10)

And that’s it! For more details, please visit our documentation. If you would like to contribute, visit our contributing guidelines.

Credits

This work is based on funding provided by the ESIP Lab with support from the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and the United States Geological Survey (USGS). Additional thanks to the University of Texas at Austin. UPDATE: LakePy received additional, generous support in 2021 from Derek Masaki and Farial Shahnaz. Many thanks to them!

Written by Jake Gearon