Introducing LakePy: Accessing Lake Water-Level Data Through a Python API
Instantly download, plot, and analyze decades of lake water-level history with this new open-source package.
Lake water level data are crucial to water resource management and to continued scientific research into limnological questions. For the past decade, and increasingly so in the last few years, these data have been dutifully warehoused across federal, state, local, academic, and private databases. The issue is access. The data are often behind registration walls, stored in text files on an old server, or so sparsely documented that retrieval becomes an extended process.
LakePy and the back-end database it interfaces with, the Global Lake Level Database (GLLD), serve to remedy this problem. The GLLD is a dedicated AWS RDS instance with an associated API (built with AWS API Gateway). The GLLD draws on three external databases (for now!):
- United States Geological Survey National Water Information System
- United States Department of Agriculture: Foreign Agricultural Service’s G-REALM Database
- Theia’s HydroWeb Database
Together, these three databases host more than 2,000 individual lakes, with water level records spanning from a few years to more than a century. In addition to many large lakes scattered across the globe, the GLLD contains every USGS-monitored lake in the United States. Furthermore, GLLD lake levels are updated weekly, giving end-users near-real-time access to the data.
LakePy is a Pythonic wrapper for the Global Lake Level Database API. It serves historical lake level data and associated metadata, and it provides simple plotting functions. In the next section, we will step through a tutorial of the package. If you would like to skip right to the documentation, you can find it here.
This tutorial assumes you have installed Python 3.7 or higher with pip. To install LakePy, open a terminal or command prompt and enter:
pip install lakepy
After installation, move to your favorite IDE (I use PyCharm Professional, which is free for students!). Import LakePy as shown below, and we are ready to start searching for available lakes. For this tutorial, we will be using Lake Mead in Nevada, USA.
import lakepy as lk
mead = lk.search(name='mead')
When you search using the name parameter, you are likely to come up with multiple results (but not always!). If there is more than one matching lake, you will see a table printed with the records returned. (Note: you can specify markdown=True in the search function to return markdown-formatted tables.)
   id_No    source  lake_name
0    114  hydroweb  Mead
1   1519      usgs  MEAD LAKE WEST BAY NEAR WILLARD, WI
Two records were returned, so the search parameters need to be refined before a lake can be chosen for download. We can do this in one of two ways:
- Specify a source. In this case, we would pass source="hydroweb", since the other result (Mead Lake) is in Wisconsin!
- Specify the GLLD ID number (id_No). This is the preferred method of access! Many lakes are named similarly, meaning the highest level of specificity one can provide is ideal when querying the GLLD.
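To make the two options concrete, here is a minimal sketch that rebuilds the two-row search table by hand with Pandas and narrows it by source to recover the ID. The table is a local stand-in typed in from the output above, not something fetched from the GLLD, and the variable names are only illustrative.

```python
import pandas as pd

# Hand-built copy of the search table printed above (not fetched from the GLLD).
results = pd.DataFrame(
    {
        "id_No": [114, 1519],
        "source": ["hydroweb", "usgs"],
        "lake_name": ["Mead", "MEAD LAKE WEST BAY NEAR WILLARD, WI"],
    }
)

# Option 1: narrow by source. Only one row should survive the filter.
hydroweb_rows = results[results["source"] == "hydroweb"]

# Option 2: read off the ID, which identifies the lake unambiguously.
mead_id = int(hydroweb_rows["id_No"].iloc[0])
print(mead_id)

# With LakePy installed and network access, the matching searches would be:
#   mead = lk.search(name="mead", source="hydroweb")
#   mead = lk.search(id_No=mead_id)
```

Filtering by source works here because only one HydroWeb record matched, but two lakes from the same source could still share a name, which is why the ID is the safer handle.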
We can now specify the id_No for the desired lake (114):
mead = lk.search(id_No = 114)
If successful, lake metadata will be printed and the variable “mead” will be created.
To confirm the process worked correctly, we can print the data type of variable “mead”. We can see it is indeed a LakePy Lake object.
The variable “mead” is now an object of the Lake class. This just means all the necessary data and functions will stem from this object (our variable, “mead”). If you are familiar with Pandas, DataFrames work in the same fashion.
The associated attributes of an object of class Lake are:
- continent (currently not supported for HydroWeb)
- latitude (currently not supported for G-REALM)
- longitude (currently not supported for G-REALM)
We can use these to gain more information about Lake Mead, like the range of accessible data.
Out: 2000-06-14 10:22 -- 2014-12-29 00:21
The “metadata” attribute returns all associated lake metadata as a Pandas DataFrame. This is the same metadata that was printed when the lake was retrieved.
The most important attribute is, of course, “data”. This attribute is a Pandas DataFrame with four columns: id_No, date, lake_name, and water_level.
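As a rough illustration of that layout, the sketch below builds a small synthetic DataFrame with the same four columns and derives the date range directly from the date column. The water levels are invented for the example, and whether LakePy already delivers the date column as a datetime type is an assumption I have not checked, so the conversion is done explicitly.

```python
import pandas as pd

# Synthetic stand-in for mead.data: same four columns, invented water levels.
data = pd.DataFrame(
    {
        "id_No": [114, 114, 114, 114],
        "date": ["2000-06-14", "2005-01-01", "2010-07-15", "2014-12-29"],
        "lake_name": ["Mead"] * 4,
        "water_level": [370.0, 350.5, 332.0, 330.5],
    }
)

# Convert to a real datetime type before doing any date arithmetic.
data["date"] = pd.to_datetime(data["date"])

# The span of available observations is just the min and max of the dates.
first, last = data["date"].min(), data["date"].max()
period = f"{first.strftime('%Y-%m-%d')} -- {last.strftime('%Y-%m-%d')}"

print(list(data.columns))
print(period)
```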
Because mead.data returns a Pandas DataFrame, the end-user has immediate access to familiar methods and attributes. Simple statistics, such as the time-series median, are easy to compute (in this example, I converted the data to a date-indexed time series for ease of use).
ts = mead.data.filter(['date', 'water_level']).set_index('date')
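Here is a self-contained sketch of that median calculation. Since it has to run without a GLLD connection, it uses a synthetic DataFrame in place of mead.data; the values are invented, and only the column names come from the description above.

```python
import pandas as pd

# Synthetic stand-in for mead.data (invented values, real column names).
data = pd.DataFrame(
    {
        "id_No": [114] * 5,
        "date": pd.to_datetime(
            ["2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01"]
        ),
        "lake_name": ["Mead"] * 5,
        "water_level": [370.0, 350.5, 332.0, 330.5, 331.0],
    }
)

# Same conversion as in the article: keep two columns, index by date.
ts = data.filter(["date", "water_level"]).set_index("date")

# Median of the water level column over the whole record.
median_level = ts["water_level"].median()
print(median_level)
```

On the real data, replacing `data` with `mead.data` should be the only change needed.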
With a little Pandas know-how we can even compute rolling means.
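One way to do this is with the rolling() method on a date-indexed series. The sketch below uses six invented daily values and a three-observation window; both the data and the window size are assumptions chosen to keep the arithmetic easy to follow, not values from the GLLD.

```python
import pandas as pd

# Six invented daily observations standing in for a date-indexed lake record.
dates = pd.date_range("2014-01-01", periods=6, freq="D")
ts = pd.DataFrame(
    {"water_level": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]},
    index=dates,
)
ts.index.name = "date"

# Mean over a sliding window of three observations.
# The first two entries are NaN because the window is not yet full.
rolling_mean = ts["water_level"].rolling(window=3).mean()
print(rolling_mean)
```

For irregularly sampled records, a time-based window such as rolling("30D") may be a better fit than a fixed number of observations.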
There are two native plotting methods for Lake objects. The first is plot_timeseries(). This method can render a time-series plot using Plotly (default), Seaborn, or Matplotlib. The axis instance can be returned for more customization by setting the parameter show to False.
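If the Matplotlib backend is used and show is set to False, the returned axis can presumably be customized like any other Matplotlib Axes. The sketch below does not call LakePy at all. It builds a stand-in axis directly with Matplotlib from invented values, then applies the kind of tweaks one might make to the returned object.

```python
import datetime as dt

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented values standing in for a short water level record.
dates = [dt.date(2014, 1, day) for day in range(1, 6)]
levels = [330.0, 330.2, 330.1, 329.9, 330.3]

# Stand-in for the axis that plot_timeseries() would hand back.
fig, ax = plt.subplots()
ax.plot(dates, levels)

# Typical customizations on a returned Axes object.
ax.set_title("Lake Mead water level (synthetic)")
ax.set_ylabel("Water level")
ax.axhline(sum(levels) / len(levels), linestyle="--")  # mean as a reference line
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```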
The second built-in method for plotting is plot_mapview(). This method utilizes GeoPandas and Contextily to plot a general overview map.
Other Contextily providers and zoom levels can be specified.
import contextily as ctx
mead.plot_mapview(provider = ctx.providers.Esri.WorldImagery, zoom = 10)
And that’s it! For more details, please visit our documentation. If you would like to contribute, visit our contributing guidelines.
This work is based on funding provided by the ESIP Lab with support from the National Aeronautics and Space Administration (NASA), National Oceanic and Atmospheric Administration (NOAA) and the United States Geological Survey (USGS). Additional thanks to the University of Texas at Austin. UPDATE: LakePy received additional, generous support in 2021 from Derek Masaki and Farial Shahnaz. Many thanks to them!