How moving from excel to modern data science tools can streamline California water usage reporting

Patrick Atwater
A.R.G.O.
Published in
3 min readNov 25, 2015

Since the summer of 2014, the State of California has required its major urban water retailers to submit monthly total and residential water usage requirements.

Currently utility staff must manually enter in each months data into a web form that is then used to populate a public facing excel spreadsheet.

This process is suboptimal, not least because it requires substantial staff time and overlaps with Urban Water Management Plan required tables (namely total production and gallons per capita per day).

A new CA data collaborative is currently working to automate the ingestion of the raw underlying customer-level water usage data into a secure centralized database. Below I show how to programmatically calculate this currently reported information using python’s pandas package.

CA_Water_Usage_Data[CA_Water_Usage_Data['Customer_Type'] == 'Residential'][CA_Water_Usage_Data['Date'] == 'Month in Question'].groupby('Water_Retailer').aggregate(sum)

That will give a list of the total residential water usage for the specified month for every utility in the database. Similar commands are possible using postgresql or SQL or whatever your database technology of choice is. From there it’s a simple matter to divide by population using a table of population by month for each utility to calculate Residential Gallons per Capita per Day for each utility. Such a programmatic approach has the virtue of simply requiring the “Month in Question” parameter to be changed and a button to be pushed rather than requiring 411 humans, one at each of the major retailers currently reporting to the state, to manually submit data.

Proper integration of this data with other state reporting requirements could enable pre-populated tables that rely on the same underlying information. For instance, Urban Water Management Plans require total production, retail demands by use type (Table 4–1) and gallons per capita per day, all of which could be generated from the customer-level usage database outlined above. Other portions of the report require judgement and additional analysis such as projected population (Table 3–1) or projected future water demand and could not be simply automated using programmatic report generation as outlined above.

Such a database could also be used to investigate how customers comply with existing state efficiency standards, namely 55 gallons per capita per day indoor and 80% times evapotranspiration times the square feet of outdoor irrigable area. Many of the utilities involved in the CA Data Collaborative with already have irrigable area per customer parcel as part of their water budget rates and a host of remote sensing companies can generate that data for other utilities using aerial flyovers. Developing customer usage along with household size, irrigable area and evapotranspiration would enable California’s water management community to more accurately assess “reasonable use” and respect article X of the California constitution.

For regular Californians that pay water bills, this means that we could move beyond a world where some communities require every customer to cut back the same amount regardless of their actual water efficiency or some customers are allowed only to water 2 days a week but friends not too far away are allowed more or less or high water using customers pay widely diverging amounts for the same volume of water.

With statewide customer-level measurement of water efficiency we could provide clear, consistent standards of reasonable water usage that apply to all Californians.

--

--