Water is the most precious resource on Earth: all living organisms depend on it, and it covers about two-thirds of our planet's surface. Despite its importance, fresh water is in short supply in most of the world’s cities. Hence, conserving water is a strategic choice for almost everyone.
To draw up a water conservation plan, we must know how much water each sector (industry, agriculture, domestic, …) consumes. In this study, we have analyzed a dataset for a sample city, which is available on the Kaggle website. The city is Sonora, Mexico, which is a medium-size city.
In this study, we would like to answer the following questions:
- Is water consumption affected by the month of the year, and does it change from year to year?
- Which of the following factors affects water consumption most: industry, agriculture, or homes?
- Which sector should be given the greatest attention?
- Can we use machine learning techniques to predict monthly water consumption?
I explored the data, downloaded the English description of the features (the original dataset is in Spanish), and examined the dataset's properties using Python and its scientific libraries. I found many missing data points; some were dropped, and others were filled using the method described in the post "Filling gaps of a time-series using python", a comparative study of the easiest and most precise methods to impute a time series.
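The referenced post is not reproduced here, so the following is only a minimal sketch of one common way to fill gaps in a monthly series: linear interpolation with pandas. It is not necessarily the method used in this study, and the series name `consumption` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series with two missing months (values are made up).
months = pd.period_range("2009-01", periods=6, freq="M")
consumption = pd.Series([19.0, np.nan, 21.0, 20.0, np.nan, 18.0], index=months)

# Linear interpolation fills each gap from its neighbours,
# treating the observations as equally spaced.
filled = consumption.interpolate(method="linear")

print(filled)
```

Linear interpolation is a reasonable default when gaps are short; longer gaps in a seasonal series may call for a method that uses the values from the same month in other years.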
The analysis took four stages, which are explained in this GitHub repository.
The dataset contains two main variables: the average monthly water consumption from 2009 to 2015, and a reference water consumption for 2016.
The effect of the temporal factors
The year effect is illustrated as follows:
- The average monthly consumption dropped from 19.5 units in 2009 to about 16.5 units in 2012, then increased gradually to about 17.25 units in 2015.
- The reference consumption values are almost fixed over the years (the difference between the minimum and maximum values is about 0 units!)
The month effect:
- The average monthly consumption is at its peak (18.5 units) in July and August, and at its minimum (about 16 units) in March and December.
- The reference consumption values are almost fixed over the months as well.
Now, let’s look at the month and the year combined.
- The average monthly consumption behaves like a typical time series, fluctuating up and down. The minimum value (about 15 units) was in Q1 of 2013, and the maximum value (about 21 units) was in Q3 of 2009.
- The consumption seems to follow the temperature trend: it is lowest in the winter months and highest in summer. Some years are also warmer than others, which affects water consumption on an annual basis.
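The yearly, monthly, and combined averages discussed above are group means. A minimal pandas sketch of how such tables could be computed follows; the column names (`year`, `month`, `amc`) and the values are hypothetical and do not come from the real dataset.

```python
import pandas as pd

# Hypothetical records: column names and values are illustrative only.
df = pd.DataFrame({
    "year":  [2009, 2009, 2009, 2012, 2012, 2012],
    "month": [1, 7, 7, 1, 7, 7],
    "amc":   [18.0, 21.0, 20.0, 15.0, 18.0, 17.0],
})

# Year effect: mean consumption per year.
by_year = df.groupby("year")["amc"].mean()

# Month effect: mean consumption per calendar month, pooled over years.
by_month = df.groupby("month")["amc"].mean()

# Combined view: one row per year, one column per month.
by_year_month = df.groupby(["year", "month"])["amc"].mean().unstack("month")

print(by_year_month)
```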
The effect of the classification factors
1- The effect of industry
It appears that the presence of this category increases both the reference consumption (RC) and the average monthly consumption (AMC); however, the increase is not very large (from 17 to 25 units of AMC, and from 1100 to 1750 units of RC). Notice that most of the records are non-industrial (10.4M records vs. 1.3M industrial records); this means that industry is not a big player in water consumption.
2- The effect of agriculture
Although agricultural usage consumes 800% more than the other categories, it is not a significant consumer overall because it accounts for so few records (0.07% of the records). This shows that Sonora is not an agricultural city, as agricultural usage is limited to parks and gardens. However, the high water consumption of agricultural consumers reflects the importance of applying water-conserving irrigation practices in cities where agriculture plays a significant role.
3- The effect of housing consumers
Unlike the industry category, housing records show significantly lower consumption than non-housing records. Additionally, the housing category represents most of the records in our dataset (11.6M vs. 32K, or 99.73%). Non-housing records include, for example, industrial, commercial, and agricultural records. Still, although housing consumes less water (about 7.7%), it is important to encourage water conservation policies for this category because of its dominance: a 1% reduction in housing consumption is equivalent to a 30.4% reduction in the consumption of the other categories.
The housing category includes domestic properties (only one family per property), social properties (clubs, etc.), and residential properties (care homes, nursing homes, blocks of flats, houses in multiple occupation, …). The dataset offers the housing category in bulk, as well as the domestic, residential, and social categories separately; the data show that the domestic properties are 12x the social properties and 9x the residential properties. The significant difference in consumption appears in the domestic and residential properties, but not in the social properties.
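The category comparisons in this section rest on two quantities per group: its share of the records and its mean consumption. Here is a minimal pandas sketch of that kind of summary, using made-up records and hypothetical column names (`is_housing`, `amc`) rather than the real dataset columns.

```python
import pandas as pd

# Hypothetical records: nine housing records and one non-housing record.
df = pd.DataFrame({
    "is_housing": [True] * 9 + [False],
    "amc":        [10.0] * 9 + [50.0],
})

# Number of records and mean consumption per group.
summary = df.groupby("is_housing")["amc"].agg(records="count", mean_amc="mean")

# Share of records, and share of total consumption, held by each group.
summary["record_share"] = summary["records"] / summary["records"].sum()
summary["consumption_share"] = (
    summary["records"] * summary["mean_amc"] / df["amc"].sum()
)

print(summary)
```

In this toy example the non-housing record consumes five times as much as a housing record, yet housing still accounts for most of the total because it has far more records.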
The most important consumer to take care of
As highlighted above, housing is the most important category to focus on: although it consumes little per record, it accounts for most of the water consumed. Thus, reducing household consumption by 1% would be better than reducing industrial consumption by 30%.
Note: this result applies to the current data; other cities may differ depending on how total household consumption compares with agricultural or industrial consumption.
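The 1% versus 30% comparison can be checked with back-of-the-envelope arithmetic. The sketch below assumes that the 7.7% figure is the ratio of per-record housing consumption to per-record non-housing consumption, which is an interpretation rather than something the data description states; the function name `equivalent_reduction` is hypothetical. With the rounded record counts quoted above it gives roughly 28%, in the same range as the 30.4% reported.

```python
def equivalent_reduction(n_a, use_a, n_b, use_b, cut_a):
    """Fractional cut in group B's total consumption that saves as much
    water as cutting group A's total consumption by `cut_a`."""
    saved = cut_a * n_a * use_a      # water saved in group A
    return saved / (n_b * use_b)     # same amount as a fraction of B's total

housing_records = 11.6e6   # rounded figure from the text
other_records = 32e3       # rounded figure from the text
relative_use = 0.077       # ASSUMPTION: housing use per record / non-housing use per record

cut = equivalent_reduction(housing_records, relative_use, other_records, 1.0, 0.01)
print(f"A 1% cut in housing matches a {cut:.1%} cut in the other categories")
```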
Can we use machine learning to predict water consumption for this city?
GitHub repository: drnesr/WaterConsumption (Water Consumption in a Median Size City, part of the Udacity Data Scientist Nano Degree, Term 2).
As detailed in the GitHub repository, I tried three different models with two sets of features. The only reliable result was achieved by applying linear regression to the full set of features, which gives 71.6% correct predictions of the average monthly consumption. The reference consumption, however, could not be predicted reliably from the given set of features.
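The actual models live in the repository; the following is only a minimal sketch of fitting and scoring a linear regression, run on synthetic data. It uses NumPy's ordinary least squares solver and reports R² on a held-out split, both of which are assumptions: the features, the split, and the metric behind the 71.6% figure are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 records with 3 made-up features.
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = 17.0 + X @ true_coef + rng.normal(scale=1.0, size=200)

# Hold out the last 50 records for testing.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Ordinary least squares; the column of ones provides the intercept.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Predict on the held-out records and compute R^2.
A_test = np.column_stack([np.ones(len(X_test)), X_test])
pred = A_test @ coef
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"R^2 on held-out data: {r2:.3f}")
```

For real monthly data, a split by time (training on earlier years, testing on later ones) is usually a fairer test than a random split.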
Water conservation is vital: people will not survive if the present rate of water consumption continues. It is highly recommended to perform similar studies for every city in the world so that plans to reduce water consumption can be drawn up; if we succeed, the limited fresh water on our planet will remain available for us and for our children.
For more information: