This post describes the process of collecting, analysing and visualizing data to test whether poor dam management played a major role in the Kerala floods. All of this was done as part of the Data Visualization course at IDC.
The end result was a simulation where one can control the dam activity to see the effects on flooding.
The idea was to establish a model relating the amount of water released to the resulting flooding, and then use this model to build a simulation that lets users try out different dam configurations and assess how flooding conditions change.
The whole process was broadly divided into five steps:
- Data Collection
- Cleaning the data
- Selecting/Creating a relationship model
- Data Encoding
- Creating the Visualization
After a little digging, I found daily system statistics on KSEB’s website.
This data was available day-wise, so I had to manually enter each date and then copy the contents into an Excel sheet. This is the point where I decided on the range of dates I would be working with: 15th July to 5th September 2018.
For rain data I first turned to the India Meteorological Department. They publish all their data in the form of rasterised images, and most of it is either cumulative or averaged over zones or time.
The United States National Climatic Data Center has detailed rainfall data, but only from one station in Kerala, which was not sufficient.
So, after a little struggle, I finally found the Russian website Reliable Prognosis, which has very detailed data from multiple stations in Kerala.
Their data contained humidity values, precipitation values, wind directions and a lot more, recorded two times a day. They have a simple interface where you have to select a specific station and then specify the range of dates for which you want the data. It is made available in csv format.
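Because the readings come twice a day while the reservoir data is daily, the two have to be brought to the same resolution. A minimal sketch of that step, assuming each reading reports the precipitation since the previous one (so the two readings of a day can simply be summed). The field names `date` and `precipMm` and the sample values are hypothetical, not the site's actual columns or data.

```javascript
// Sum twice-daily precipitation readings into one total per calendar day.
// `date` and `precipMm` are hypothetical field names chosen for this sketch.
function dailyRainfall(readings) {
  const totals = new Map();
  for (const { date, precipMm } of readings) {
    // Assumption: a missing or non-numeric reading means no recorded rain.
    const mm = Number.isFinite(precipMm) ? precipMm : 0;
    totals.set(date, (totals.get(date) || 0) + mm);
  }
  // Sort by ISO date string so rows line up with the daily reservoir data.
  return Array.from(totals, ([date, rainMm]) => ({ date, rainMm }))
    .sort((a, b) => a.date.localeCompare(b.date));
}

// Made-up sample readings, for illustration only.
const sampleReadings = [
  { date: "2018-08-16", precipMm: 30 },
  { date: "2018-08-15", precipMm: 40 },
  { date: "2018-08-15", precipMm: 25 },
  { date: "2018-08-16", precipMm: NaN },
];
console.log(dailyRainfall(sampleReadings));
```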
Understanding The Data
The rainfall data was pretty straightforward, with precipitation values available for each day in mm. The reservoir data, however, had some new terminology and units which required some reading. Some of the terms are explained below:
- MDDL (Minimum Draw Down Level): This is the minimum level of water that is to be maintained in a reservoir at all times to meet its production requirements. (unit: metre)
- Storage: There are two types of storage values. The first is the volume of water stored in the reservoir, given in ‘mcm’ (Million Cubic Metre). The other is the power that can be generated from the available volume of water, given in ‘mu’ (Mega Units).
- Spill: The amount of water released from floodgates in a day. This value is available in mcm/day.
- Inflow: The total amount of water flowing into the reservoir by all means (rain or a parent river/reservoir). Surprisingly, this was available in ‘mu’ only.
The data contained the total level, inflow and spillage values for 16 reservoirs, not for the individual dams on these reservoirs. Although separate data about the number of gates and the spillage capacity of each gate was available for all the dams, there was no way to find out exactly when the gates were opened, how many were opened, or to what height. Some newspaper articles carry data about gate openings, but it is not reliable and lacks detail. So I decided to concentrate on just one reservoir, the Idukki reservoir, because sufficient data was available for it and it is connected to the Periyar river, which was a major carrier of the flood.
After collecting the data, the next step was to come up with a model relating spillage to flooding. I read through a few papers, such as
“Probabilistic Modeling of Floodwater Level for Dam Reservoirs” - Claudio Carvajal, Laurent Peyras, Patrick Arnaud, Daniel Boissier and Paul Royet.
“A probabilistic model to support reservoir operation decisions during flash floods” - L. Mediero, L. Garrote and F. Martin-Carrasco.
and a few more similar papers. Although the methods and theories were relevant, they were difficult to apply because there was not enough data on factors like topography, soil moisture content, gate opening routine, etc. So I decided to create a simple model using the data I had and a few assumptions.
Let’s imagine a huge circle with an area equal to that of Idukki district, with the Idukki reservoir at its centre.
We don’t have topography data, so we assume that the water coming out of the spillways is distributed uniformly over some portion of the available area. Considering the whole area of Idukki district would be inappropriate because it is too large and, more importantly, water will only flow through areas at a lower elevation. Getting an exact number or equation for that area is difficult within the time frame, so I decided to use an arbitrary constant: the area of the district is divided by this constant to get a value for the affected area. This constant is called the Area Factor and can take values from 1 to 50. For example, 30 means 1/30th of the area of Idukki district.
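The uniform-distribution assumption turns a spill volume into a water depth with simple unit arithmetic. The sketch below is one way to write it, not necessarily the code used in the project. The function names are hypothetical, and the district area of 4500 km² is a round placeholder rather than an official figure.

```javascript
const M3_PER_MCM = 1e6;  // 1 million cubic metres, in cubic metres
const M2_PER_KM2 = 1e6;  // 1 square kilometre, in square metres
const MM_PER_M = 1000;

// Affected area = district area / Area Factor (allowed range 1 to 50).
function affectedAreaKm2(districtAreaKm2, areaFactor) {
  if (areaFactor < 1 || areaFactor > 50) {
    throw new RangeError("Area Factor must be between 1 and 50");
  }
  return districtAreaKm2 / areaFactor;
}

// Depth (mm) of a spill volume (mcm) spread uniformly over an area (km^2).
function spillDepthMm(spillMcm, areaKm2) {
  const volumeM3 = spillMcm * M3_PER_MCM;
  const areaM2 = areaKm2 * M2_PER_KM2;
  return (volumeM3 / areaM2) * MM_PER_M;
}

// Placeholder numbers: 4500 km^2 district, Area Factor 30, 15 mcm spilled.
const exampleAreaKm2 = affectedAreaKm2(4500, 30); // 150 km^2
console.log(spillDepthMm(15, exampleAreaKm2));    // roughly 100 mm
```

With these placeholder numbers, 15 mcm spread over 150 km² comes to about 100 mm of water, which shows how strongly the Area Factor drives the result.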
Once the water is out of the dam, three things can happen to it: it can flow towards the ocean, seep into the earth, or evaporate. Again, to account for these factors we would need topography data, soil data and data to calculate the evaporation rate, which we do not have. So, to simplify, let’s consider another arbitrary constant and call it the Drainout Level (in mm). It refers to the amount of water that would drain out from a unit area of land in a day.
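One way to read the Drainout Level is as a fixed daily deduction from the water standing on the ground. The sketch below follows that reading. Carrying leftover water into the next day is an assumption of this sketch, and the function name, field names and sample values are hypothetical.

```javascript
// Track standing water (mm per unit area) day by day:
// add the day's rain and spill, then remove the Drainout Level.
function simulateGroundWater(days, drainoutMm) {
  let carriedMm = 0; // assumption: water left from the previous day carries over
  return days.map(({ date, rainMm, spillMm }) => {
    const beforeDrainMm = carriedMm + rainMm + spillMm;
    // Standing water cannot go below zero.
    carriedMm = Math.max(0, beforeDrainMm - drainoutMm);
    return { date, standingMm: carriedMm };
  });
}

// Made-up inputs, for illustration only.
const exampleDays = [
  { date: "2018-08-15", rainMm: 50, spillMm: 20 },
  { date: "2018-08-16", rainMm: 0, spillMm: 5 },
  { date: "2018-08-17", rainMm: 60, spillMm: 0 },
];
console.log(simulateGroundWater(exampleDays, 40));
```

In this example the first day ends with 30 mm standing, the second day drains completely, and the third ends with 20 mm.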
The third and final constant is the Max. Water Release. This refers to the maximum amount of water that can be safely released from the reservoir in a day. Since we do not have data about flood gate operations, we simply take the highest of the existing spillage values as the safe limit and keep our values within that range.
To sum it up we are using three arbitrary constants whose values can be manipulated by the users to test different conditions. The constants are:
- Area Factor: no units
- Drainout Level: measured in mm (millimetre).
- Max. Water Release: measured in mcm (million cubic metre).
These values can be set using the form provided in the top right corner of the page.
The final visualization is a horizontally scrollable grouped bar chart, with the yellow bars showing the water levels in the reservoir and the stacked bars in front representing the total amount of water accumulated in a unit area in a day. The x-axis shows the range of dates from 15th July to 5th September 2018 (each bar corresponds to a date). There are two y-axes, one for the yellow bars and one for the water levels.
The water level is further subdivided into two colors of stacked bars. The blue bars represent the water accumulated through rainfall, while the orange bars represent the water from dam spillage.
A slight gradient has been added to the dam level bars so that they fade out towards the bottom, both to increase readability and because only the top parts of the bars are relevant.
The bars have been grouped to help the user establish a direct relationship between dam levels and water levels on the ground.
The number written on top of each yellow bar represents the water level in the dam on that particular day, while the light gray values above it are the spillage values in mcm.
Creating the visualization
The data was compiled into a CSV file and then parsed using d3.js (version 4.1.0). For other interactions I used jQuery (3.3.1).
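CSV parsers hand back every field as a string, so each row needs converting before it can be plotted. The sketch below shows a row conversion function of the kind that `d3.csv` in d3 v4 accepts as its optional row argument. The column names (`date`, `level`, `spill`, `rain`) are hypothetical, and treating blank cells as 0 is an assumption of this sketch.

```javascript
// Convert a CSV cell to a number; blank cells become 0 (an assumption here).
function toNumber(text) {
  const trimmed = (text || "").trim();
  return trimmed === "" ? 0 : Number(trimmed);
}

// Turn one CSV row of strings into typed values.
// Column names are hypothetical, chosen only for this sketch.
function parseRow(row) {
  return {
    date: new Date(row.date + "T00:00:00Z"), // expects YYYY-MM-DD
    levelM: toNumber(row.level),             // reservoir level in metres
    spillMcm: toNumber(row.spill),           // spill in mcm/day
    rainMm: toNumber(row.rain),              // rainfall in mm
  };
}

// Made-up row, for illustration only.
const exampleRow = { date: "2018-08-15", level: "730.5", spill: "12", rain: "" };
console.log(parseRow(exampleRow));
```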
At first I created the whole bar chart inside an SVG canvas, but being a beginner with SVG, I found it difficult to add interactivity to it, so I decided to use HTML for the chart. I replaced all the SVG <rect> elements with <div> elements, which made my job a lot easier.
I had to provide different scales for the water level bars and the dam level bars because the two are measured in different units and cover very different ranges.
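The two-scale idea can be sketched with a small linear mapping from data values to pixel heights, which is essentially what `d3.scaleLinear` provides. The helper below is a dependency-free stand-in, and the domains and the 400 px chart height are placeholder values, not the ones used in the actual chart.

```javascript
// Build a function that maps a value in [d0, d1] linearly onto [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Placeholder domains: dam level in metres, ground water in millimetres.
const damHeightPx = linearScale([700, 740], [0, 400]);
const waterHeightPx = linearScale([0, 200], [0, 400]);

console.log(damHeightPx(720));  // 200
console.log(waterHeightPx(50)); // 100
```

Keeping the scales separate lets each bar type use the full chart height even though one is measured in metres and the other in millimetres.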
The green slider near the top of each yellow bar lets the user release water from the dam. It can be pulled down to increase the spillage value for the day. The maximum spillage per day can go up to the specified Max. Water Release level.
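The slider behaviour comes down to mapping how far the handle has been pulled to a spill value that never exceeds Max. Water Release. A minimal sketch, assuming a linear mapping over the slider track; the function and parameter names are hypothetical.

```javascript
// Map a slider offset (px pulled down) to a spill value in mcm,
// clamped between 0 and the user-set Max. Water Release.
function sliderToSpill(offsetPx, trackPx, maxReleaseMcm) {
  // Fraction of the track covered, limited to the range 0..1.
  const fraction = Math.min(1, Math.max(0, offsetPx / trackPx));
  return fraction * maxReleaseMcm;
}

console.log(sliderToSpill(50, 200, 40));  // 10
console.log(sliderToSpill(300, 200, 40)); // 40 (clamped to the maximum)
```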
To give a clearer picture of what’s happening in the visualization, I added a scaled sectional graphic of the dam to show the actual water level inside the dam and on the ground. The data for a particular day can be viewed by simply hovering over that day’s bar.
Since the water level on the ground is almost invisible due to the stark difference between the dimensions of the elements on ground and the dam, I added a zoomed view on the bottom right corner of the web page.