Deaths by Railway Accidents in India

Vineet Kamboj
Sep 11, 2017

An exploratory data visualization exercise

All of us have travelled by train, or at least been to a railway station, at some point in our lives. Indian Railways, which started in April 1853, is today the largest railway network in Asia and the fourth-largest network in the world operated under a single management.

In India, railway tracks span 19,630 kilometres, on which approximately 13,313 passenger trains run daily, carrying around 22 million passengers.

Trains are perceived as one of the cheapest and safest modes of travel, yet train journeys sometimes prove fatal. There have been many railway accidents in the past, and they happen for different reasons. Common accident types and causes include derailments, collisions, train fires, signalling errors and failures by railway staff. These accidents cause not only a large monetary loss but also major human casualties.

Railway Accident {source: https://cdn.yourstory.com}

This semester we had a two-week course on Interactive Data Visualization, which came with three major assignments. The third assignment was to find a relevant data set, parse the required data from it and visualize it. The domain we were given to explore was “Indian Railways”.

FINDING A RELEVANT DATA SET

Searching through the railway-related data sets on www.data.gov.in, I came across many interesting ones. The one we had used earlier was the timetable of all trains and the stations they cover, but I was looking for a data set that would support an exploratory visualization. So I picked the state-wise number of deaths in railway accidents and at railway crossings (2014). These were two separate data sets: one had details of railway accidents, while the other had details of deaths at railway crossings. The railway crossing data combined deaths at manned and unmanned level crossings into a single figure.

I sourced the second data set, the state-wise number of manned and unmanned level crossings, from “www.indiastat.com”. To find a correlation between the number of railway-accident deaths in a state and that state's railway density, I also searched for data giving the number of railway stations in each state. The first such data set I found was in .geojson format.

PARSING THE DATA

After collecting all the data, I filtered out the required data points and put them into a single .csv file. The death data came with profile details: I had the number of males, females and transgender persons who died in every state. The number of railway stations per state was only available in .geojson format, so I had to extract it from there. I first plotted the stations on a map using mapbox.gl to see the density of the railway network in different states.
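The extraction step can be sketched with nothing but Python's standard library. This is a minimal sketch, not the script used for this project: it assumes each GeoJSON feature is one station and that the state name sits in a property called `state` (a hypothetical key; the real file's attribute names may differ). The two sample features are made up for illustration.

```python
import json
from collections import Counter

# Made-up two-station sample; "state" is a hypothetical property name.
SAMPLE = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [77.2, 28.6]},
   "properties": {"name": "Station 1", "state": "Delhi"}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [72.8, 18.9]},
   "properties": {"name": "Station 2", "state": "Maharashtra"}}
]}"""


def stations_per_state(geojson_text, state_key="state"):
    """Return a Counter of station features per state, skipping features with no state."""
    collection = json.loads(geojson_text)
    counts = Counter()
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        state = props.get(state_key)
        if state:  # a null or empty state cannot be attributed to any state
            counts[state] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(stations_per_state(SAMPLE).items()):
        print(state, n)
```

Grouping on an explicit attribute, rather than searching text, also makes it easy to see how many features were skipped.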

First I tried a simple hack: since each station's attributes contain the name of its state, I counted the stations in a state by searching for the state's name. Later I found that almost 2,000 stations had a null value for the state, so the counts I was using were badly wrong. I had to look for the same data again, and I finally found it on “www.searchmytrain.com”.
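A quick audit of missing values would flag a problem like the null-state stations before any counting is trusted. The sketch below is hypothetical (it again assumes a `state` property and uses made-up features): it counts features whose state is absent, null or blank and reports their share of the total.

```python
import json

# Made-up sample with one valid, one null and one blank state.
SAMPLE_WITH_GAPS = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null, "properties": {"name": "Station 1", "state": "Delhi"}},
  {"type": "Feature", "geometry": null, "properties": {"name": "Station 2", "state": null}},
  {"type": "Feature", "geometry": null, "properties": {"name": "Station 3", "state": "  "}}
]}"""


def missing_state_report(geojson_text, state_key="state"):
    """Return (missing, total, share) for features with an absent, null or blank state."""
    features = json.loads(geojson_text)["features"]
    missing = [
        f for f in features
        # None and whitespace-only strings both collapse to "" here
        if not str((f.get("properties") or {}).get(state_key) or "").strip()
    ]
    total = len(features)
    share = len(missing) / total if total else 0.0
    return len(missing), total, share


if __name__ == "__main__":
    missing, total, share = missing_state_report(SAMPLE_WITH_GAPS)
    print(f"{missing} of {total} stations ({share:.0%}) have no state")
```

Any share above zero means per-state counts built on that attribute undercount some states, which is what a name search silently hides.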

FINAL .CSV FILE

After all the checks, I made a consolidated file with 12 columns and 33 rows. With all the numbers in front of me, I could look for a pattern and see whether combining these data points would reveal something interesting.
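Consolidating several per-state tables into one file is essentially a join on the state name. A minimal sketch, assuming each source is a list of dicts with a shared `state` key; the column names and values below are hypothetical placeholders, not the 2014 figures.

```python
import csv
import io

# Placeholder tables with hypothetical column names and made-up values.
DEATHS = [{"state": "State A", "deaths_male": 12, "deaths_female": 3},
          {"state": "State B", "deaths_male": 7, "deaths_female": 1}]
STATIONS = [{"state": "State A", "stations": 140},
            {"state": "State B", "stations": 95}]
CROSSINGS = [{"state": "State A", "manned": 40, "unmanned": 25}]


def consolidate(tables, key="state"):
    """Outer-join lists of per-state dicts on `key`; returns (columns, rows)."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {key: row[key]}).update(row)
    columns = [key] + sorted({col for row in merged.values() for col in row if col != key})
    rows = [merged[name] for name in sorted(merged)]
    return columns, rows


def to_csv_text(columns, rows):
    """Serialise rows to CSV; cells missing from a source table are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    cols, rows = consolidate([DEATHS, STATIONS, CROSSINGS])
    print(f"{len(cols)} columns x {len(rows)} rows")
    print(to_csv_text(cols, rows))
```

An outer join is a deliberate choice here: a state spelled differently in two sources shows up as an extra row instead of silently disappearing, so checking the final row count catches naming mismatches.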

Now it was time to think about the visualization: what should I visualize, and how? I wanted to show the relationship between the density of the railway network and the number of deaths due to railway accidents in a state. On top of that, the viewer should be able to compare all the given states.
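The charts compare these two quantities visually. One way to put a number on the relationship, which is an addition and not part of the original analysis, is Pearson's correlation coefficient between station counts and deaths across states. The sketch computes it from scratch; the input lists are placeholders, not the real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    stations = [140, 95, 60, 210]  # placeholder values, not the real data
    deaths = [15, 8, 9, 30]
    print(round(pearson(stations, deaths), 3))
```

With only about 29 states, a single coefficient is sensitive to a few large outliers, so it complements the charts rather than replacing them.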

Since the data was to be compared across 29 states (including UTs, and excluding states and UTs with no rail network or a negligible one), a bar graph seemed a good option. But I wanted to show three parameters at once:

  1. Number of deaths in a state
  2. Number of railway stations in a state
  3. Name of the state.

Generally, a bar chart plots one variable against one fixed category. Typically the category goes on the X-axis and the variable on the Y-axis. Both quantities I was trying to fit into the bar graph were variables, and the aim was to find a relation between them without prioritizing either one.

Since I was making this visualization in Sketch, I had to convert my values into pixels (px), so I did all the calculations in a separate workbook.
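The workbook's formulas are not shown, but a common way to turn values into bar lengths is a linear scale anchored at zero, with the largest value mapped to the longest bar. A minimal sketch, where `max_px` is an assumed canvas limit:

```python
def to_pixels(values, max_px):
    """Scale values linearly so the largest maps to max_px and zero maps to 0 px."""
    peak = max(values)
    if peak <= 0:
        return [0.0 for _ in values]
    return [round(v / peak * max_px, 1) for v in values]


if __name__ == "__main__":
    print(to_pixels([50, 100, 25], 400))  # [200.0, 400.0, 100.0]
```

Anchoring at zero keeps every bar's length proportional to its value; a min-max scale would exaggerate the differences between states.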

VISUALIZATION [1]

Data sets:
Number of deaths due to railway accidents
Number of railway stations

This bar graph encodes information on both axes. The width of each bar shows the number of railway stations in the state relative to other states, and its height shows the number of deaths due to railway accidents.
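When both the width and the height of a bar carry data, each bar's x position depends on the widths of the bars before it. A hypothetical sketch of that layout in Sketch-style coordinates (origin at the top left, y growing downward); `baseline_y`, `gap_px` and the column names are assumptions. Passing the keys the other way round gives the swapped-axes variant.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    label: str
    x: float
    y: float
    width: float
    height: float


def variable_width_bars(rows, width_key, height_key, max_width_px, max_height_px,
                        gap_px=4.0, baseline_y=500.0):
    """Lay bars left to right; width and height are each scaled to their own column's max."""
    max_w = max(r[width_key] for r in rows) or 1
    max_h = max(r[height_key] for r in rows) or 1
    bars, x = [], 0.0
    for r in rows:
        w = r[width_key] / max_w * max_width_px
        h = r[height_key] / max_h * max_height_px
        # y grows downward, so each bar's top edge sits h pixels above the baseline.
        bars.append(Bar(r["state"], x, baseline_y - h, w, h))
        x += w + gap_px
    return bars


if __name__ == "__main__":
    rows = [{"state": "State A", "stations": 50, "deaths": 10},  # placeholders
            {"state": "State B", "stations": 100, "deaths": 40}]
    for bar in variable_width_bars(rows, "stations", "deaths", 60, 200):
        print(bar)
```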

This is a similar screen with the axes swapped: the width of each bar now shows the number of deaths, while the height shows the number of railway stations in each state.

But why this swap? Do we really need it?

I would say this was an easy way to resolve (though I don't think it is really resolved) a disagreement: some people argued that the number of deaths belongs on the Y-axis, since the visualization is about deaths. My aim, however, was to bring out comparisons and relationships between the data sets.

Data sets:
Number of deaths due to railway crossings
Number of manned crossings
Number of unmanned crossings

This screen combines three data sets. The width of each bar shows the number of deaths; the green bar shows the number of manned crossings, while the purple one shows the number of unmanned crossings.

Recently there has been a lot of news about rising deaths at unmanned crossings, and the government is planning to remove all unmanned crossings for safety. But the data shows that Telangana, despite having among the fewest unmanned crossings and total crossings, has far more deaths than other states.
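An observation like the Telangana one becomes explicit if deaths are normalised by the number of crossings. The sketch below adds a metric the charts do not use (deaths per 100 crossings, plus the unmanned share); the field names and sample values are hypothetical. Because the crossing-death figures combine manned and unmanned crossings, the rate can only be computed per crossing overall, not per crossing type.

```python
def crossing_stats(deaths, manned, unmanned):
    """Normalise one state's crossing deaths by its total number of level crossings."""
    total = manned + unmanned
    if total == 0:
        raise ValueError("state has no level crossings")
    return {
        "total_crossings": total,
        "unmanned_share": unmanned / total,
        "deaths_per_100_crossings": 100 * deaths / total,
    }


def rank_by_death_rate(rows):
    """Sort per-state dicts (hypothetical keys) by deaths per 100 crossings, highest first."""
    scored = [
        (row["state"], crossing_stats(row["deaths"], row["manned"], row["unmanned"]))
        for row in rows
    ]
    return sorted(scored, key=lambda item: item[1]["deaths_per_100_crossings"], reverse=True)


if __name__ == "__main__":
    sample = [  # placeholder values, not the 2014 data
        {"state": "State A", "deaths": 30, "manned": 100, "unmanned": 50},
        {"state": "State B", "deaths": 5, "manned": 300, "unmanned": 200},
    ]
    for state, stats in rank_by_death_rate(sample):
        print(state, round(stats["deaths_per_100_crossings"], 2), round(stats["unmanned_share"], 2))
```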

Check the Invision Prototype
Click Here

(Work in Progress)

VISUALIZATION [2]

Since I was dealing with a multidimensional data set, I decided to try a 3D visualization. Coming from a non-coding background, I was stuck looking for a way to iterate quickly on a 3D viz. Then I thought: why not do it in Rhino3D and Grasshopper? ;)

I wasn’t sure I would be able to do it, but I gave it a try.

It took a while, but I was happy with the result :)

The X-axis lists the states, the Y-axis shows the number of railway stations and the Z-axis shows the number of deaths.
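The axis mapping in this render amounts to a small coordinate transform. This is not the Grasshopper definition used here (which is not shown); it is a hypothetical Python equivalent that turns each state's row into a vertical segment, with assumed spacing and scale factors and placeholder values.

```python
def stems_3d(rows, spacing=10.0, station_scale=1.0, death_scale=1.0):
    """Return one vertical segment per state: X = state slot, Y = stations, Z = deaths.

    Each segment runs from (x, y, 0) up to (x, y, z), like a 3D lollipop marker.
    """
    segments = []
    for index, row in enumerate(rows):
        x = index * spacing
        y = row["stations"] * station_scale
        z = row["deaths"] * death_scale
        segments.append(((x, y, 0.0), (x, y, z)))
    return segments


if __name__ == "__main__":
    sample = [{"state": "State A", "stations": 140, "deaths": 15},  # placeholders
              {"state": "State B", "stations": 95, "deaths": 8}]
    for row, (start, end) in zip(sample, stems_3d(sample, station_scale=0.5)):
        print(row["state"], start, end)
```

Separate scale factors are used because station counts and death counts differ by orders of magnitude; a shared scale would flatten one axis.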

The colour legend and annotations are still missing; I will add them later.

I haven't completed it yet, so it is a work in progress. I will finish it and update the blog as soon as possible.

Click on the play button to view the visualization.
(View it in full screen for better visibility)

You can download all the data files used in the project from the link below.
Download Files

I hope you enjoyed it. Please leave some constructive feedback.
