VizRisk Challenge: An Exploration of Landslide Risk and Education in Nepal

Jenna Epstein
Jul 16, 2019


I have been interested in geospatial data and visualization for a while, but I am a novice in risk assessment and analysis. I saw the Visualize Risk 2019 (#VizRisk) challenge as an opportunity to learn and experiment with different datasets and visualization tools in a structured way.

Getting Started

I started by doing some background reading on landslides in Nepal. A sub-topic that quickly stood out to me was the impact that the aftermath of natural disasters has on access to primary and secondary education for youth across the country. When landslides occur, they often damage major highways that are the primary routes children take to get to school. When these routes are obstructed, children either have to take long detours and risk being late to school (and risk their own safety), or they miss school altogether.

Some of my background reading:

Digging for Initial Data and Data Preparation

I started by finding a GeoJSON file of Nepal’s district boundaries by Open Knowledge Nepal. Then, I decided to explore what types of open data were available around education in Nepal. On Open Data Nepal, the dataset from the Ministry of Education with the number of schools and students in 2074 by local level was of particular interest. If I had more time to devote to cleaning and parsing more datasets, I would have pursued the analysis at the local level. However, because district names are spelled in many different ways in English across datasets, I found it easier to consolidate the school data at the district level using pivot tables in Excel. Otherwise, I worried that I would spend too much time matching and standardizing the English spellings of districts in this dataset against those in the GeoJSON file.
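I did this consolidation in Excel, but the same pivot-table step could be sketched in Python. This is only an illustration under assumptions: the column names (`district`, `local_level`, `schools`, `students`) and every number below are placeholders, not the actual fields or values in the Ministry of Education dataset.

```python
from collections import defaultdict

# Illustrative rows only: the column names and numbers are placeholders,
# not values from the Ministry of Education dataset.
LOCAL_ROWS = [
    {"district": "Kaski", "local_level": "Unit A", "schools": 10, "students": 2500},
    {"district": "kaski ", "local_level": "Unit B", "schools": 5, "students": 1200},
    {"district": "Gorkha", "local_level": "Unit C", "schools": 8, "students": 1900},
]


def totals_by_district(rows):
    """Sum schools and students per district, like a pivot table would."""
    totals = defaultdict(lambda: {"schools": 0, "students": 0})
    for row in rows:
        # Normalize whitespace and capitalization before grouping.
        district = row["district"].strip().title()
        totals[district]["schools"] += int(row["schools"])
        totals[district]["students"] += int(row["students"])
    return dict(totals)


district_totals = totals_by_district(LOCAL_ROWS)
for district, values in sorted(district_totals.items()):
    print(district, values["schools"], values["students"])
```

Stripping whitespace and normalizing capitalization only catches the simplest inconsistencies. Genuine spelling variants would still need a lookup table or manual review.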

I then found district-level data on the number of landslides and heavy rainfall instances across the country from January 2010 to June 2019. I exported the data for this timeframe from Nepal’s Disaster Risk Reduction Portal (http://drrportal.gov.np/) and joined it with the district-level data on schools already in my .csv.

Merging district-level datasets around forest land, landslides, heavy rainfall incidents, literacy rate, schools, and students.

In my background reading, I also learned that deforestation is a factor contributing to the prevalence of landslides. I explored Open Data Nepal some more and found a .csv of data on forest land and total land, by district. I calculated forest land as a percentage of total land by district, and joined that data with the other district-level data.
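For readers who prefer code to spreadsheets, the percentage calculation and the district-level join might look roughly like the sketch below. The field names (`forest`, `total`, `landslides`, `heavy_rain`) and all values are hypothetical stand-ins for the real exports, and the join keeps only districts that appear in every table.

```python
def forest_share(forest_area, total_area):
    """Forest land as a percentage of total land."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return 100.0 * forest_area / total_area


def join_by_district(*tables):
    """Inner-join dicts keyed by district name into one row per district."""
    shared = set(tables[0]).intersection(*tables[1:])
    return {
        district: {k: v for table in tables for k, v in table[district].items()}
        for district in sorted(shared)
    }


# Placeholder values for illustration, not figures from the real datasets.
schools = {"Kaski": {"schools": 15}, "Gorkha": {"schools": 8}}
hazards = {
    "Kaski": {"landslides": 4, "heavy_rain": 2},
    "Gorkha": {"landslides": 7, "heavy_rain": 1},
}
land = {
    "Kaski": {"forest": 90.0, "total": 200.0},
    "Gorkha": {"forest": 120.0, "total": 360.0},
}

forest = {
    name: {"forest_pct": round(forest_share(row["forest"], row["total"]), 1)}
    for name, row in land.items()
}
merged = join_by_district(schools, hazards, forest)
print(merged)
```

An inner join silently drops any district whose name does not match across tables, so it is worth comparing the number of merged rows against the number of districts you expect.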

Preparing the Initial Data

I opened the GeoJSON file in QGIS, and then brought in the cleaned version of the .csv file with the total number of schools and total number of students in each district. I also included the breakdown of boys and girls in case I decided to include it later on. I created a join between the spatial file and the .csv, using the common field of “District.” Some values appeared as “NULL” after the join, and I quickly realized this was due to English spelling differences. I edited the names in QGIS to match the .csv, and that resolved the issue.
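I fixed the mismatched names by hand in QGIS. As a hedged alternative, a short script could list the names that fail to join and suggest the closest spelling from the other file, using `difflib.get_close_matches` from the Python standard library. The function name `unmatched_names` and the sample spellings are illustrative assumptions, not the actual mismatches I ran into.

```python
import difflib


def unmatched_names(geo_names, csv_names, cutoff=0.6):
    """Map each GeoJSON name missing from the CSV to its closest CSV spelling."""
    geo_set, csv_set = set(geo_names), set(csv_names)
    # Only CSV names that also failed to match are plausible candidates.
    leftovers = sorted(csv_set - geo_set)
    suggestions = {}
    for name in sorted(geo_set - csv_set):
        close = difflib.get_close_matches(name, leftovers, n=1, cutoff=cutoff)
        suggestions[name] = close[0] if close else None
    return suggestions


# Example spellings chosen for illustration.
geo = ["Kaski", "Sindhupalchok", "Kavrepalanchok"]
csv_side = ["Kaski", "Sindhupalchowk", "Kabhrepalanchok"]

for geo_name, csv_name in unmatched_names(geo, csv_side).items():
    print(f"{geo_name} -> {csv_name}")
```

The suggestions are only candidates. Each pair still deserves a manual check before renaming, since two different districts can have similar names.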

Merging district-level spatial data with school data in QGIS.
Visualizing Nepal’s districts in QGIS.

Risk Data for the Basemap

I then turned to explore some data to visualize hazard and vulnerability. The GeoNode LS_GLOBAL-GAR13 dataset provides estimates of hazard (classified from very low to high) in multiple formats ready for use. Here is the information about this dataset:

“This dataset combines two GAR13 rasters to produce an estimate of the annual frequency of landslides triggered by precipitation and earthquakes. The original datasets depend on the combination of trigger and susceptibility defined by six parameters: slope factor, lithological (or geological) conditions, soil moisture condition, vegetation cover, precipitation and seismic conditions. Preprocessing has been undertaken to classify these estimates into a Hazard Classification (1= Very Low, 2= Low, 3= Medium, 4=High). This product was designed by International Centre for Geohazards /NGI for the Global Assessment Report on Risk Reduction (GAR). It was modeled using global data. Credit: GIS processing International Centre for Geohazards /NGI. Preprocessing for ThinkHazard! conducted by GFDRR.”

Styling a Basemap in Mapbox Studio

I decided the best way to include this visualization of hazard risk was to apply it to a background map. Using Mapbox Studio, I customized the “light” map style to include this layer. I styled the data with a lighter color scheme (manually choosing RGB hues on the gradient scale) and reduced the opacity. This way, other data that I choose to represent as an overlay will not be visually lost. I also darkened the hillshade so that it would show through the risk layer a bit more.
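I picked the colors by eye in Mapbox Studio, but the idea of a light, semi-transparent ramp across the four hazard classes can be sketched in code. The start and end colors and the 0.5 opacity below are assumptions chosen for illustration, not the values in my style.

```python
# Hazard classes as described in the dataset documentation.
HAZARD_LABELS = {1: "Very Low", 2: "Low", 3: "Medium", 4: "High"}


def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def hazard_rgba(hazard_class, start=(255, 245, 200), end=(220, 90, 40), opacity=0.5):
    """Return an rgba() string for a hazard class on a light color ramp."""
    if hazard_class not in HAZARD_LABELS:
        raise ValueError(f"unknown hazard class: {hazard_class}")
    t = (hazard_class - 1) / (len(HAZARD_LABELS) - 1)
    r, g, b = lerp_color(start, end, t)
    return f"rgba({r}, {g}, {b}, {opacity})"


for hazard_class, label in HAZARD_LABELS.items():
    print(label, hazard_rgba(hazard_class))
```

Keeping the opacity well below 1 is what lets the hillshade and any overlay remain readable on top of the hazard layer.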

Nepal-landslide-risk basemap, styled in Mapbox Studio

Taking it to Tableau

For displaying and presenting the information, I settled on using a dashboard view in Tableau Public. I also enjoy doing custom code work with Mapbox GL JS, but given my own time constraints and my existing familiarity with the Tableau Public interface, I decided it would be the best fit.

After adding in my main data source (a GeoJSON with geometry of Nepal districts and school data that I joined together in QGIS), I added my Nepal-landslide-risk Mapbox basemap to my Tableau workbook and created a worksheet. I added the geometry for the districts to the sheet and adjusted the opacity (down to very light) so that the basemap remains visible underneath, and made sure that the boundaries were darker.

It took me a little while to figure out how to present as much information as was relevant to the story without visually overloading the viewer with too much geographic data. I decided to keep the map itself pretty simple, and instead added some details to the tooltips that appear when hovering over a district. I pulled in the number of schools and number of students to keep it targeted to the education information. I also added a slider so that a user can isolate districts based on the number of students attending schools. That way, it becomes easier to see where the more student-populated areas are and, by looking at the hazard layer behind them, to see which parts of a district (or all of it) face some degree of landslide risk.

Using my custom Mapbox map to show landslide risk, with district boundaries (and school data provided in tooltips) as an overlay.

On two other separate sheets, I wanted to take a look at districts and their percentages of forest land (out of total land). Instead of doing another map, I settled on a simpler table, but with an orange-yellow gradient (darker means higher percentage). This is just a supplementary piece of the story, so I didn’t want to draw focus away from the map itself when adding this to the dashboard. I also created another visualization to show the number of landslides relative to the percentage of forest land in each district. In the future, I would want to run some tests to see whether a correlation actually exists (based on these specific datasets), how strong it is, and how much confidence to place in it, but for now I just wanted to provide a visual snapshot of these variables in the context of each other. I also used a blue gradient for a third variable, the number of instances of heavy rainfall, to provide more meaning behind each circular district data point marker.
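As a starting point for that future test, here is a minimal sketch of a Pearson correlation coefficient between forest percentage and landslide count. I have not run this on the real data. The numbers below are made up to show the mechanics, so the printed value says nothing about Nepal.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Made-up district values for illustration only.
forest_pct = [20.0, 35.0, 50.0, 65.0]
landslides = [12, 9, 7, 3]
print(round(pearson_r(forest_pct, landslides), 3))
```

The coefficient only measures the strength of a linear relationship. To get a p-value as well, `scipy.stats.pearsonr` is one option. Either way, a correlation across districts would not show on its own that forest cover causes fewer landslides.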

I then moved on to creating a dashboard so I could start putting together a cohesive story. I pulled in the map as the main focus, and then included the other two visualizations to the right of the map. I decided to use the map itself as a filter so that selecting a particular district brings it into focus across every other component on the dashboard.

Tableau dashboard progress

Wrapping Up and Looking Forward

Link to Visualization on Tableau Public: https://public.tableau.com/profile/jenna.epstein#!/vizhome/LandslideRiskandEducationinNepal/NepalDistrict-LevelSchoolDataandLandslideRisk

After finalizing the arrangement of data viz components on my dashboard, I added some commentary to provide context for users to understand the story I am starting to tell. I say “starting” to tell because I believe that this is just the beginning, just one piece, of the story on the impact of landslides and post-landslide-related incidents on access to education for Nepalese children. I believe more exploration and analysis are needed to dive deeper into the other factors contributing to landslide risk at the local level. If I were to continue working on this project, I would use the geospatial data I found initially (on Open Data Nepal) on administrative boundaries at the local level as my base geography. Then I would seek to find a dataset (or create one) identifying the locations of schools with latitude and longitude coordinates to include either on my basemap, or as another geometric layer (using points) to interact with. In this way, one could explore clusters of schools (and perhaps even use data-driven styling to associate the size of a dot with the number of students in the school) and identify more localized areas where more students may be affected by future landslides.
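To make that next step a little more concrete, here is a rough sketch of how school locations could be turned into a GeoJSON point layer with a dot size derived from enrollment. The school names, coordinates, student counts, and the helper names `school_feature` and `dot_radius` are all hypothetical, since I have not found such a dataset yet.

```python
import json
import math


def school_feature(name, lat, lon, students):
    """Build a GeoJSON Point feature; GeoJSON orders coordinates as [lon, lat]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "students": students},
    }


def dot_radius(students, max_students, max_radius=20.0, min_radius=2.0):
    """Scale dot area (not radius) with enrollment so large schools do not dominate."""
    if max_students <= 0:
        return min_radius
    return max(min_radius, max_radius * math.sqrt(students / max_students))


# Hypothetical schools: names, coordinates, and counts are placeholders.
schools = [
    school_feature("School A", 27.70, 85.32, 1600),
    school_feature("School B", 28.21, 83.99, 400),
]
largest = max(f["properties"]["students"] for f in schools)
for feature in schools:
    feature["properties"]["radius"] = dot_radius(
        feature["properties"]["students"], largest
    )

collection = {"type": "FeatureCollection", "features": schools}
print(json.dumps(collection, indent=2))
```

Using the square root of the student count keeps the visual area of each dot proportional to enrollment, which tends to read more honestly on a map than scaling the radius directly.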

Selecting a district to isolate on the finalized dashboard.
Tooltips on the finalized dashboard.
