Visualizing Hurricane Maria’s impacts in Dominica for the #VizRisk challenge
The Understanding Risk (UR) community and the Global Facility for Disaster Reduction and Recovery, in partnership with Mapbox and the Data Visualization Society, launched the VizRisk Challenge that ran from May to July 2019. Our team spends a lot of time analyzing impacts from natural hazards, but always from an engineering perspective. We were excited about spending some time thinking about the more creative aspects of data visualization and web development in this space, so we dedicated a few days of our spare time over the course of the competition to see what we could pull together! This post summarizes our team’s submission.
Rather than explain what our submission looks like, head over to http://vizrisk-dominica and see it for yourself! In case you need some convincing, here’s a preview…
Now that you’ve seen our work, let us tell you a bit about how we pulled it together, the tools and the data we used.
Choosing hurricanes and Dominica, and developing our story
Our California-based team spends a lot of time thinking about earthquakes, so we wanted a chance to branch out and look at a different type of hazard, hence choosing hurricanes. The provided datasets for the Caribbean were a great starting point and resonated with our team since some of us have lived and worked in the region before. From there, we explored these datasets as well as others we could find online to see what insights we could draw from them. Nicole and Tamika spent some time going through the data and creating various exploratory plots while Ji Su and Karen did some background research on Dominica and Hurricane Maria. Angela took the lead on importing datasets and styling in Mapbox, and also worked with Tim to spin up the web app. The datasets we ended up using include:
- Hurricane Maria storm track and wind speeds were developed by Tropical Storm Risk and UCL and downloaded from the Humanitarian Data Exchange. This dataset was a shapefile with polygons indicating areas that experienced different peak wind speeds.
- Hurricane Maria building damage assessment (dataset no longer available on the UNITAR website): this dataset contained point data for 28,309 buildings, including the level of damage each sustained from the hurricane based on field observations. Of these buildings, 7,841 records indicated a more refined level of damage (e.g. negligible to slight damage, completely destroyed).
- Addresses and occupancy data from OpenStreetMap. We used a Python package called Geocoders to get an address (e.g. street, suburb, parish) corresponding to each lon-lat pair in the damaged buildings dataset. We could then use the address to query additional nearby features (e.g. building occupancy, road type).
- A dataset on displaced populations produced by the Displacement Tracking Matrix. Provided as a spreadsheet, it included parish-level counts of displaced people, along with their origins and locations over the period from October 2017 to January 2018.
- A map showing windstorm susceptibility was produced in 2006 as part of a USAID funded project. This was available as a shapefile with polygons showing different wind hazard regions.
- The Post Disaster Needs Assessment (PDNA), as well as news articles covering Dominica’s resilient recovery strategy, provided much of the background information about Hurricane Maria, its impacts, and what Dominica is doing to recover.
Somewhat surprisingly, we were able to find a lot of publicly available data for Dominica; however, there were some gaps in these datasets. Using OpenStreetMap allowed us to fill some of those gaps and reinforced the importance of contributing to open source, collaborative efforts like this. Our firm regularly hosts mapping sessions to contribute to Missing Maps projects, in which we collaborate with other volunteers around the globe to trace satellite images in OpenStreetMap in areas inhabited by vulnerable populations. Experiencing the challenge of working with limited datasets first-hand in the VizRisk Challenge further strengthened our team’s existing commitment to the open data movement.
Iterating to create the final product
We were simultaneously overwhelmed and excited by the amount of publicly available data we were able to find, even with significant gaps between the datasets. The challenge was combining these datasets, across their different resolutions, into a compelling and impactful story about Hurricane Maria.
As most of our primary expertise is in natural hazard risk and resilience, our thoughts naturally shifted towards climate change, loss assessment, and resilient recovery. We ideated around the following questions:
- [Hazard] Was there anything significant about the intensity of this storm that distinguishes it from past events? With climate change, what can we expect in future hurricane seasons?
- [Exposure/Risk] What were the economic and social impacts of Hurricane Maria?
- [Vulnerability] Was there a lot of structural damage? What implications does this have for building codes, and how can we create predictive models of damage?
- [Resilience] How can Dominica incorporate resilience into its recovery and mitigation strategy?
We then began to answer these questions using our datasets, connecting the dots and thinking through useful data analyses and visualizations we could produce. We tried to focus on visualizations that could be useful in recovery planning, such as building code modifications, funding prioritization, and mitigation strategies.
While developing a story was a highly nonlinear task, we eventually decided to step through the visuals in a linear format in typical risk analysis fashion. Beginning with the hazard, we showed the storm track and wind speeds for this event, combined with the damage assessment to create a fragility curve as a function of wind speeds. Then we combined the exposure data to emphasize the significant losses in the housing sector. We then created map layers to indicate the site-specific wind hazard for use in future structural designs, as well as the gaps in current infrastructure to indicate potential areas of future investment.
The competition required us to use Mapbox as the base layer of our map, and since no one on our team had used it before, there was a bit of an initial learning curve. Attending the SF event at Mapbox on June 4th, 2019 and getting some in-person demonstrations, along with support from the Data Visualization Society on Slack, helped a lot. Some other tools we found useful included:
- Data analysis: Python, R, Julia, Excel, MongoDB. Using various scripting tools allowed us to manipulate and combine publicly available data, aggregate it for plots in Highcharts, and prepare geospatial data to upload to Mapbox. MongoDB was useful for processing data quickly while allowing a flexible data structure before we knew how we wanted to combine and store our processed data.
- Data manipulation and visualization: Mapbox, ArcMap, QGIS, FME, and Jupyter Notebook were used for some quick visualization and manipulation of spatial datasets.
- Web development: Angular 7, Highcharts, Mapbox GL JS API, AWS.
- Collaborative tools: Google’s suite of products, Slack and Bitbucket were invaluable for coordinating across the team and different locations.
The team spent a lot of time writing code for data analysis and visualization, and ultimately to create the final web product. The majority of our data processing was done using Python scripts in conjunction with MongoDB. Here are a couple of snippets you can use to replicate some of these tasks.
Obtaining addresses from lon-lat pairs using the Geocoders OSM API in Python
The building damage assessment data was extensive, but we wanted to aggregate it in a meaningful way to show overall trends. To do this, we used the Geocoders package in Python, which lets the user connect to a variety of different mapping APIs. We went with Nominatim (an OSM API), which restricts queries to one per second. For 28,309 buildings, this meant we’d need nearly 8 hours to process all the addresses! Since only a subset of the data had the resolution we desired (i.e. distinguished between different levels of damage), we reduced our processing to that set of 7,841 buildings, which completed in roughly 2 hours.
Below is a code snippet showing how we got useful address data from OSM. We later had to parse this data so we could easily aggregate across different parameters.
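A minimal sketch of this reverse-geocoding step, written here against Nominatim’s public HTTP API using only the standard library (the team used the Geocoders package; the function names and the one-second pause below are illustrative, with the pause enforcing Nominatim’s rate limit):

```python
import json
import time
import urllib.parse
import urllib.request

# Nominatim's public reverse-geocoding endpoint.
NOMINATIM_URL = "https://nominatim.openstreetmap.org/reverse"


def build_reverse_url(lat, lon):
    """Build a Nominatim reverse-geocoding request URL for one point."""
    params = urllib.parse.urlencode({"lat": lat, "lon": lon, "format": "json"})
    return f"{NOMINATIM_URL}?{params}"


def reverse_geocode(lat, lon, user_agent="vizrisk-dominica-demo"):
    """Fetch the display address for one lon-lat pair (one network call)."""
    req = urllib.request.Request(
        build_reverse_url(lat, lon), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("display_name", "")


def geocode_all(points, pause=1.0):
    """Geocode a list of (lat, lon) pairs, respecting the 1-request/second limit."""
    addresses = []
    for lat, lon in points:
        addresses.append(reverse_geocode(lat, lon))
        time.sleep(pause)  # Nominatim allows at most one query per second
    return addresses
```

With ~7,841 points and a one-second pause per request, a run like this takes a little over two hours, which matches the timing above.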
Parsing addresses from direct OSM output in Python
The output from the above snippet was useful, but required parsing to be meaningful. Some results would look like “Upper Pennville, Penville, Saint Andrew Parish, Dominica”, while others would look like “Bay Oil Distillery, French Island, La Plaine, Saint Patrick Parish, Dominica” or “Canefield, Cochrane, Saint Paul Parish, 00109–800, Dominica”. The snippet below handles these different cases in a manner customized to how Dominica is divided into administrative boundaries.
Particular focus was placed on the capital city of Roseau and the suburbs within it, as the building damage data lent itself to being grouped at that scale. We made no attempt to verify parsing at the street or place level.
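A sketch of the parsing logic, assuming (as in the examples above) that Nominatim’s display name lists components from most to least specific and that the parish component always ends in “Parish”; the function and key names here are illustrative, not the team’s actual code:

```python
def parse_address(display_name):
    """Split a Nominatim display_name into its Dominica-specific parts.

    Handles the shapes seen in practice:
      place, ..., <X> Parish, Dominica
      place, ..., <X> Parish, <postal code>, Dominica
    """
    parts = [p.strip() for p in display_name.split(",")]
    # The parish is the component ending in "Parish"; a postal code may
    # sit between the parish and the country.
    parish = next((p for p in parts if p.endswith("Parish")), None)
    idx = parts.index(parish) if parish else len(parts) - 1
    # The component just before the parish is the settlement/suburb;
    # the very first component is the most specific place name.
    locality = parts[idx - 1] if idx >= 1 else None
    return {"place": parts[0], "locality": locality, "parish": parish}
```

For example, `parse_address("Upper Pennville, Penville, Saint Andrew Parish, Dominica")` yields the parish “Saint Andrew Parish” and locality “Penville”, which is the level we aggregated at.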
Aggregating data across different parameters in Python
With the parsed addresses, we were able to aggregate damage by suburb within Roseau. The code snippet below shows how we did that for the damaged buildings considered.
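A sketch of that aggregation step, assuming each record has already been parsed into suburb and damage-level fields (the field names below are illustrative):

```python
from collections import Counter, defaultdict


def aggregate_damage(records):
    """Count buildings by damage level within each suburb.

    `records` is a list of dicts like
    {"suburb": "Newtown", "damage": "Completely destroyed"},
    where the key names stand in for the parsed dataset's fields.
    """
    by_suburb = defaultdict(Counter)
    for record in records:
        by_suburb[record["suburb"]][record["damage"]] += 1
    # Convert Counters to plain dicts for serialization/plotting.
    return {suburb: dict(counts) for suburb, counts in by_suburb.items()}
```

The resulting nested dictionary maps directly onto grouped bar charts of damage level per suburb, which is how we fed the aggregated data into Highcharts.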
Comparing results to predictive models of vulnerability from literature in Julia
In our daily work, we use “fragility curves” to predict how buildings or other assets might be damaged in a natural hazard event. At the country scale, fragility curves tend to represent how various building archetypes might be damaged at a given level of demand (e.g. wind speed). Many studies use HAZUS to perform such risk assessments, but the fragility curves provided therein are based on data from the United States. Since construction practices tend to differ between the United States and other countries, those curves are often not suitable for direct application in other contexts.
Given that limitation, we searched Google Scholar for any available literature that proposed wind fragility curves for the Caribbean. We came across Gonzalez 2007, which provided wind fragilities for Puerto Rico. Although this wasn’t ideal for our purpose (i.e. comparing to empirical results in Dominica), we assumed it would be more realistic than using US-based fragilities from HAZUS. Based on Google Image searches of Dominica construction, we thought it would be most reasonable to compare the Gonzalez 2007 fragilities for low-rise wood frame construction to our empirical dataset.
Here’s how the Gonzalez 2007 fragility curves for Puerto Rico (low-rise wood frame) compare to the Dominica building damage data from Hurricane Maria. Note that since the maximum wind speeds were high throughout the extent considered in this investigation, we are unable to get a meaningful comparison at lower damage states (e.g. slight).
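The team’s comparison was scripted in Julia; for consistency with the snippets above, here is a Python sketch of the same idea: an analytical lognormal fragility curve evaluated alongside empirical damage fractions binned by wind speed. The median and dispersion values are placeholders, not the Gonzalez 2007 parameters:

```python
import math


def lognormal_fragility(wind_speed, median, beta):
    """P(damage state reached | wind speed), modeled as a lognormal CDF.

    `median` is the median wind-speed capacity and `beta` the lognormal
    dispersion; both are placeholder fragility parameters here.
    """
    z = math.log(wind_speed / median) / (beta * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def empirical_fragility(observations, bins):
    """Fraction of buildings damaged within each wind-speed bin.

    `observations` is a list of (wind_speed, damaged_bool) pairs;
    `bins` is a list of (low, high) wind-speed intervals.
    Empty bins yield None rather than a fraction.
    """
    fractions = []
    for low, high in bins:
        in_bin = [damaged for speed, damaged in observations if low <= speed < high]
        fractions.append(sum(in_bin) / len(in_bin) if in_bin else None)
    return fractions
```

Plotting the analytical curve over the binned empirical fractions gives the comparison described above; as noted, bins below the speeds actually experienced stay empty, so the lower damage states cannot be checked against the data.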
Reflecting on our work
Our objectives heading into this challenge were the following:
- Leverage our data analysis skills in natural hazard risk/resilience
- Learn to use Mapbox and enhance our data visualization skills
- Learn about Hurricane Maria and its impact on Dominica, as well as how the nation is investing in a more resilient future
- Communicate our findings with a multidisciplinary audience
Overall, we feel we’ve more or less achieved these goals. We’re proud of the visualization we were able to create in the time we could dedicate, and of the skills we gained by participating in this challenge. In particular, Mapbox Studio allowed members of the team to create and edit layers and contribute to the data visualization without requiring significant web development skills.
If we had more time…
We would spend more time learning how best to use Mapbox and its various tools. Mapbox Studio offers a lot of different capabilities, and we feel we could have explored a wider range of visualization options with more time. We could also tighten up the story, clean up our code, optimize the site for mobile devices… the list is endless.
We also want to say a big thank you to Understanding Risk, GFDRR, Mapbox and the Data Visualization Society for putting on this competition and giving us a chance to play with some data, learn new skills, and create something tangible to help us better understand the impact of natural hazards in a visual way. We had a lot of fun in the process!
Nicole Paul is a risk and resilience consultant based in San Francisco. She works at the intersection of natural hazard risk, probabilistic methods, and digital expertise.
Tim Arioto is a Fullstack Software Engineer based in San Francisco. He creates data visualizations of risk assessments of natural hazards to enable understanding of large, complex datasets.
Angela Wilson is a GIS Analyst and Frontend Software Engineer based in Los Angeles. Her background is in GIS and data visualization, but her role currently includes more frontend web development.
Tamika Bassman is a structural analyst intern with our SF Advanced Technology + Research team. This is her first foray into data analysis, mapping and visualization.
Ji Su Lee is a risk analyst based in San Francisco. Her work focuses on the combination of web development and seismic/flood risk assessments.
Karen Barns is a risk and resilience engineer based in San Francisco. Her work focuses on probabilistic risk modeling for disaster risk management, and she is slowly learning web development from our rockstar developers!