Disaster Optional: Building climate resilience with data and tech to reduce the impact of floods

Published May 9


As part of the Patrick J. McGovern Foundation’s Data Practice Accelerator partnership, Open Contracting Partnership and CivicDataLab prioritized using data to inform better flood control infrastructure and procurement decisions because of the ongoing threat flooding poses to local communities.
The original blog post was published by The Patrick J. McGovern Foundation.

by Bernadine Fernz and Kabeer Arora

Extreme weather events are arguably the most visible manifestation of the climate change crisis. Over the past 50 years, the world has experienced a weather- or climate-related disaster almost every day, with devastating effects on human lives. Daily losses amount to over USD 200 million, and recovery and rebuilding costs after natural disasters are estimated to consume up to 15% of a country’s GDP.

Given the volume of this spending, public procurement plays a critical role in reducing disaster risk and increasing resilience. And given the impact on communities, this money must be spent transparently and inclusively. However, information about what is spent, where, how much, and whether it benefits the communities most in need is sparse. Innovative solutions are needed to better track recovery and rebuilding efforts and to meaningfully increase the resilience of the areas and communities most at risk.

Assam, a state in northeast India, is one of the world’s most flood-prone regions, where annual floods acutely affect 40% of the state. The 2022 floods caused over 190 deaths and affected over 8 million people (according to data compiled from ASDMA’s daily FRIMS reports). Despite multiple initiatives by state bodies, the comprehensive, usable information needed to shape disaster risk reduction strategies is still missing.

Data that could enable more effective flood response and management is scattered or siloed across different agencies, at different scales and in different formats, making it difficult for decision-makers and relevant stakeholders to make data-informed decisions. This often results in inefficient processes and policies or ad-hoc responses that fail to adequately cater to urgent, often life-saving needs in times of emergency.

This is where our project, ‘Intelligent Data Ecosystem in Assam — Flood Response and Management (IDEA-FRM)’ came in. We wanted to make a functional data model that could ultimately be adopted by the Assam State Disaster Management Authority to better respond to and prepare for floods. To kick off, we set out to:

1. identify, collate and process all flood-related information into a single point of access for everyone to use, in a format that enables processing with artificial intelligence and machine-learning tools; and

2. develop advanced data models to identify the most vulnerable regions and their levels of preparedness.

Following a comprehensive literature review, a field study and stakeholder interviews, we identified relevant flood-related data in five broad categories:

1. Satellite and weather data: This data helps us understand floods as a function of various natural factors like rainfall trends, distance to rivers, elevation, slope, drainage density, vegetation density, built density, soil and lithology;

2. Demographic data: This data helps us understand how floods interact with settlements and determine the impact on human lives and livelihoods looking at various social and economic factors;

3. Access to infrastructure: This helps us understand the vulnerability of regions as a function of infrastructure access to cope with floods;

4. Damages: This data helps us understand the trends of flood impacts in the regions historically; and

5. Government response: Finally, we need this dataset to understand how the government has responded to floods and where the gaps might be. We use finance and spending data as captured in public procurement records.

Fig 1: Meeting with 30 village heads in Darrang District, Assam

This work was not easy. Apart from identifying the different data sources and standardizing the data, a major challenge in preparing the underlying data for modeling was geocoding the procurement data, which lacks a field indicating the location of the work. We had to apply open-source text-mining algorithms across multiple fields to identify the work location and then validate the results.
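A minimal sketch of the idea, assuming a hand-built gazetteer of place names: scan every free-text field of a tender for known names and flag records with zero or several matches for manual validation. The post does not say which open-source text-mining algorithms were used; the word-boundary regex matching below is a simple stand-in, and the `GAZETTEER` entries and codes are hypothetical.

```python
import re

# Hypothetical gazetteer: lower-case place name -> revenue-circle code.
# Names and codes are illustrative only, not an official list.
GAZETTEER = {
    "mangaldai": "RC-001",
    "dalgaon": "RC-002",
    "north guwahati": "RC-003",
}


def normalise(text: str) -> str:
    """Lower-case and turn punctuation into spaces so word boundaries are clean."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower())


def geocode_tender(fields: dict) -> set:
    """Return every gazetteer code whose place name appears in any text field."""
    found = set()
    for value in fields.values():
        text = normalise(value or "")  # tolerate missing (None) fields
        for place, code in GAZETTEER.items():
            # \b prevents partial matches inside longer words.
            if re.search(rf"\b{re.escape(place)}\b", text):
                found.add(code)
    return found


tender = {"title": "Raising of embankment near Mangaldai town", "remarks": None}
print(geocode_tender(tender))
```

A record that returns an empty set, or more than one code, would be routed to manual validation rather than assigned automatically.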

Another challenge was ensuring that the changing number of districts (administrative boundaries), which increased from 27 to 35 between 2011 and 2022, was reflected consistently across all the years of study.
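One common way to handle this, sketched here under the assumption that each newer district was carved out of a single 2011 parent, is a crosswalk that maps every district name back to its 2011 equivalent before aggregating. The mapping below is illustrative and covers only a few new districts; it is not necessarily how the project did it. A district formed from parts of several parents would need its values split, for example by area or population.

```python
from collections import defaultdict

# Illustrative crosswalk from post-2011 districts to their 2011 parent.
# A real crosswalk should be built from official notifications.
DISTRICT_TO_2011_PARENT = {
    "Biswanath": "Sonitpur",
    "Majuli": "Jorhat",
    "Hojai": "Nagaon",
}


def to_2011_district(name: str) -> str:
    """Districts that already existed in 2011 map to themselves."""
    return DISTRICT_TO_2011_PARENT.get(name, name)


def harmonise(records):
    """Sum a numeric value per (2011 district, year) from (district, year, value) rows."""
    totals = defaultdict(float)
    for district, year, value in records:
        totals[(to_2011_district(district), year)] += value
    return dict(totals)


rows = [("Biswanath", 2020, 10.0), ("Sonitpur", 2020, 3.0), ("Sonitpur", 2012, 5.0)]
print(harmonise(rows))
```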

All the raw data from the five identified categories were cleaned, formatted and transformed for ingestion into the analytics platform. This process included consolidating (or, in certain cases, separating) fields and columns, changing formats, assigning unique identifiers, deleting unnecessary data and correcting errors. This data is now publicly available on GitHub in the IDEA-FRM repository.
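The cleaning steps can be sketched for a single record, assuming day-first dates (as commonly written in India) and a hypothetical record layout with `source`, `district` and `date` fields. Hashing the normalized key fields gives an identifier that stays the same when the same record is re-ingested in a different raw format.

```python
import hashlib
from datetime import datetime

# Accepted raw date formats (assumed); day-first is tried before month names.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")


def parse_date(raw: str) -> str:
    """Convert any accepted raw date string to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def clean_record(raw: dict) -> dict:
    """Normalise district and date, then derive a stable unique identifier."""
    district = " ".join(raw["district"].split()).title()  # collapse spaces, fix case
    date = parse_date(raw["date"])
    key = f"{raw['source']}|{district}|{date}"
    record_id = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return {"id": record_id, "source": raw["source"], "district": district, "date": date}


print(clean_record({"source": "frims", "district": "  darrang ", "date": "09/05/2022"}))
```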

Two models to assess flood risks and preparedness

With the data ready, we could start exploring two potential models.

The first approach, Flood Risk Assessment, integrates a machine learning model to predict the probability of flood occurrences and weigh it against vulnerability and access to infrastructure (coping capacity) to assess risk.

In the second approach, the Flood Preparedness Model, we employ a multivariate statistical model that combines all the datasets to assess preparedness for floods and identify the places that need to be better prepared.

In the first approach, Flood Risk is the perceived danger that floods pose to population, infrastructure or both. In our model, the demographic and access-to-infrastructure datasets are weighted by the probability of an area being flood-prone to quantify their vulnerability to floods, and the result is overlaid with flood hazard to assess overall flood risk. That probability is predicted from ten variables, including slope, elevation, distance from river, drainage density and surface runoff (GCN).
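The post does not name the machine learning algorithm. As a hedged stand-in, the sketch below fits a logistic regression by batch gradient descent to estimate flood probability from two of the predictors (scaled elevation and distance from river, with invented training data). It then combines that probability with vulnerability and coping capacity using one simple, hypothetical formula: risk = P(flood) × vulnerability × (1 − coping capacity). The actual model used ten predictors, and its exact weighting is not described here.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights and bias with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def flood_probability(w, b, x):
    """Predicted probability that an area with features x floods."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def risk_index(p_flood, vulnerability, coping_capacity):
    """Hypothetical composite; all inputs are assumed to lie in [0, 1]."""
    return p_flood * vulnerability * (1.0 - coping_capacity)


# Invented data: [scaled elevation, scaled distance from river]; 1 = flooded.
X = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
     [0.8, 0.9], [0.9, 0.7], [0.7, 0.8], [0.6, 0.9]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)
p = flood_probability(w, b, [0.15, 0.2])  # low-lying and close to a river
print(p, risk_index(p, vulnerability=0.7, coping_capacity=0.3))
```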

Figure 2: Predictor and Response variables for the ML model
Figure 3 & 4: ML results for Morigaon & Kamrup districts of Assam

Even though this method gave us a relatively accurate prediction model, it did not use several variables, such as recorded damages and investments made, that we identified as playing a critical role in managing floods. It offers a forward-looking view rather than synthesizing lessons from past trends. Additionally, it does not establish the relationships between the parameters that determine risk severity.

The results from the first approach show that regions in Assam (here, revenue circles were used as the administrative boundaries) with high flood impact are often those with better access to infrastructure. This leads to greater infrastructure damage from floods, which is often cited as a cause of detrimental impacts on recovery. Damage to infrastructure also widens the impact of flooding beyond the immediate area to the surrounding regions.

This analysis helps inform decision-makers on where and how to channel infrastructure preparedness and response activities. The model can be further fine-tuned to improve understanding of what infrastructure to build and where, channeling the right kind of investment to the places that need it most.

The second approach, however, successfully identifies the relationships between the variables that determine how prepared revenue circles are for floods.

It uses Structural Equation Modeling (SEM), a multivariate, hypothesis-driven technique for assessing structural relationships between variables.

In our model, there are five latent variables:

1. Flood proneness measures the hazard as a function of external factors like slope, land cover, soil type, rainfall, distance from river etc.

2. Demographic vulnerability measures the vulnerability of the population and uses specific demographic characteristics such as population distribution, household economics, age, sex ratio and so on.

3. Access to infrastructure determines the ability of settlements to cope with disasters.

4. Flood impact is measured using flood losses and damages as recorded by state authorities in the Flood Reporting and Information Management System (FRIMS), where they publish daily reports during floods.

5. Preparedness is defined as a function of financial variables from public procurement data, drawn specifically from flood-related tenders.
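For reference, the standard (LISREL-style) formulation of a structural equation model is shown below. One plausible mapping of the five latent variables onto it, not necessarily the specification used in the project, treats flood proneness, demographic vulnerability and access to infrastructure as exogenous latent variables ξ, and flood impact and preparedness as endogenous latent variables η. The loading matrices Λx and Λy link the observed indicators x and y (for example, tender value, tender count or recorded damages) to the latent variables; B and Γ hold the structural coefficients; δ, ε and ζ are error terms.

```latex
\begin{aligned}
\mathbf{x} &= \Lambda_x \boldsymbol{\xi} + \boldsymbol{\delta}
  && \text{(measurement model, exogenous)}\\
\mathbf{y} &= \Lambda_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}
  && \text{(measurement model, endogenous)}\\
\boldsymbol{\eta} &= B\,\boldsymbol{\eta} + \Gamma\,\boldsymbol{\xi} + \boldsymbol{\zeta}
  && \text{(structural model)}
\end{aligned}
```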

Figure 5: Schematic diagram for Structural Equation Model methodology

All the latent variables are unobservable: we cannot measure them directly. Under the SEM framework, we use observable data, such as tenders and flood damages, to estimate these unobservable variables.
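To illustrate how an unobservable variable can be recovered from observable ones, the toy sketch below simulates a single latent factor (think of ‘Preparedness’) measured by three noisy indicators. It then recovers the factor loadings from the indicator covariances alone, using the classic three-indicator identification result. With the latent variance fixed at 1 and uncorrelated errors, cov(x1, x2) = λ1·λ2, cov(x1, x3) = λ1·λ3 and cov(x2, x3) = λ2·λ3. The data are simulated and the loadings are assumed positive; this is not the project's estimation procedure, which a full SEM package would typically carry out by maximum likelihood.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n, loadings, noise_sd, seed=0):
    """Draw n observations of indicators x_k = loading_k * latent + noise."""
    rng = random.Random(seed)
    indicators = [[] for _ in loadings]
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)  # never observed directly; variance fixed to 1
        for k, loading in enumerate(loadings):
            indicators[k].append(loading * latent + rng.gauss(0.0, noise_sd))
    return indicators


def estimate_loadings(x1, x2, x3):
    """Recover three loadings from indicator covariances (assumes positive loadings)."""
    c12, c13, c23 = cov(x1, x2), cov(x1, x3), cov(x2, x3)
    lam1 = math.sqrt(c12 * c13 / c23)  # c12 * c13 / c23 equals lam1 squared
    return lam1, c12 / lam1, c13 / lam1


x1, x2, x3 = simulate(20_000, loadings=(0.9, 0.7, 0.5), noise_sd=0.5)
print(estimate_loadings(x1, x2, x3))
```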

The model presents regression estimates between each pair of latent variables, as well as between each latent variable and the measured variables used to estimate it. For instance, ‘Preparedness’ is positively associated with the ‘Sum of Total Tender Value’ in a revenue circle and, to a lesser degree, with the ‘count of tenders’ related to schemes (government programs) through which flood response is sanctioned. In the model’s terms, the more procurement is carried out for a region, the better prepared it is estimated to be. Among the latent variables, there is a significant interaction effect between Preparedness and demographic vulnerability.

We identify the top revenue circles in Assam where preparedness was low relative to vulnerability. This allows decision-makers to channel funds in the right direction for both long- and short-term needs.
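Assuming preparedness and vulnerability are available for each revenue circle as standardized factor scores on a comparable scale, the ranking can be sketched as sorting circles by the gap between the two. The circle names and scores below are hypothetical.

```python
def preparedness_gaps(scores, top_k=3):
    """Rank revenue circles by how far vulnerability exceeds preparedness.

    `scores` maps a revenue-circle name to (preparedness, vulnerability),
    both assumed to be standardised factor scores on the same scale.
    """
    gaps = {rc: vul - prep for rc, (prep, vul) in scores.items()}
    ranked = sorted(gaps, key=gaps.get, reverse=True)  # largest gap first
    return [(rc, round(gaps[rc], 3)) for rc in ranked[:top_k]]


scores = {
    "RC-A": (0.2, 1.5),
    "RC-B": (1.0, 0.5),
    "RC-C": (-0.5, 0.9),
    "RC-D": (0.0, 0.0),
}
print(preparedness_gaps(scores, top_k=2))
```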

Figure 6: Results from the SEM Model in tabular form
Figure 7: Results from the SEM Model in map form

Disclaimer: Administrative boundaries on this map should be used for reference purposes only. CDL makes no claims concerning the precision and accuracy of the administrative boundaries presented on this map.

However, the models have a few limitations owing to the way the data is captured. For example, locations are not tagged or mentioned separately in tender documents, so they have to be extracted, which limits accuracy. Similarly, the format in which data is recorded has changed over the years: we now have more granular data that was not available two years ago.

Also, the current iteration of the model does not incorporate mixed effects (a combination of fixed and random effects). Many standard statistical models assume that data points are independent and identically distributed; when observations are grouped (for example, revenue circles within the same district), using only fixed effects and no random effects can understate uncertainty and lead to false positives.
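A minimal example of what adding a random effect could look like is the random-intercept model below. The grouping is an assumption for illustration: i indexes revenue circles and j the district containing them, with u_j independent of ε_ij. The shared term u_j makes observations in the same district correlated, which is exactly the dependence a fixed-effects-only model ignores.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij},\\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),\\
\operatorname{Cov}\!\left(y_{ij}, y_{i'j} \mid \mathbf{x}\right) &= \sigma_u^2 \qquad (i \neq i').
\end{aligned}
```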

Way ahead

This model is a proof of concept that we will refine in the coming months and validate against what is observed in the field. We presented these findings to government authorities, received positive responses, and hope to gradually refine the model and ultimately see it adopted for actual decision-making. The project made the following major contributions to the community:

● A single point of access for all updated (near-real time) flood related information across different departments and categories.

● A machine learning model to predict the probability of flooding across the state of Assam at the revenue circle level, for weather forecasts and risk assessment.

● An advanced statistical model to assess preparedness of different regions in Assam and factors contributing to it.

● The results from the models can be used to produce detailed district-level flood reports that help decision-makers prioritize districts where preparedness lags behind the impact observed.

More importantly, the model has wide applicability. It can be used to zoom into the panchayat, village or ward level.¹ It also has potential for scaling up, not only to other states of India but also to other countries. The model can also be extended to incorporate other subjective and qualitative parameters for a comprehensive preparedness score.

We intend to deepen and expand this research to close the flood response and management data gap wherever the opportunity presents itself. Our government partners have already welcomed our work, recognizing its value. We will continue to support these decision-makers to become more data-driven and to make more meaningful public procurement decisions that deliver better outcomes for the most vulnerable communities at risk of floods.

We encourage you to learn more about this work in the Insights Report.

¹ The districts are divided into subdivisions for administrative purposes and into revenue circles for tax collection. The subdivisions are further subdivided into development blocks for development purposes and Tehsils for revenue purposes. The development blocks are composed of gram panchayats. Each gram panchayat is made up of a large village or a cluster of smaller villages. The village is the lowest level of subdivision in India.




Researcher and Program Manager at CivicDataLab