India flood management: How open contracting is informing public spending to prioritize the most vulnerable communities in Assam


by Bernadine Fernz and Kabeer Arora

Open Contracting Partnership and CivicDataLab have prioritized using data to inform better flood control infrastructure and procurement decisions because of the ongoing threat flooding poses to local communities. At the time of publication, over 200,000 residents have been displaced by destructive flooding in the state of Assam. The past few months of flooding have been among the most devastating in recent history, which amplifies both the urgency and significance of this data work.

The Open Contracting Partnership’s Accelerator grant partnership is focused on supporting CivicDataLab in its development of an intelligent data model that combines fiscal, geospatial, and demographic data with the aim of improving climate resilience through equitable distribution of and access to flood defense and relief goods and services in Assam.

Globally, floods are the most frequent natural disasters; of the 1.47 billion people directly exposed to the risk of intense flooding, over a third are poor and 132 million live in extreme poverty. The poorest of the poor suffer the most from disasters: they are more likely to live in hazard-exposed areas, are less able to invest in risk-reducing measures, and extreme weather events such as floods drive them even further into poverty.

In India, floods account for more than 50% of all climate-related disasters and cause INR 4.69 trillion (US$64 billion) in economic losses. In 2018 alone, 1,808 lives and an estimated INR 957 billion (US$14 billion) were lost to floods. These annual floods, arriving in three to four waves every year, acutely affect the state of Assam, impacting over 100,000 people and killing hundreds of people as well as livestock.

Good data on how much is spent and in which communities is critical to understanding whether these investments reach the areas most affected by floods. Better data is also fundamental for creating effective tools to manage disasters and reduce risks, a responsibility that is currently distributed across multiple actors and agencies in India. However, such data does not yet exist in a form that can be accessed, analyzed, and used to inform decision-making by the responsible government agencies. It is siloed and scattered across different online platforms. In our field research, we also found that some data may not be available electronically at all. What data is available is often not ready for processing: it comes in inaccessible, non-machine-readable formats and is frequently hidden behind captchas.

Building the data model
In this project, we aim to plug these gaps by linking three different data categories: geospatial and satellite data, socio-economic indicators, and fiscal data. Together, these offer insights into the effectiveness of past actions and future needs for flood response and mitigation. This will help us understand whether flood control infrastructure is being built in a timely manner in the areas with the greatest need, and provide the insights needed to act.

Let’s look at the three key categories in more detail. For each category, we started with an in-depth scoping exercise and identified the different datasets, their availability, sources, and formats.

Geo-spatial and satellite data: We are using this data to assess the exposure, severity, and propensity of floods in different areas of the state.
We identified and collected data on flood conditioning and triggering factors from the open-data sources described below, looking specifically at the following data points:

  1. Distance from river
  2. Watershed
  3. Landcover/ Land use
  4. Surface Runoff
  5. Elevation and slope
  6. Drainage density
  7. Soil type
  8. Lithology
  9. Past Precipitation
  10. Predicted Weather Data

Elevation, slope, and drainage density are extracted from the Shuttle Radar Topography Mission (SRTM) Digital Elevation data; the other factors are downloaded directly from open data sources such as Earth Explorer, Bhuvan (the Indian geo-platform), the India Water Resources Information System, the Indian Meteorological Department, research articles, and the geoportal of the Geological Survey of India. Not all datasets used the same coordinate reference system, so we had to reproject them as part of preparing the data for analysis. Similarly, the spatial extent and resolution of the maps had to be standardized to match those of the study area. The curated dataset becomes the input to the model.

Flood hazard mapping can be done either through model simulations, which require hydrological and routing models along with climate datasets (Al-Sabhan et al., 2003; Winsemius et al., 2013), or through data-driven models using GIS, statistical, and machine-learning methods (Allafta & Opp, 2021; Mojaddadi et al., 2017). We are currently taking the latter route, working on two approaches: (1) GIS-based multi-criteria analysis, and (2) machine-learning models. In the first approach, parameters are given weightages and overlaid to produce a flood hazard probability map; in the second, suitable machine-learning models, such as Support Vector Machines (SVM) and AdaBoost-DT, are trained on the dataset to predict the chances of flooding. The accuracy of the model results is evaluated against a flood inundation map based on the frequency of previously flooded areas. Maps of historical floods are available on the Disaster Services webpage of Bhuvan.
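The GIS-based multi-criteria step can be sketched roughly as follows: each factor raster is normalized to a common scale, multiplied by its weightage, and summed to produce a hazard score per cell. This is a minimal illustration only; the layer names, toy grids, and weights below are hypothetical, not the project's actual parameters.

```python
import numpy as np

def weighted_overlay(layers, weights):
    """Combine normalized factor rasters into a flood-hazard score.

    layers  -- dict of factor name -> 2D array, all on the same grid
    weights -- dict of factor name -> relative importance (summing to 1)
    """
    hazard = np.zeros_like(next(iter(layers.values())), dtype=float)
    for name, raster in layers.items():
        lo, hi = raster.min(), raster.max()
        # Min-max normalize each layer so different units are comparable
        norm = (raster - lo) / (hi - lo) if hi > lo else np.zeros_like(raster, dtype=float)
        hazard += weights[name] * norm
    return hazard  # values in [0, 1]; higher = more flood-prone

# Illustrative 2x2 grids (real inputs are reprojected, resampled rasters).
# Note: for factors where LOW values mean HIGH hazard (e.g. distance from
# river, elevation), the normalized layer would be inverted in practice.
layers = {
    "distance_from_river": np.array([[0.0, 2.0], [4.0, 8.0]]),
    "elevation":           np.array([[10.0, 30.0], [50.0, 90.0]]),
}
weights = {"distance_from_river": 0.6, "elevation": 0.4}  # hypothetical
hazard = weighted_overlay(layers, weights)
```

In practice these arrays would be read from the curated rasters (e.g. via rasterio or GDAL) after the reprojection and resampling described above.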

Socio-economic data: We are using this data to assess the vulnerability and resilience capacity of different regions and communities.

To calculate the vulnerability of different areas to floods, we identified and extracted indicators like population density, the percentage of children and elderly in the population, settlement density, infrastructure in place (road networks, embankments, relief shelters, health centers, etc.), and the number of vulnerable households from publicly available government sources such as the Census of India, the National Family Health Survey, the Socio-Economic and Caste Census, and the statistical handbook released annually by the state government of Assam.

The data is provided in a variety of formats, including text and tables in both machine-readable and non-machine-readable PDFs, and must be compiled into a suitable database repository for the project.

Fiscal data, such as public contracts and spending: We are using this data to assess past actions and investments against floods and inform future requirements.
To assess how the government has responded to floods in the past, we are collecting data on government expenditure under various government programs and funds, as well as public procurement data. As we were already working closely with the government of Assam on public procurement, they provided us with five years of consolidated data covering over 30,000 tender procedures. This data is otherwise hidden behind captchas and not easily accessible.

We identified flood-related procurements using keywords as well as by sorting by the procuring agency or department. We are converting tender documents to machine-readable formats using Tesseract and OpenCV, then annotating them with Cheyyali, an open-source annotation tool, to identify information patterns that can be extracted by training Natural Language Processing models such as Named Entity Recognition (NER) and entity-relationship models. We have also started geospatializing these tenders and have added the offline procurements done in flood emergencies in our pilot districts to this dataset.
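The first filtering step above can be sketched as a simple keyword-and-department check over tender records. The keyword list and department names below are illustrative assumptions, not the project's actual filter criteria:

```python
# Hypothetical keyword filter for flagging flood-related tenders.
FLOOD_KEYWORDS = {"flood", "embankment", "erosion", "breach", "sluice", "relief"}
FLOOD_DEPARTMENTS = {"Water Resources Department", "Soil Conservation Department"}

def is_flood_related(tender):
    """tender: dict with 'title' and 'department' fields (assumed schema)."""
    title = tender["title"].lower()
    if any(keyword in title for keyword in FLOOD_KEYWORDS):
        return True
    # Fall back to the procuring department when the title is uninformative
    return tender["department"] in FLOOD_DEPARTMENTS

tenders = [
    {"title": "Repair of embankment at Majuli", "department": "Water Resources Department"},
    {"title": "Construction of school building", "department": "Education Department"},
]
flood_tenders = [t for t in tenders if is_flood_related(t)]
# flood_tenders keeps only the embankment repair tender
```

A real pipeline would apply this after OCR, and the NER models described above would then pull out structured fields (amounts, locations, dates) from the matched documents.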

After listing the sources, we classified the datasets based on how frequently they are updated or released. Each frequency bucket is handled differently when sourcing the data. For low-frequency datasets, we ran one-time scrapers, whereas for high-frequency datasets (those updated in less than a month) we are developing end-to-end pipelines using Python and R.
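The bucketing logic can be sketched as a small dispatch on each dataset's update cadence. The dataset names and intervals below are hypothetical examples, not the project's actual catalogue:

```python
from datetime import timedelta

# Hypothetical catalogue: dataset name -> typical update interval
DATASETS = {
    "census_2011": None,                          # static, scraped once
    "statistical_handbook": timedelta(days=365),  # annual release
    "tender_portal": timedelta(days=7),           # weekly updates
    "daily_precipitation": timedelta(days=1),     # daily updates
}

HIGH_FREQUENCY = timedelta(days=30)  # "updated in less than a month"

def sourcing_strategy(interval):
    """Decide how a dataset should be sourced based on its update cadence."""
    if interval is None or interval >= HIGH_FREQUENCY:
        return "one-time scraper"     # re-run manually on rare refreshes
    return "scheduled pipeline"       # automated end-to-end ingestion

strategies = {name: sourcing_strategy(iv) for name, iv in DATASETS.items()}
```

In production the "scheduled pipeline" branch would hand off to an orchestrator (a cron job or workflow tool) rather than return a label.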

Developing the roadmap to inform change
For the data work to have an impact, good relationships with government actors are essential. The Open Contracting Partnership and CivicDataLab have worked closely together over the last two years to transform India’s public procurement processes from closed to open, making them more efficient, more effective, and more inclusive.

We had already developed good relationships with government departments in Assam before embarking upon this project, specifically the Finance Department, which enabled us to obtain five years of consolidated procurement data that is not easily accessible to the public. Our history of working with the state also helped us collaborate with the agencies that play a crucial role in responding to disasters, such as the State Disaster Management Authority and District Administrations. This process would otherwise have taken a very long time.

With PJMF’s support, we now have the opportunity to dive deeper, building upon our earlier work, to explore how public data can reduce the loss of lives and livelihoods from floods in Assam. PJMF has provided valuable guidance, especially on planning the whole project, creating the tech work plan, and defining final deliverables, by helping us develop a tactical roadmap that combines our experience and hypotheses with the tools and techniques that could achieve these goals.

Our existing relationships, built over time through our collaboration with OCP, have been instrumental in driving progress. Our prior engagement with the government of Assam built significant goodwill and trust, which in turn encouraged officials to participate in our project activities. It also gave us credibility and helped us reach new stakeholder groups involved in or affected by floods, enriching our understanding and informing our data model so that it reflects on-ground realities and needs. We have also connected with field-level officers and, through them, flood-affected citizen groups in our pilot district to test our model’s findings against on-ground realities.

As part of the Accelerator cohort, we have had the privilege of engaging with and learning from PJMF’s team of technical experts, who have helped us refine our approach to using data for social impact. The technical syncs provide a creative and collaborative space to test ideas for extracting actionable insights from datasets. In addition, peer learning events, like the recent one on geospatial data, help us learn from other teams working on similar themes across the globe. PJMF’s support with AWS services has also helped us work with large amounts of diverse data at scale, including scraping at a volume that would not otherwise have been possible, and run CPU-intensive tasks remotely on a server.

Testing and scaling the model
We are going to combine these layers into a composite index that provides analytics on how different areas of the state have performed in managing floods relative to their vulnerability and flood-proneness, and where interventions are most urgently needed.
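One way such a composite index could work is as a weighted combination of the three layers per district. Everything below is a hypothetical sketch: the weights, the assumption that past spending reduces urgency, and the district scores are illustrative, not the project's actual formula.

```python
# Hypothetical composite index combining the three data layers per district.
# All inputs are assumed pre-normalized to [0, 1]; weights are illustrative.

def composite_index(hazard, vulnerability, spending, weights=(0.4, 0.4, 0.2)):
    """Higher score = more urgent need for intervention.

    hazard        -- flood-hazard probability from the geospatial layer
    vulnerability -- socio-economic vulnerability score
    spending      -- normalized past flood-related spending; assumed here
                     to lower urgency, so it enters with a negative sign
    """
    w_hazard, w_vuln, w_spend = weights
    return w_hazard * hazard + w_vuln * vulnerability - w_spend * spending

districts = {
    "District A": composite_index(hazard=0.9, vulnerability=0.8, spending=0.2),
    "District B": composite_index(hazard=0.5, vulnerability=0.4, spending=0.7),
}
most_urgent = max(districts, key=districts.get)  # "District A"
```

A high-hazard, high-vulnerability district with little past investment surfaces at the top of the list, which is exactly the gap between need and spending the model is meant to expose.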

Our challenge in the next few months is not only to test the results of our model and improve it iteratively, but also to ensure that the model takes real-world constraints into account and adapts to changing conditions.

Our intelligent data model will enable decision-makers to improve flood response and relief procurement so that the poorest and most vulnerable people in Assam are better protected from the worst effects of floods. While we are starting with one district to test the platform and collaboration models, we are planning on scaling the model to the ten districts most affected by floods out of a total of 33 districts in Assam.

For more insights into this work, check out this blog hosted by Open Contracting Partnership.



The Patrick J. McGovern Foundation

Inviting conversations on how AI and data solutions create a thriving, equitable, and sustainable future for all.