Automating geodata processing and sharing at BRC

Part of the remit of the GIS & Information Management team at British Red Cross is to make relevant spatial data available to our colleagues working in other teams to reduce the amount of work they need to do when producing reports or analysis. We take internal and external data, process it so that it’s suitable for analysis, and host it on our ArcGIS Online platform. This post outlines how we’ve been starting to use automated workflows to expand our capacity to process datasets and make them available to colleagues.

The British Red Cross’s Independent Living service helps people to live at home after a stay in hospital. When a person uses the service after being discharged from hospital care, they meet with BRC volunteers who help make their homes safe and comfortable, provide transport and offer emotional support. When BRC volunteers and staff provide these services, they log their activities on a database to allow day-to-day case management and longer term analysis of how the service is working.

In 2022, we received a request from a colleague in BRC’s Strategic Insight team to automate the retrieval and processing of data on where our Independent Living teams have been working. Previously, when someone wanted an up-to-date copy of this data, it would take further requests to other teams to download the data from our database, do any processing, and then visualise it on a map. Our colleague thought it would save time and effort if we could automate the process and create a map layer on ArcGIS Online that always had the most recent data available.

Fulfilling this request took longer than expected (around eight months*) but in solving the various issues that came up along the way we were able to define a suitable workflow for automated data retrieval and mapping that we are now using across many other datasets. Developing an automation workflow has let us make use of different kinds of data and save time on manual updates of our standard datasets.

Sample of the end product: an interactive webmap giving an overview of where the Independent Living service has been providing support in the last 12 months

Our process

The process is built on the integration of ESRI tools with Python, specifically ESRI’s arcpy and arcgis packages, which provide access to functionality from ArcGIS Pro desktop software and the ArcGIS Online web platform in a Python environment.

The process we settled on for this task was:

1. Import data from a SQL database view that gives the postcode for each Independent Living activity in the last 12 months

2. Run some filters on the data to remove errors in the postcode field — e.g. postcodes that are either too long or short to be valid, or text that isn’t a postcode

3. Geocode the postcodes against the ONS Postcode Directory to get a geographic location

4. Reassign each location to a random location nearby to preserve privacy

5. Upload the data to ArcGIS Online, where it is made available to other users as a heatmap.

Diagram of the automation process used to update the dataset
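The five steps above can be sketched as a single chain of functions. Everything below is a hypothetical placeholder for the real implementation (the SQL view, the ONS Postcode Directory lookup and the ArcGIS Online upload are not shown):

```python
# Hypothetical orchestration of the five steps; each helper is a stub
# standing in for the real implementation described above.

def fetch_activities():            # 1. read the database view
    return [{"postcode": "SW1A 1AA"}, {"postcode": "bad value"}]

def drop_invalid(rows):            # 2. filter out malformed postcodes
    return [r for r in rows if 5 <= len(r["postcode"].replace(" ", "")) <= 7]

def geocode(rows):                 # 3. exact match against a lookup table
    lookup = {"SW1A1AA": (51.501, -0.142)}  # illustrative coordinates
    return [
        {**r, "latlon": lookup[k]}
        for r in rows
        if (k := r["postcode"].replace(" ", "").upper()) in lookup
    ]

def jitter(rows):                  # 4. privacy jitter (no-op in this stub)
    return rows

def publish(rows):                 # 5. overwrite the hosted layer (stubbed)
    return len(rows)

published = publish(jitter(geocode(drop_invalid(fetch_activities()))))
```

In the real pipeline each stub is a substantial piece of code, but the overall shape is this simple: each step takes the previous step’s output, so any stage can be tested in isolation.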

Challenges

Getting the full automation pipeline working, from loading the data out of the database through processing to uploading it to ArcGIS Online, meant addressing a number of challenges (many of which we weren’t aware of at the outset). These challenges included: the postcode data in the database didn’t always contain valid postcodes; it was difficult to set up secure authentication to access local and remote systems; and we needed to be sure that address data was sufficiently anonymised to protect the privacy of service users. We also had to decide the right technical set-up for the pipeline to run, taking into account various trade-offs between convenience and robustness.

Data cleaning

We only geocode postcodes we can find an exact match for — in the latest version of the data, 2,500 records (2%) were discarded because they couldn’t be matched for various reasons (e.g. they were too long or short for a standard postcode). The error rate has been going down over time as more controls are built into the database to ensure postcodes are valid.
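Our exact filter rules aren’t reproduced here, but a minimal version of this kind of sanity check might look like the following (the regex is a deliberately loose approximation of the UK postcode shape, not the full specification; the exact match against the ONS Postcode Directory is the final arbiter anyway):

```python
import re

# Loose sketch of postcode sanity-checking (not our exact rules).
# Matches the broad shape of a UK postcode: one or two letters, a digit,
# an optional third character, then digit + two letters for the inward code.
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$")

def looks_like_postcode(value):
    """Return True if the value has the broad shape of a UK postcode."""
    return bool(POSTCODE_RE.match(value.strip().upper()))
```

Pre-filtering like this keeps obviously bad values (free text, truncated entries) out of the geocoding step, so the unmatched count reflects genuinely unknown postcodes rather than data-entry noise.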

We also had trouble geocoding postcodes in the Channel Islands. British Red Cross operates in the Channel Islands, but postcodes from the region aren’t included in the ONS Postcode Directory or in other sources we looked at. In the end, we developed a process to provide approximate locations for these postcodes — not accurate enough to find a street address, but enough to give an overview at national scale.
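The real fallback process is more involved, but the idea can be sketched as: try an exact match first, then fall back to a rough area centroid keyed on the postcode area. The centroid coordinates below are illustrative approximations, not the values we use:

```python
# Sketch of a fallback for Channel Islands postcodes, which aren't in the
# ONS Postcode Directory. Centroids are illustrative approximations only.
ISLAND_CENTROIDS = {
    "JE": (49.21, -2.13),  # Jersey, approximate
    "GY": (49.46, -2.58),  # Guernsey, approximate
}

def geocode_with_fallback(postcode, lookup):
    """Exact match first; else an approximate area centroid; else None."""
    key = postcode.replace(" ", "").upper()
    if key in lookup:
        return lookup[key]
    area = key[:2] if key[:2].isalpha() else key[:1]
    return ISLAND_CENTROIDS.get(area)

jersey = geocode_with_fallback("JE2 3XP", {})
```

A centroid is obviously not accurate enough to find a street address, which is exactly the point: it gives a usable national-scale overview without pretending to precision we don’t have.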

Authentication

Working with data hosted on BRC’s internal systems presented a challenge for us at first. We had been testing our geoprocessing automation using ESRI’s ArcGIS Online hosted Jupyter Notebooks, but these are hosted on ESRI servers and don’t have access to BRC’s internal network. We also couldn’t automate logging in to ArcGIS Online via OAuth due to our two-factor authentication requirements. We decided to set up a virtual machine to run the processing on. This meant that we could access internal datasets securely from inside the BRC network, and we could also install ArcGIS Pro Desktop and have authentication enabled to upload data to ArcGIS Online. The virtual machine incurs a small cost for our team, but means that we can run any scripts without needing one of our computers to always be on.

With a lot of help from our colleagues in the Application Support & Development and Business Intelligence teams, we got a workflow set up that worked in testing but encountered consistent authentication errors when we tried to implement it in ArcGIS Pro. Our SQL database turned out not to be supported by the latest version of ArcGIS Pro — to get around this, we used Python’s pyodbc package, which was able to connect and retrieve data without issues.
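A sketch of the pyodbc workaround is below. The server, database and view names are placeholders, and the driver name assumes a standard SQL Server ODBC driver is installed on the machine:

```python
# Sketch of reading the view with pyodbc instead of an ArcGIS Pro database
# connection. Server, database and view names are placeholders.

def build_conn_str(server, database):
    """Windows-authenticated SQL Server connection string."""
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};DATABASE={database};Trusted_Connection=yes;"
    )

def fetch_postcodes(server, database, view):
    import pyodbc  # imported lazily; only needed when actually connecting
    conn = pyodbc.connect(build_conn_str(server, database))
    try:
        cursor = conn.cursor()
        # The view name is trusted configuration, not user input
        cursor.execute(f"SELECT Postcode FROM {view}")
        return [row.Postcode for row in cursor.fetchall()]
    finally:
        conn.close()
```

Because pyodbc talks to the database directly through the ODBC driver, it sidesteps ArcGIS Pro’s own database-version checks entirely; the rest of the pipeline just receives plain Python values.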

Privacy protection

As we were working with data from a case management database, privacy protection was a key concern. The database view we had access to already stripped out any personal information, giving us just a postcode to use as a location. To ensure that even the postcode locations were masked, we added some ‘jitter’ to the data: each point is reassigned to a random location within one mile, meaning that in the final dataset individual addresses can’t be pinpointed but overall spatial patterns remain the same.

A circle around a point illustrating the privacy-protection process
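The jitter step can be sketched as follows. Our production version works in projected coordinates; this sketch uses a rough lat/lon approximation, and the square root is what keeps points uniformly spread over the disc rather than clustered at the centre:

```python
import math
import random

MILES_PER_DEG_LAT = 69.0  # rough miles per degree of latitude

def jitter(lat, lon, max_miles=1.0):
    """Move a point to a uniformly random location within max_miles."""
    distance = max_miles * math.sqrt(random.random())  # uniform over the disc
    bearing = random.uniform(0.0, 2.0 * math.pi)
    new_lat = lat + (distance * math.cos(bearing)) / MILES_PER_DEG_LAT
    # A degree of longitude shrinks with latitude, hence the cosine factor
    new_lon = lon + (distance * math.sin(bearing)) / (
        MILES_PER_DEG_LAT * math.cos(math.radians(lat))
    )
    return new_lat, new_lon
```

Because every point moves independently, aggregate patterns (and the heatmap built from them) are essentially unchanged, while no individual point can be traced back to an address.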

Software choices

We started developing the process in ArcGIS Pro’s Jupyter notebook environment, but this had some limitations:

  • The notebook interface in ArcGIS Pro is not as fully featured as dedicated code editors
  • The packages available in ESRI’s Python environment are not as up-to-date as in standard repositories, and some custom packages aren’t available
  • Jupyter notebooks are trickier to run as scheduled scripts than standard Python scripts

We rewrote the script as a standard Python script, which helped identify areas that could be streamlined and made some design choices more explicit (in the ESRI environment, certain defaults are applied that need to be made explicit in a standalone script).
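One benefit of a standalone script is that scheduling it becomes a one-liner. As an illustration (the task name and paths are placeholders), a daily run could be registered with Windows Task Scheduler on the VM, using ArcGIS Pro’s propy.bat launcher so the script runs in Pro’s Python environment:

```shell
# Placeholder example: register a daily 06:00 run of the update script
schtasks /Create /SC DAILY /ST 06:00 /TN "UpdateILHeatmap" ^
    /TR "C:\ArcGIS\Pro\bin\Python\Scripts\propy.bat C:\scripts\update_il_layer.py"
```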

What next?

  • Some colleagues have used the dataset for informal analysis — we hope to see more of this as more people learn about it
  • Now that we’ve proved this pipeline works, we can apply it to other data systems using the same setup — for example, the databases used by our Mobility Aids Service or Crisis & Emergency Response teams.
  • Having figured out a good setup for using Python to update layers hosted on ArcGIS Online, we can use the same tools to update other datasets. Recent datasets we’ve applied this to include: weather warnings from the Met Office, food security data from IPC, disaster and appeals data from IFRC GO, and BRC vehicle location data from our fleet’s navigation provider.

    Some of these datasets update frequently (multiple times per day) and using them wouldn’t have been feasible with a manual process.
  • We’ll continue to roll out automation to priority datasets and work on ways to streamline the code. Some improvements that are lined up are:
    - Adding logging capabilities to make it easier to spot errors
    - Adding more logic to capture unexpected changes to data
    - Improving the documentation of what each script does and how it works so others can use or adapt them

If you work at British Red Cross and think these types of geospatial analysis could be useful, then please drop into our weekly surgery call and we can have an informal chat; the details are on our RedRoom page. Alternatively, if you’re in the Red Cross movement, feel free to reach out to your National Society’s GIS or Information Management colleagues and look out for the Surge Information Management Support (SIMS) community of practice, which includes a number of GIS specialists and reaches across borders and time zones.

*This was partly because the project was worked on sporadically rather than in a single concerted effort; there were also delays in scoping out exactly what the need was and understanding how to access the data securely; and the issue of the unsupported database version took time to identify and pursue through various support channels.
