Collaboration in a time of crisis

Simple approaches to sharing your analytical work during COVID-19.

Karen Hodgson
The Health Foundation Data Analytics
Apr 24, 2020

Photo by Perry Grone on Unsplash

Health and care services rely on rapid access to high-quality data and analytical insights to respond to COVID-19. This public health emergency has highlighted how critical it is to be able both to understand the current situation across services quickly and to model possible future scenarios, so that appropriate plans can be put in place.

Analysts are working hard to build workflows that will help them understand the many elements of patient need and service capacity within this constantly evolving landscape. Many are working at pace to solve similar challenges raised by COVID-19 on a local, regional or national scale, with very real consequences for decision making.

The need to share

Given the urgency of the situation, there’s a clear need for the analytical community to share knowledge and learn from each other’s experiences. Doing so reduces duplication of effort and allows others to verify, use and build on existing work.

Responding to COVID-19 has seen unprecedented innovation and change in health and care. But this hasn’t been universally matched by organisations making large-scale changes to make their work open and shareable, even though the importance of transparency is well recognised by the GOV.UK Service Standard and the NHS service standard, as well as by NHSX.

While a plethora of COVID-19 datasets and tools are being discussed, key barriers prevent analysts from using this work to derive important analytical insights. These include difficulty accessing data in usable formats, confusion over definitions, and a lack of information about critical modelling assumptions.

We know from our own experiences that working in an open and transparent way requires a substantial cultural shift and careful attention. If you are an analyst sharing your work beyond your organisation, or you are working in a context where changing ways of working at scale does not seem possible right now, this may be a daunting prospect.

But there are simple ways to make it easier for others to make use of your work.

Whether you’re developing algorithms to identify potentially vulnerable individuals in your area or building a dashboard to track capacity within the intensive care units across a region, here are five approaches that you can immediately adopt to make your work easier for others to reuse.

1️⃣ Show your working out

An open resource should include any relevant source code. For example, Bristol, North Somerset and South Gloucestershire CCG are using GitHub to share their modelling scripts for bed projections during COVID-19. But even when you’re not able to make the underlying code of your resource available, sharing the methodology is critical.

Analysts need to understand the purpose, assumptions and limitations of any resource to be able to use it intelligently.

Even if you cannot share your resource beyond your organisation, sharing how you developed it is still incredibly valuable for analysts tackling similar problems. Understanding how you accessed data, the analytical decisions you made (eg definitions or cut-offs applied), the barriers encountered and the software or analytical products you relied on to complete your work, can all help reduce duplication of effort. This will also help you and your team next time you do similar work.
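One lightweight way to capture this information is to save a short, machine-readable methods note alongside your outputs. The Python sketch below is only an illustration: the function name `write_methods_note`, its field names and the placeholder values are hypothetical, not a formal standard or the authors' own approach.

```python
import json
import pathlib
import tempfile


def write_methods_note(path, purpose, data_sources, definitions,
                       assumptions, limitations, software):
    """Save a small JSON record of how an analysis was built.

    The field names are a hypothetical convention chosen for this example.
    """
    note = {
        "purpose": purpose,
        "data_sources": data_sources,
        "definitions": definitions,   # e.g. cut-offs or case definitions applied
        "assumptions": assumptions,
        "limitations": limitations,
        "software": software,         # tools and packages the work relied on
    }
    path = pathlib.Path(path)
    path.write_text(json.dumps(note, indent=2))
    return note


if __name__ == "__main__":
    # All values below are illustrative placeholders, written to a throwaway folder.
    with tempfile.TemporaryDirectory() as tmp:
        target = pathlib.Path(tmp) / "methods.json"
        write_methods_note(
            target,
            purpose="Illustrative bed-projection model",
            data_sources=["Hypothetical daily admissions extract"],
            definitions={"admission": "Hypothetical definition for this example"},
            assumptions=["Illustrative length-of-stay assumption"],
            limitations=["Placeholder limitation"],
            software=["Python 3"],
        )
        print(target.read_text())
```

A plain JSON (or even plain-text) note like this travels with the outputs and can be read by people and tools alike.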

2️⃣ Remember that file formats matter

Open coding principles encourage the use of simple, standard formats to make it easier for others to use the files you are sharing. But what is easy for a person to read is not necessarily helpful for analytical purposes. File formats should be machine-readable; PDF files often cause headaches for analysts trying to extract the relevant information and Word documents or Excel spreadsheets often contain formatting defaults that trip up analytical pipelines.

Plain text files (eg .txt or .csv) are easiest for most analytical tools to handle.

This holds true for both code and data, but if you are sharing data, it is also worth checking that your data conforms to ‘tidy data’ principles, where each variable is a column, each observation is a row and each type of observational unit is a table¹. If you are struggling with tricky file formats, Tom White’s work tidying and cleaning COVID-19 data from the four nations of the UK may be helpful. If you want to share data, the ODI (Open Data Institute) is offering free support to anyone who has data that might help during the pandemic.
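To make the tidy data principle concrete, here is a minimal Python sketch (standard library only) that reshapes a hypothetical "wide" table, with one column per date, into a tidy plain-text CSV with one row per region per date. The column names, region names and counts are illustrative assumptions, not real data.

```python
import csv
import io

# Hypothetical "wide" table: one row per region, one column per date.
# Region names and counts are illustrative placeholders, not real data.
WIDE_CSV = """region,2020-04-20,2020-04-21
Region A,12,15
Region B,7,9
"""


def wide_to_tidy(wide_text):
    """Reshape a wide table so each variable is a column and each
    observation (one region on one date) is a row."""
    reader = csv.DictReader(io.StringIO(wide_text))
    tidy_rows = []
    for row in reader:
        region = row.pop("region")
        # The remaining columns are dates; each becomes its own row.
        for date, value in row.items():
            tidy_rows.append(
                {"region": region, "date": date, "beds_occupied": int(value)}
            )
    return tidy_rows


def to_csv(rows, fieldnames):
    """Serialise rows to plain-text CSV, which most analytical tools read easily."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    tidy = wide_to_tidy(WIDE_CSV)
    print(to_csv(tidy, ["region", "date", "beds_occupied"]))
```

In pandas, `DataFrame.melt` performs the same reshape; the standard-library version is used here only to keep the example dependency-free.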

3️⃣ Build an online home for your work

Analytical resources are updated as more information comes to light and approaches are refined; this means files are likely to go through many iterations. But ensuring analysts can access the most up-to-date versions can be difficult, particularly when they are shared via email or posted as standalone files onto community platforms.

It is important to have a visible online home for any resource, where you can post all documentation and updates.

Options for this might be a project area on your team or organisation’s website, or an online repository such as GitHub² or figshare. When sharing resources, include a link to this central location, rather than a current copy of the resource, and make sure the link is clearly signposted in all documentation.

It’s also really important to maintain a record of previous versions. Overwriting previous data releases or failing to document changes in model assumptions can make it very difficult for anyone using your resource to draw meaningful analytical insights over time.
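As a sketch of one way to keep previous versions, the Python function below writes each data release to a new dated file, refuses to overwrite an existing release, and appends a line to a changelog describing what changed. The function name `publish_release`, the `beds_YYYY-MM-DD.csv` naming scheme and the `CHANGELOG.csv` layout are hypothetical conventions assumed for this example.

```python
import csv
import datetime
import pathlib
import tempfile


def publish_release(out_dir, rows, fieldnames, release_date, note):
    """Write a dated data release without overwriting earlier ones,
    and record what changed in a running changelog."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming convention: one file per release date.
    release_path = out_dir / f"beds_{release_date.isoformat()}.csv"
    if release_path.exists():
        # Earlier releases must stay available, so never overwrite one.
        raise FileExistsError(f"{release_path} already exists")

    with release_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Append one line per release describing the change
    # (e.g. a new definition or model assumption).
    changelog = out_dir / "CHANGELOG.csv"
    is_new = not changelog.exists()
    with changelog.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["release_date", "file", "note"])
        writer.writerow([release_date.isoformat(), release_path.name, note])
    return release_path


if __name__ == "__main__":
    # Illustrative placeholder rows, written to a throwaway directory.
    rows = [{"region": "Region A", "beds_occupied": 12}]
    with tempfile.TemporaryDirectory() as tmp:
        publish_release(tmp, rows, ["region", "beds_occupied"],
                        datetime.date(2020, 4, 20), "First release")
        publish_release(tmp, rows, ["region", "beds_occupied"],
                        datetime.date(2020, 4, 21), "Changed bed definition")
        print(sorted(p.name for p in pathlib.Path(tmp).iterdir()))
```

If your resource lives in a version-controlled repository such as GitHub, the Git history also records earlier versions; dated files plus a changelog help people who download releases directly rather than cloning the repository.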

4️⃣ Focus on consistency

Analysts aim to automate as much of their workflow as possible, as automation is more efficient and less error-prone than manual operations. However, automated workflows rely on consistency – if the expected file location, name or structure changes, then alterations to the workflow might be necessary.

Defining a system for file locations, names and structures upfront will help build a consistent resource.

Change may sometimes be unavoidable, particularly in the fast-moving context of COVID-19. When it is, it is essential that the change is clearly documented so that analysts can adapt their processes and continue to generate useful analysis. Seb Bacon and Ben Goldacre have discussed this issue in relation to NHS Open Data.
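On the receiving end, an automated workflow can check each incoming file against the agreed conventions before processing it, so an undocumented change fails loudly instead of silently producing wrong results. This Python sketch assumes a hypothetical file-name pattern and column list; substitute whatever conventions you have actually agreed.

```python
import csv
import io
import re

# Hypothetical conventions agreed upfront for this example.
FILENAME_PATTERN = re.compile(r"^beds_\d{4}-\d{2}-\d{2}\.csv$")
EXPECTED_COLUMNS = ["region", "date", "beds_occupied"]


def check_release(filename, text):
    """Return a list of problems if a file breaks the agreed naming or
    structure conventions; an empty list means it is safe to process."""
    problems = []
    if not FILENAME_PATTERN.match(filename):
        problems.append(f"unexpected file name: {filename!r}")
    # Read only the header row of the CSV text.
    header = next(csv.reader(io.StringIO(text)), [])
    if header != EXPECTED_COLUMNS:
        problems.append(
            f"unexpected columns: {header} (expected {EXPECTED_COLUMNS})"
        )
    return problems


if __name__ == "__main__":
    good = "region,date,beds_occupied\nRegion A,2020-04-20,12\n"
    print(check_release("beds_2020-04-20.csv", good))
    print(check_release("Beds April.csv", "Region,Date,Beds\n"))
```

A pipeline would typically stop and alert someone when the list is non-empty, rather than trying to guess how the structure has changed.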

5️⃣ Get involved in the analytical community

Analytical communities for those working in health and care are engaging with the challenges of COVID-19 and offer great support. These include the FutureNHS collaboration platform, the NHS-R community, AphA (Association for Professional Healthcare Analysts) and the ODI #OpenDataSavesLives community.

Analytical communities can help disseminate resources and provide valuable feedback.

By engaging with these communities, you can build a better understanding of the landscape, and share work that helps to fill any gaps you have encountered.

Ultimately, we know there are many organisational and cultural barriers to working openly and sharing work within health and care. These keep us from realising the full potential of data and analytics. Addressing the challenges of being open can be a complex and long-term endeavour; our Data Analytics team is still learning and developing our approach. But it’s been a valuable experience for us, with the additional benefit of making it easier to collaborate internally, as well as externally.

Here we have focused on practical and immediate steps that can be taken to allow analysts to quickly benefit from existing work and build more robust responses to tackle the COVID-19 pandemic. But we are keen to hear other perspectives on analytical transparency during this public health emergency, so that together we can help tackle the challenges and share useful solutions.

This blog was written collaboratively with contributions from Ellen Coughlan, Emma Vestesson, Fiona Grimm (@fionagrimm) and Sarah Deeny (@SarahDeeny). We are part of the Data Analytics team at the Health Foundation.

We have built an online repository of practical analytical resources that health and care analysts across the UK may find useful during the COVID-19 response. If you have any suggestions for inclusion, please get in touch via GitHub or email us at covid.analytics@health.org.uk.

Footnotes

[1] In other words, a new worksheet or file.

[2] If you are using GitHub for the first time, you may find our experiences setting up our organisational GitHub useful.

Senior Data Analyst at The Health Foundation. Interested in data and mental health. Find me @KarenHodgePodge on Twitter.