Accelerating scientific collaboration in real-time

A STATE OF FLUX
Apr 21, 2020

The power of crowd-sourcing innovation

The idea of running a hackathon to address the challenges around Covid-19 data analytics was conceived by Mindstream AI on 9th March 2020, before the lockdown was fully in place in London. It was announced at The London PyTorch Meetup on 10th March and was originally scheduled as a live event. However, within hours of agreeing a venue on 11th March, it became clear that the event would have to take place virtually, as the lockdown was imposed.

This was the birth of CoronaHack - AI vs. Covid-19

The vision for CoronaHack was to bring together a broad group of experts (data scientists, biomedical researchers and health professionals) to explore coronavirus-related datasets in the context of the challenges facing humanity.

The initial response to the ‘virtual’ hack invitation was extremely encouraging, with over 1,500 acceptances through the Eventbrite platform. The participants were of a very high calibre and included many world-class experts in the relevant fields.

There were a number of technical challenges to overcome in running the event as a virtual initiative. However, Databiology stepped in with a powerful biomedical data management platform that gave the teams access to a large collection of trusted public coronavirus datasets. The platform also enabled the hackers to collaborate on projects of their choosing, using the Databiology software as a secure virtual research hub. Databiology backed the whole event with technical and management support.

To ensure that the launch went smoothly, an on-boarding webinar was organised a week before kick-off.

The main event kicked off on 14th April 2020 with ~260 participants set up with user accounts on the Databiology environment to work on projects within the selected scientific topic areas for the hackathon.

The CoronaHack was also run at a time when there were serious global constraints on cloud capacity, caused by the step-change increase in remote working under social isolation. This represented a real challenge because the main providers could not fully commit to supporting the event. We were eventually supported by an innovative solution from Fluidstack, which utilises unused GPU capacity from a network of third-party data centres.

One of the biggest challenges was ensuring that the 260 participants self-organised effectively over the 5 days of the event. To help with this process, they were asked to select one of eight coronavirus-related themes based on the available datasets. They were then encouraged to form teams to use the data and work on their chosen challenge areas within these broad themes. We ultimately ended up with 20 teams, who presented at the final event on 19th April 2020.

The main communications channels were set up on Slack, which supported the event. We had multiple channels for themes, projects, people seeking projects, projects seeking people, Q&A etc. The Databiology platform also provided strong functionality that allowed the teams to work on the datasets in a powerful collaborative way, enabling data sharing and data interoperability in support of the range of AI-based analyses that the teams undertook.

The level of engagement and energy generated by the teams was incredibly impressive. The teams managed to self-organise with help from our team. They successfully downloaded datasets and undertook groundbreaking research over the very limited timescales available.

The event concluded on 19th April with virtual presentations from the 20 teams on the Microsoft Teams platform. Each team was given 5 minutes to cover the area of research they had addressed, how they went about it, the results and how they intended to develop the project further. We selected 4 teams as winners with the help of judges from HDRUK and The Francis Crick Institute:

1st place: Team MaPP* - Using machine learning and TensorFlow to create predictive neural network models. The aim is to understand leading indicators of Disease Severity and the closely related topic of First Diagnosis.

2nd place: Team Alpacas Covid19 - The team aims to use X-ray images and clinical data to allow non-specialist classification and categorisation of Covid-19.

Tied for 3rd place: Team CovidBALI - In this project, the team aimed to mine rules identifying clinical variables that could be used to identify patient risk groups for auguring their COVID-19 morbidity.

Tied for 3rd place: Team Foo Bar - This team’s effort was focused on the annotation prediction similarity and comparative analysis of all publicly available SARS-CoV-2 (COVID-19) genomes.

We intend to support these teams with mentoring and to help them access the resources they need to take their projects to the next stage. We have also awarded small levels of financial support and some technology to support them.

The organising team, comprising Mindstream AI and Databiology, learned a great deal about organising complex virtual events around multiple datasets. We were able to validate the credibility of running virtual events over short timescales and the ability to engage teams around crucial, groundbreaking research topics in real-time.

Mindstream AI supports research institutions, scale-ups and enterprise clients with innovation, training and consultancy, focusing on the intersection between data science and biomedical research.

Mindstream AI works with clients to build accelerator programmes, run hackathons and provide data science training. www.mindstream-ai.com

Databiology Ltd is a specialist global software company that provides the life sciences and health sectors (academic, public and commercial) with a biomedical software platform that enables data scientists and researchers to manage complex biomedical data and run advanced big data AI analytics, for example to search for patterns and associations that increase their understanding of disease progression.

Databiology acts as enabling software to support federated biomedical data analytics in a secure and technology-agnostic virtual data hub. The Databiology platform is unique in that it can support any form and format of biomedical data, on any available infrastructure (Cloud / HPC), running any form and variety of big data analytics (interactive, notebooks, open source, complex pipelines, custom scripts, AI). www.databiology.com

Thanks to Slack, Fluidstack, Episode 1, Scan Computers and NVIDIA for supporting the event.

*Footnote — Winning Team MaPP

Mark R. Baker - Hypatia Solutions Ltd and MediChain, Impact Hub King’s Cross, 34b York Way, King’s Cross, London, XGL N1, UK

Abhirup Banerjee - Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, OX3 7DQ, UK

Chris Barrett - Angel Investor

Jalil El - TriMedx at Presence Saint Joseph Hospital 2900 North Lake Shore Drive, Chicago, Illinois 60657, USA

Ivy Glavee - Fortune Dental Services, 27 Old Gloucester Road, WC1N 3AX, UK

Tom Jemmett - Strategy Unit — NHS Midlands and Lancashire Commissioning Support Unit, Kingston House, 438–450 High Street, West Bromwich B70 9LD, UK

Agnieszka Cecylia Kaczkowska - Great Titchfield St, London W1W 6RR, UK

Joanne Kitson - School of Computer Science, Electrical and Electronic Engineering, and Engineering Maths, University of Bristol, Merchant Venturers Building, Woodland Rd, Clifton, Bristol BS8 1UB

Louise S. Mackenzie - School of Pharmacy and Biomedical Sciences, University of Brighton, BN2 4GJ, UK

Michail Mamalakis - School of Computer Science, University of Sheffield, 211 Portobello, Sheffield City Centre, Sheffield S1 4DP, UK

Surajit Ray - School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8QW, UK

Shelley Taylor - Taylormade Business Consultants, 26 Ashingdon Road, Rochford, SS4 1NJ, UK

Bart Vorselaars - School of Mathematics and Physics, University of Lincoln, Brayford Pool, Lincoln LN6 7TS, UK, bvorselaars@lincoln.ac.uk
