Why medical researchers should be GITs

Dr George Harvey
Feb 9, 2019

Medical research, or more specifically the running of clinical trials, is rightfully a tightly regulated practice. Part of this is that all trials must be easy to audit before, during and up to 25 years afterwards. However, nearly all of this is done on paper, so these reams are stored for a quarter of a century just in case someone has cause (and determination) to find the needle of an error in the paper stack.

This storage process is not immune to problems of its own: the risk of accidental destruction in a fire or flood, the chance of documents going missing, the cost of storage, and defunct organisations no longer funding the archives.

Much of this is really just an old problem that has been solved in many ways by many people: version control. None have solved it more beautifully than developers, and the most popular of these nerdy solutions is Linus Torvalds’ Git, the software that gives its name to GitHub. Git allows for record keeping in a blockchain-like manner, but without all the server farms, hype and insufferable people telling you what the future holds over their soy latte. I call it ‘blockchain-like’ because it shares the key property that the current version is always built on, and cryptographically tied to, the history of the previous versions. As such, it is near-impossible to edit the past unnoticed. This is a win for auditors, as it means that no one can change anything without leaving their grubby fingerprints everywhere.
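
To make the ‘blockchain-like’ property concrete, here is a minimal sketch of a hash chain in Python. It is not how Git is actually implemented (Git hashes richer commit objects that include the author, date and message); the function names and the choice of SHA-256 are assumptions made purely for illustration. It only shows why changing an old version changes the identity of everything built on top of it.

```python
import hashlib


def commit_id(parent_id: str, content: str) -> str:
    """Hash the parent's id together with this version's content."""
    return hashlib.sha256(f"{parent_id}\n{content}".encode("utf-8")).hexdigest()


def build_history(versions: list[str]) -> list[str]:
    """Return one id per version, each derived from the id before it."""
    ids = []
    parent = ""  # the first version has no parent
    for content in versions:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


original = build_history(
    ["consent form v0.1", "consent form v0.2", "consent form v1.0"]
)
tampered = build_history(
    ["consent form v0.1 (edited)", "consent form v0.2", "consent form v1.0"]
)

# Editing the oldest version changes every id that follows it,
# so an auditor comparing ids would spot the change immediately.
changed = [a != b for a, b in zip(original, tampered)]
print(changed)
```

An auditor who holds only the latest id can therefore tell whether any earlier version has been altered.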

On this point, it is also a win for researchers, as it means that the version 0.1 consent form you made months ago cannot be lost. Git also comes with the benefits of other version control systems, such as handling:

  • Conflicts: when two people make changes to the same document at the same time.
  • Branches: when you explore two or more ways of doing things at the same time.
  • Storage: Git stores documents in a space-efficient manner by compressing them and, where it can, keeping only the changes from one version to another.
  • Security: Git records who did what and when, a hosting platform built on it can control who is allowed to do what, and one can nearly always undo the work of a bad actor.
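
As a rough illustration of ‘only the changes from one version to another’, the sketch below uses Python’s standard `difflib` module to compare two versions of a consent form. The file names and the form wording are made up for the example, and this is not Git’s actual storage format; it simply shows how small the record of a change can be compared with the whole document.

```python
import difflib

# Two hypothetical versions of the same consent form, one line per list item.
v0_1 = [
    "Participant consent form",
    "I agree to take part in the study.",
    "I understand I can withdraw at any time.",
]
v0_2 = [
    "Participant consent form",
    "I agree to take part in the study.",
    "I understand I can withdraw at any time without giving a reason.",
    "I agree to my anonymised data being shared.",
]

# A unified diff lists only the lines that differ, plus a little context.
diff = list(
    difflib.unified_diff(
        v0_1, v0_2,
        fromfile="consent_v0.1.txt", tofile="consent_v0.2.txt",
        lineterm="",
    )
)
print("\n".join(diff))

# Separate the changed lines from the file headers ("+++" and "---").
added = [l[1:] for l in diff if l.startswith("+") and not l.startswith("+++")]
removed = [l[1:] for l in diff if l.startswith("-") and not l.startswith("---")]
```

The same comparison is what a reviewer would read to see exactly what changed between version 0.1 and version 0.2.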

The beauty of this is that it could be built into a website or web service. Imagine using something like Google Docs, where you can see who has edited what and left comments, but backed up by a fully secure trail. For files such as images and scanned pages, a drag-and-drop interface like that of Dropbox, Google Drive or OneDrive would do fine. This could be built around a purpose-made version (or fork) of GitLab, which would act as the 25-year storage room.

For The Techies

What I am actually proposing is the following:

  • A back end based on a fork of GitLab’s (or Gitea’s) open-source branch. This could be hosted locally or centrally, whatever the powers that be feel is best.
  • A modified front-end that is more user-friendly and interactive, guiding the user through making commits, merges and branches.
  • Templates for different kinds of research, offered when the repository is initialised (in much the same way that licence templates are already offered).
  • A markup editor
  • A content filter that uses an enforced .gitignore file that users cannot override, plus a content scanner and sanitizer that aims to detect patient data (although ultimate responsibility would remain with the uploader).
  • An archive function that compresses the repository for efficient long-term storage.
  • The ability to link repositories that relate to a single piece of research, e.g. the pilot study, the actual study, the analysis and/or publication of results. This could use Git’s built-in submodule feature.
  • The ability for the researcher to make the repository public in lieu of a pre-print.
  • An issues page, similar to that of GitHub or GitLab, to act as a written and recorded forum for discussions.
  • A copy of GitHub’s dependency graph, but tracking references instead of software dependencies. This could be a very powerful tool.
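
As a sketch of what the content scanner and sanitizer in the list above might look like, here is a deliberately naive version in Python. Everything in it is an assumption for illustration: the pattern names, the regular expressions and the `scan` and `sanitize` functions are hypothetical, and a real scanner would need a far more careful, validated set of rules (and would still miss things, which is why responsibility stays with the uploader).

```python
import re

# Hypothetical patterns for things that often identify a patient.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ten_digit_id": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
}


def scan(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, pattern name, matched text) for each hit."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((number, name, match.group()))
    return hits


def sanitize(text: str) -> str:
    """Replace every match with a placeholder naming the pattern."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {name}]", text)
    return text


sample = (
    "Protocol v0.2\n"
    "Seen on 03/02/2019, contact jane.doe@example.org\n"
    "ID 123 456 7890"
)

hits = scan(sample)       # what a pre-upload check could show the uploader
clean = sanitize(sample)  # what could be stored if they choose to redact
for hit in hits:
    print(hit)
print(clean)
```

In the proposed service, a check like this could run before a commit is accepted, warning the uploader and offering the redacted text rather than silently rejecting the file.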

Dr George Harvey

A junior doctor with an interest in data science and healthcare technology.