RadioTator: A Tailored Tool for Rapid Medical Text Annotation

Jakub Wiśniewski
ResponsibleML

Creating a text annotation app for radiologists.

This blog post is the third in our xLungs series about the Responsible Artificial Intelligence for Lung Diseases project. You can check out the first one, about our way towards the largest Polish database of lung medical images, here, and the second one, about segmentation masks, here.

Let’s say you have over 45k radiology reports, written by multiple radiologists in 52 radiology centers all over Poland, with unstructured text, widely varying vocabulary, varying levels of detail (from very short to very long and complicated reports), and spelling mistakes… (this is a real problem we face!) If you want to use them for data analysis, you need to create ground-truth labels as efficiently as possible. This is where the RadioTator tool comes in. It is already extremely useful, even though it is still under development!

Introduction

Annotating text can be a mundane task requiring a lot of effort, time, and money. Image annotation apps often offer help such as inference between frames or 3D models tailored to specific use cases like CT scans. For text data, there is little to no help that makes the annotation process faster and the annotators' job easier. One tool that facilitates text annotation is Snorkel (through labeling functions); however, it cannot be easily customized to users' needs to take annotation speed to the next level. We wanted to solve that problem by building a highly tailored tool for text annotation.

We developed a custom annotation app in Dash. We want to present it, even though it is not fully finished and polished, to show you how it works and why it is helpful.

RadioTator

Whole app frontend

In close collaboration with radiologists, we created an annotation app that specifically targets medical texts. The app speeds up text annotation with a few clever tricks. First of all, the annotator, who should be a medical specialist (in our case a radiologist; let’s call him Patrick), has an extensive choice of filters.

Filtering

Filters
  • Labels and IDs: Patrick can select texts with or without certain labels, or look up a single text by its unique text ID.
  • Text length: maybe Patrick wants to annotate shorter texts first.
  • Substrings/words and regexes: Patrick can search for specific strings in various forms. He has 2 inputs that can be combined with a logical “and” or “or”. For example, Patrick can search for texts in which both substrings “right lung” and “inflammation” occur in order to narrow down to a specific subset of texts.
  • Additionally, he can use the “positive” and “negative” buttons to indicate whether the second string must appear or must not appear. This helps filter out the texts that Patrick doesn’t want to see.
  • Text preprocessing: Patrick can choose how the text is preprocessed before the search is performed, for example with stemming or lemmatization.
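
To make the filter logic more concrete, here is a minimal sketch in plain Python. It is not the app's actual code: the `Report` and `TextFilter` classes and all of their field names are hypothetical, the example reports are made up, and preprocessing such as stemming or lemmatization is left out.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Report:
    # Hypothetical record for one radiology report.
    text_id: str
    text: str
    labels: set = field(default_factory=set)


@dataclass
class TextFilter:
    # Hypothetical container mirroring the filters described above.
    text_id: Optional[str] = None                     # exact text ID, if given
    with_labels: set = field(default_factory=set)     # report must have all of these
    without_labels: set = field(default_factory=set)  # report must have none of these
    max_length: Optional[int] = None                  # maximum text length in characters
    first_pattern: Optional[str] = None               # substring or regex
    second_pattern: Optional[str] = None              # substring or regex
    operator: str = "and"                             # "and" or "or"
    second_positive: bool = True                      # False: second pattern must NOT occur

    def matches(self, report: Report) -> bool:
        if self.text_id is not None and report.text_id != self.text_id:
            return False
        if not self.with_labels <= report.labels:
            return False
        if self.without_labels & report.labels:
            return False
        if self.max_length is not None and len(report.text) > self.max_length:
            return False

        # Evaluate the two search inputs, then combine them with "and" / "or".
        hits = []
        if self.first_pattern is not None:
            hits.append(re.search(self.first_pattern, report.text, re.IGNORECASE) is not None)
        if self.second_pattern is not None:
            found = re.search(self.second_pattern, report.text, re.IGNORECASE) is not None
            hits.append(found if self.second_positive else not found)
        if not hits:
            return True
        return all(hits) if self.operator == "and" else any(hits)


reports = [
    Report("r1", "Inflammation in the right lung.", {"pneumonia"}),
    Report("r2", "Right lung clear, no changes."),
    Report("r3", "Inflammation in the left lung."),
]

# "right lung" AND "inflammation"
both = TextFilter(first_pattern="right lung", second_pattern="inflammation")
both_ids = [r.text_id for r in reports if both.matches(r)]

# "right lung" AND NOT "inflammation" (the "negative" button)
negated = TextFilter(first_pattern="right lung", second_pattern="inflammation",
                     second_positive=False)
negated_ids = [r.text_id for r in reports if negated.matches(r)]
```

In this sketch the search is case-insensitive and every filter is optional, so an empty `TextFilter()` matches every report. Both are design assumptions made for the example.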

Viewing and annotating

Once the filtering is done and Patrick has picked texts that are, for example, very similar to each other, he can view them page by page. For each text, he can select positive and negative labels (and weak labels for some special cases). A positive label means that the illness is present, a negative label means that it is absent, and a weak label means that it is probably present or probably absent. This is pretty straightforward: Patrick only needs to pick the label from a dropdown (psst! He can also type the label name in). He can also add comments to the texts that appear on the right. In the end, when Patrick thinks that a text is fully annotated, he can set it to “final” by ticking the checkbox on the left and clicking “save annotations”. He can then filter it out, so that he does not look at the same text twice.
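
One way to picture what is stored for each text is the small Python sketch below. The `LabelStatus` and `Annotation` names, their fields, and the example label names are all hypothetical; this is an illustration of the idea, not the app's real data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class LabelStatus(Enum):
    POSITIVE = "positive"            # the illness is present
    NEGATIVE = "negative"            # the illness is absent
    WEAK_POSITIVE = "weak positive"  # the illness is probably present
    WEAK_NEGATIVE = "weak negative"  # the illness is probably absent


@dataclass
class Annotation:
    # Hypothetical per-text annotation record.
    text_id: str
    labels: dict = field(default_factory=dict)  # label name -> LabelStatus
    comment: str = ""
    final: bool = False                         # ticked when the text is fully annotated

    def set_label(self, name: str, status: LabelStatus) -> None:
        self.labels[name] = status


def pending(annotations):
    """Return the annotations not yet marked final, so finished texts can be hidden."""
    return [a for a in annotations if not a.final]


first = Annotation("r1")
first.set_label("pneumonia", LabelStatus.POSITIVE)
first.set_label("pleural effusion", LabelStatus.WEAK_NEGATIVE)
first.comment = "Description is ambiguous about the effusion."
first.final = True

second = Annotation("r2")
still_to_do = [a.text_id for a in pending([first, second])]
```

Keeping the status on each label separately lets one text carry a positive label for one finding and a weak label for another at the same time.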

Results and annotation in detail

Global Annotations

But the most important features are underneath. As you can see, there are 196 filtered texts to annotate. If Patrick is sure that all texts matching this filter should have (among others) one specific label, he can use this section:

Global annotations: annotate all filtered texts

This allows him to add selected labels to all filtered texts or only to the texts on the current page. He can even pick annotations from the dropdown and overwrite the existing annotations of the texts on the page.
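
The idea behind global annotations can be sketched in a few lines of Python. The `paginate` and `annotate_all` functions, the page size of 10, and the label names are assumptions made for this illustration, not the app's actual implementation.

```python
def paginate(text_ids, page, page_size=10):
    """Return the text IDs shown on the given zero-based page."""
    start = page * page_size
    return text_ids[start:start + page_size]


def annotate_all(annotations, text_ids, labels, overwrite=False):
    """Add `labels` to every text in `text_ids`.

    `annotations` maps a text ID to its set of label names.
    With overwrite=True, the existing labels of each text are replaced.
    """
    for text_id in text_ids:
        current = annotations.setdefault(text_id, set())
        if overwrite:
            current.clear()
        current.update(labels)
    return annotations


# 196 filtered texts, one of which already has a label.
filtered_ids = [f"r{i}" for i in range(196)]
annotations = {"r0": {"nodule"}}

# Add one label to all filtered texts; existing labels are kept.
annotate_all(annotations, filtered_ids, {"pneumonia"})
after_global = {text_id: set(labels) for text_id, labels in annotations.items()}

# Overwrite the labels of the texts on the first page only.
first_page = paginate(filtered_ids, page=0)
annotate_all(annotations, first_page, {"fibrosis"}, overwrite=True)
```

After the first call every filtered text carries the new label on top of whatever it had before. After the second call only the texts on the first page are changed.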

Summary

We hope this quick post has shown that the app can be really helpful and time-saving. With extensive filtering and global annotations, text can be annotated really fast, which is crucial, especially in a medical setting.

Credits to our radiologists, Patryk Szatkowski MD and Przemysław Bombiński MD PhD, and to our developers, Tomasz Stanisławek and Jakub Wiśniewski.

If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.


My name is Jakub Wiśniewski. I am a data science student and a research software engineer at MI2 DataLab.