Neuromation Research: Medical Concept Normalization in Social Media Posts

Neuromation
Jul 11, 2018

Although computer vision is our main focus, here at Neuromation we are pursuing all aspects of deep learning. Today, it is my great pleasure to introduce Elena Tutubalina, Ph.D., our researcher from Kazan who specializes in natural language processing. She has joined Neuromation part-time to work on a very interesting project related to sentiment analysis and named entity recognition… but this is a story for another day.

Today, together with Elena, we are presenting our recent paper, Medical Concept Normalization in Social Media Posts with Recurrent Neural Networks. This paper has been published in a top journal, the Journal of Biomedical Informatics; Elena and I co-authored it with Zulfat Miftakhutdinov and Valentin Malykh. This is already the second post devoted to Neuromation research papers; the first one was a recent NeuroNugget devoted to our DeepGlobe participation, and many more are, of course, to come.

Before the journal version, our paper was presented at a NAACL workshop. Here is Elena’s photo from the NAACL 2018 social event:

The Adverse Effects of Social Networks

Nowadays it is hard to find a person who does not have an account in at least one social network, and usually more than one. And it’s virtually impossible to find a person who has never heard about them. This unprecedented popularity of social networks, and the huge amount of stuff people put on their pages, means that there is an enormous quantity of data available in social networks on almost any topic. This data, of course, is nothing like a top quality research report, but there are tons of opinions of real people on all kinds of subjects, and it would be strange to forgo this wisdom of the crowds.

To explain what exactly we will be looking for, let us take a break from social media (exactly as the doctors order, by the way) and look back a little at history. One of the most important topics in human history has always been human health. It was important in ancient Egypt, Greece, or China, in Napoleon’s France or modern Britain. Medicine invariably comes together with civilization, and with medicine come the drugs, starting from a shaman’s herbs and all the way to contemporary medications.

Unfortunately, with drugs come side effects. Cocaine, for example, was famously introduced as a cough stopper, and back in the good old days it was given to kids (no kidding); Coca-Cola stopped using fresh coca leaves with significant amounts of cocaine only by 1903. Modern medications can also have side effects (including sleep eating, gambling urges, or males growing boobs), but these days we at least try to test for side effects and warn about them.

To reveal the side effects, drug companies conduct long and costly clinical trials. It takes literally years for a drug to become accepted as safe, and while in principle it’s a good thing to test thoroughly, in reality it means that many people die from potentially curable diseases while the drugs are still under testing. But even this often overly lengthy process does not catch all possible side effects, or, as they are usually called in scientific literature, adverse drug reactions (ADR): people are too diverse for any trial group to cover all possible patient conditions and drug interactions. And this is where social media can help.

Once the drug is released and people are actually using it, they (unfortunately) can have side effects, including unpredictable ones like a weird interaction of three different drugs that no one could have tested for. But once it happens, people are likely to rant about it on social media, and we could collect that data and use it. By the way, it would be an oversimplification to think that side effects could only be negative. Somewhat surprisingly, it is not that rare for a drug initially intended to treat one disease to turn out to be a cure for some completely unrelated condition; kind of like cocaine proved to be so much more than a cough syrup. So the social media data is actually a treasure trove of information ready to be scraped.

And this is exactly what our paper is about: looking for adverse drug effects in social media. Let’s dive into the details…

The Data and the Problems

To be more precise, the specific dataset that we have used in the paper comes from Twitter. In natural language processing, it is really common to scrape Twitter since it is open, popular, and the texts are so short that we can assume that each tweet stays on a single topic. All of these characteristics are important, by the way: the problems of handling personal data are by now a subject of real importance, especially in such a delicate sphere as healthcare, and we don’t want to violate anyone’s privacy.

At this point, it might seem that once we have the data it is a simple matter of keyword search to find the drug names and the corresponding side effects: if the same tweet mentions both “cocaine” and “muscle spasm” it is quite likely that muscle spasms are a side effect of cocaine. Unfortunately, it’s not that simple: we can’t expect a random person snorting cocaine on Twitter to use formal medical language to describe their symptoms. People on Twitter (and more broadly in social media) do not use medical terminology. To be honest, we can consider ourselves lucky if they use the actual name of the drug at all; we all know how tricky these drug names can be.

Thus, in the context of mining social media we have to translate a text written in “social media language” (e.g., “I can’t fall asleep all night” or “head spinning a little”) to “formal medical language” (e.g., “insomnia” and “dizziness” respectively). Sometimes the examples are even less obvious:

And so on, and so forth. You can see how this goes beyond simple matching of natural language expressions and vocabulary elements: string matching approaches cannot link social media language to medical concepts since the words often do not overlap at all. We call the task of mapping everyday language to medical terminology medical concept normalization. If we solve this task, we can bridge the gap between the language of Twitter and medical professionals.
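To see why string matching breaks down, here is a minimal sketch in plain Python that measures word overlap (Jaccard similarity) between a few everyday phrases and the medical terms they should map to. The helper names `tokens` and `jaccard` are ours and purely illustrative; this is a toy check, not part of the model.

```python
import re

def tokens(text):
    """Lowercase a phrase and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Share of distinct words that two phrases have in common."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# Pairs of (social media phrase, formal medical term) taken from this post.
pairs = [
    ("I can't fall asleep all night", "insomnia"),
    ("head spinning a little", "dizziness"),
    ("lousy sleeping", "difficulty sleeping"),
]

overlaps = [jaccard(post, concept) for post, concept in pairs]
for (post, concept), score in zip(pairs, overlaps):
    print(f"{post!r} vs {concept!r}: overlap = {score:.2f}")
```

The first two pairs share no words at all, so any method based on overlapping strings gives them a score of zero; only the third pair, which happens to reuse the word “sleeping”, gets a nonzero score.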

Natural Languages and Recurrent Neural Networks

OK, suppose we do have the data in the form of a nicely scraped and parsed set of tweets. Now what? Now comes the most important part: we need to process this data, mining it for something that could sound like an adverse drug effect. So how on Earth can a model guess that “I can’t fall asleep all night” is actually about “insomnia”? There is not a single syllable in common between these two phrases.

The answer, as usual in our series, comes from neural networks. Modern state of the art natural language processing often uses neural networks or, to be more precise, a special kind of them called recurrent neural networks (RNNs). An RNN can work with sequence data, keeping some intermediate information inside, in its hidden state, to “remember” previous parts of the sequence. Language is a perfect example of sequential data: it is a string of… well, something; some models work with words, some go down to the level of characters, some combine words into bigrams, but in any case the input is a discrete sequence.

We will not go into the details of recurrent neural networks; maybe in a later post. Let us just show the network architecture that we used in this paper:

In the upper left part of the figure you can see a recurrent neural network. It receives as input a sequence of words (previously processed into embeddings, another interesting idea that we will explain some other time). At each step, the network receives a word and outputs a vector a, but at the same time it also sends some information to its “future self”, to the next timestep. This piece of information is called a hidden state, denoted on the figure as h, and formally it is also simply a vector of numbers. Another interesting part is that the sequence is actually handled in two directions: from start to end and vice versa; such a setup is called a bidirectional RNN.
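To make the hidden state idea a bit more concrete, here is a minimal sketch of a bidirectional RNN in plain Python. It is not the network from the paper: it uses the simplest possible recurrent cell (a tanh over two matrix products), random weights, and random vectors standing in for word embeddings, and all function names and sizes are made up for illustration. In this simplified cell the output vector at each step is just the hidden state itself.

```python
import math
import random

def rnn_step(x, h, W_x, W_h):
    """One step of a vanilla RNN cell: h_new = tanh(W_x x + W_h h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[i], x))
            + sum(w * hi for w, hi in zip(W_h[i], h))
        )
        for i in range(len(h))
    ]

def run_rnn(sequence, W_x, W_h, hidden_size):
    """Read the sequence left to right, passing the hidden state forward."""
    h = [0.0] * hidden_size
    states = []
    for x in sequence:
        h = rnn_step(x, h, W_x, W_h)
        states.append(h)
    return states

def bidirectional_rnn(sequence, fwd_weights, bwd_weights, hidden_size):
    """Run one RNN start-to-end, another end-to-start, and join their states."""
    forward = run_rnn(sequence, fwd_weights[0], fwd_weights[1], hidden_size)
    backward = run_rnn(sequence[::-1], bwd_weights[0], bwd_weights[1], hidden_size)
    backward = backward[::-1]  # realign so position t matches word t
    return [f + b for f, b in zip(forward, backward)]

def random_matrix(rows, cols, rng):
    """Random weights; a real network would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
embed_dim, hidden_size, n_words = 4, 3, 5  # toy sizes, chosen arbitrarily
fwd = (random_matrix(hidden_size, embed_dim, rng),
       random_matrix(hidden_size, hidden_size, rng))
bwd = (random_matrix(hidden_size, embed_dim, rng),
       random_matrix(hidden_size, hidden_size, rng))

# Random vectors standing in for the embeddings of a five-word tweet.
sentence = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(n_words)]
outputs = bidirectional_rnn(sentence, fwd, bwd, hidden_size)
print(len(outputs), "positions, each with", len(outputs[0]), "features")
```

Each word ends up with a vector that combines what the network has seen before it and what comes after it, which is the whole point of reading the sequence in both directions.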

On the right side of the figure you can see a bubble labeled “Softmax”. This is a standard final layer for classification: it turns a vector of extracted features into probabilities of discrete classes. Basically, every neural network that solves a classification problem has a softmax layer at the end, which means that the entire network serves as a feature extractor, and the features are then fed into a logistic regression. In this case, softmax outputs the probabilities of medical terms from a specific vocabulary.
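For readers who have not met softmax before, here is a small sketch of what this layer computes. The three concepts and their scores are invented for illustration; in a real network the scores would come from a learned linear layer on top of the extracted features.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate medical concepts.
concepts = ["insomnia", "dizziness", "asthenia"]
scores = [2.0, 0.5, -1.0]

probs = softmax(scores)
best = concepts[probs.index(max(probs))]
for concept, p in zip(concepts, probs):
    print(f"{concept}: {p:.3f}")
print("predicted concept:", best)
```

The concept with the highest score gets the highest probability, and the network’s answer is simply the most probable concept.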

This is all very standard stuff for modern neural networks. The interesting part of the figure is at the bottom. There, we extract additional semantic similarity features that are fed into the softmax layer separately. These features result from analysing UMLS, the largest medical terminological system that links terms and codes between your doctor, your pharmacy, and your insurance company. This system integrates a wide range of terminology in multiple domains: more than 11 million terms from over 133 English source vocabularies into 3.6 million medical concepts. Besides English, UMLS also contains source vocabularies in 24 other languages.
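As a rough illustration of what a semantic similarity feature can look like, the sketch below computes the cosine similarity between a post and short bag-of-words descriptions of two concepts, then appends these similarities to a vector of network features. Everything here is an assumption made for the example: the tiny synonym dictionary is invented and only stands in for a terminology like UMLS, the `rnn_features` values are placeholders, and the exact features used in the paper may be computed differently.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words representation: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy entries standing in for a medical terminology.
concept_synonyms = {
    "insomnia": "insomnia sleeplessness unable to sleep trouble falling asleep",
    "dizziness": "dizziness lightheadedness vertigo head spinning",
}

def similarity_features(post):
    """One similarity score per concept, in the dictionary's order."""
    post_vec = bow(post)
    return [cosine(post_vec, bow(desc)) for desc in concept_synonyms.values()]

rnn_features = [0.1, -0.3, 0.7]  # placeholder for features from the RNN
sim = similarity_features("head spinning a little")
combined = rnn_features + sim    # both kinds of features go to the classifier
print(dict(zip(concept_synonyms, sim)))
```

Here the post shares two words with the “dizziness” entry and none with the “insomnia” entry, so the extra features nudge the classifier in the right direction even before the network has learned anything.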

So do these features help? What do the results look like, anyway? Let’s find out.

Our Results

Here is an example of how our system actually works in practice:

The model takes a post from social media (a tweet, as in the picture, or any other text) as input and maps it to a number of standard medical terms. As you can see, some of the concepts are relatively straightforward (“lousy sleeping” produced “difficulty sleeping”) but some, like “asthenia”, do not share any words with the original.

We evaluated our model with 5-fold cross-validation on the publicly available AskAPatient dataset. This dataset consists of gold-standard mappings between social media messages and medical concepts from the CSIRO Adverse Drug Event Corpus (CADEC). Our results are for the CADEC dataset, which consists of posts from the AskAPatient forum annotated by volunteers. Since the volunteers did not need to have any medical training and could be inaccurate in some cases (even after detailed instructions), their answers were proofread by experts in the field, including a pharmacist. The dataset contains adverse drug reactions (ADRs) for 12 well-known drugs, such as Diclofenac.
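In case 5-fold cross-validation is unfamiliar, here is a minimal sketch of how such a split can be produced: the data is shuffled and cut into five parts, and each part takes its turn as the test set while the other four are used for training. The function name, the seed, and the toy dataset size of 23 are our own choices for illustration; this is not the exact split used for the dataset.

```python
import random

def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists so that every item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(23, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training items, {len(test)} test items")
```

The final score is then the average over the five runs, so no single lucky or unlucky split decides the result.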

We’ll let the numbers speak for themselves:

Colored bars always look convincing; but what do they stand for? We compare our system with three standard architectures. The RNN and CNN labels should be familiar to our readers: we have briefly touched upon RNNs in this post and have explained CNNs in quite a few posts in the past (see, e.g., here). We will not go into the details of what exact convolutional architectures we used for comparison; let’s just say that one-dimensional convolutions are also a very common tool in natural language processing, and we used the architectures shown in a 2016 paper on this subject by researchers from Oxford.

DNorm is the previous best result for this task, the so-called state of the art, from the era before the deep learning revolution. This model comes from a 2013 paper by researchers from the National Center for Biotechnology Information, and it illustrates very well just how amazing the deep learning revolution has been. This result is only 5 years old, it required the best tricks in the business, and it is already hopelessly outmatched even by relatively straightforward neural network architectures. Our work improves on these further: we have an error rate of 14.5% compared to their 26.5%, almost half their error rate!
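For those who like to check the arithmetic, the comparison between the two error rates can be worked out in a few lines; the variable names are ours, and the only inputs are the two numbers quoted above.

```python
dnorm_error = 26.5  # DNorm error rate, in percent
our_error = 14.5    # our error rate, in percent

absolute_drop = dnorm_error - our_error            # 12.0 percentage points
relative_reduction = absolute_drop / dnorm_error   # about 0.453
error_ratio = our_error / dnorm_error              # about 0.547

print(f"absolute drop: {absolute_drop:.1f} percentage points")
print(f"relative error reduction: {relative_reduction:.1%}")
print(f"our error as a share of DNorm's: {error_ratio:.1%}")
print(f"accuracy: {100 - dnorm_error:.1f}% -> {100 - our_error:.1f}%")
```

In other words, about 45% of DNorm’s errors are gone, which leaves a little more than half of them.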

Let us summarize. Improvements in social media mining provided by deep learning can help push this field (dubbed pharmacovigilance, a buzzword on the rise) from experiments to real life applications. That’s what these numbers are for: you can’t solve a problem like this perfectly without strong AI, but when you have an error rate of 25% it doesn’t work at all, and when you push it down to 15%, then 10%, then 5%… at some point the benefits begin to outweigh the costs. Through faster and more accurate analysis of people’s input on the drugs they use, we hope to eventually help pharmaceutical companies reduce the side effects of the drugs they produce. This is yet another example of how neural networks can change our lives for the better, and we are happy to be part of this process.

Elena Tutubalina
Researcher, Neuromation

Sergey Nikolenko
Chief Research Officer, Neuromation
