Recognizing drugs on medical prescriptions

Synapse Medicine · Sep 28, 2018

In this post we describe the Optical Drug Recognition pipeline we developed at Synapse Medicine to automate prescription analysis and help doctors.

by Bastien Chevallier, Alicia Bel-Létoile, Julien Jouganous from Synapse Medicine

What was the purpose of this project?

As part of my Master in Technology Venture at Ecole Polytechnique, I joined Synapse Medicine for a 4-month internship. Synapse Medicine is a French startup that aims to provide physicians with clear and relevant information about drugs. At the core of this project lies the time-consuming and tedious task of searching for drug information across the many independent databases that currently exist. It is also worth noticing that in France the large majority of prescriptions are still issued on printed paper, and patients often carry these to their medical consultations. A common task for French physicians is therefore to gather these printed prescriptions, analyze them and look for potential drug interactions. This task is painful and time-consuming, and time can be scarce, for example in an emergency department.

In this context, I worked on an Optical Drug Recognition (ODR) system. Integrated into Synapse's smartphone app, this ODR would make it possible for physicians to instantly analyze all drug interactions in a printed medical prescription. The purpose was therefore less to reach state-of-the-art Optical Character Recognition (OCR) performance than to design a whole pipeline able to recognize specific drug names. We wanted to identify, with high confidence, all drug names on a photographed printed prescription.

With such a tool in their gown pocket, doctors would be able to prevent drug interactions and drug-related complications more readily, thanks to a faster and more efficient analysis.

Why not use an existing OCR SDK?

To do so, we developed our own OCR network, which represents only one piece of the whole pipeline. The question that first comes to mind is: why not use an existing OCR SDK, since OCR itself was not the novel part of our pipeline? We took this decision for three reasons.

The first one is rooted in the need for strong private data protection. Sensitive medical data have always been handled with exceptional care, but nowadays, after several huge leaks, this precaution extends to all data and has become central to users' expectations. At Synapse Medicine, we work to guarantee our users that their data, and in the physicians' case, their patients' data, shall always be protected and under control. With our own OCR, pictures of prescriptions never transit through a third-party server over which Synapse has no control.

Second, commercial OCR SDKs are trained to recognize text on many different kinds of everyday images. For its app, Synapse's OCR needs to be extremely efficient on pictures of prescriptions taken with a smartphone camera, which can be curved, low-lit, or shadowed. Training a model specifically on this use case seemed more relevant.

Finally, the last reason is economic. This feature of Synapse's app is expected to be called frequently for quick checks. As commercial OCR SDKs charge per request, developing our own OCR should save us a lot of money in the long run.

What was our approach?

The following parts focus on the milestones we went through to reach our goal: designing the world's first ODR. Dropbox's great post on their journey toward creating a mobile document scanner was a great inspiration for this work!
To overcome the lack of relevant datasets for our specific application and to bypass the time-consuming step of data annotation, we automatically generated a customized, randomized and realistic dataset.

We used this dataset to train a convolutional neural network based on Inception v3, a reference model in computer vision.
We then built the whole pipeline around the OCR to obtain the complete ODR, and tested it on real-life prescriptions.

Dataset and Image generation

Our first task was to find online datasets fitting our needs to train our OCR. Many text detection datasets are available online, but most of them contain pictures of street signs. This kind of data was not relevant for our specific use case, which is detecting words on mobile phone pictures of printed documents. Moreover, very few datasets are available in French, our use case language. Consequently, we designed our own training dataset with particular care.

Medical vocabulary

The goal of our ODR being to detect drugs on prescriptions, we wanted to familiarize the OCR with drug names. To do so, we used the French public drug database as reference vocabulary. All medication names were used to generate the dataset.
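As an illustration, here is a minimal sketch of how such a vocabulary could be assembled from an export of that database; the file name and column name are assumptions, not the actual schema:

```python
import csv

def load_drug_vocabulary(path="medicaments.csv"):
    """Collect every word appearing in a medication name.

    Both the file name and the 'denomination' column are hypothetical
    placeholders for an export of the French public drug database.
    """
    vocabulary = set()
    with open(path, encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=";"):
            for word in row["denomination"].split():
                vocabulary.add(word.strip().upper())
    return sorted(vocabulary)
```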

Image generation

Our dataset was made of 150×150 pictures containing one or two words randomly chosen from our reference vocabulary, drawn on a lightly colored background.

Example of a training picture

The main concern in generating this dataset was to create pictures looking as realistic as possible. This led to many specifications: background color, font style and colors, text rotations, shadows, blurs, etc.

To generate a realistic dataset, all these transformations were applied randomly. First, we focused on background and font colors. We sampled RGB values from 50 physician-issued prescriptions to estimate the mean and covariance of background and font color values. Using these estimates, font and background colors were randomly drawn from Gaussian distributions when generating new images. To increase the heterogeneity of the dataset, the program also randomly selected a font among a set of five.
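A minimal sketch of this sampling step follows; the means, covariances and font files below are placeholders, whereas the real statistics were fitted on the 50 sampled prescriptions:

```python
import numpy as np

# Placeholder statistics; in practice, fitted on RGB values sampled
# from 50 physician-issued prescriptions.
BG_MEAN, BG_COV = np.array([245.0, 243.0, 238.0]), np.diag([40.0, 40.0, 40.0])
FONT_MEAN, FONT_COV = np.array([25.0, 22.0, 30.0]), np.diag([60.0, 60.0, 60.0])

def sample_color(mean, cov, rng):
    """Draw an RGB color from a Gaussian and clip it to valid values."""
    rgb = rng.multivariate_normal(mean, cov)
    return tuple(int(c) for c in np.clip(rgb, 0, 255))

rng = np.random.default_rng(0)
background = sample_color(BG_MEAN, BG_COV, rng)
font_color = sample_color(FONT_MEAN, FONT_COV, rng)
# The five font files are assumptions for the sketch.
font_file = rng.choice(["arial.ttf", "times.ttf", "courier.ttf",
                        "verdana.ttf", "georgia.ttf"])
```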

We were thus able to generate text with random font styles and colors on a random-colored background. However, to make the dataset harder and closer to real pictures taken with a mobile phone, we applied additional transformations: our images were randomly rotated, sheared, blurred or shadowed. This process gave us a clean, realistic and heterogeneous dataset, well suited to train our model.
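For illustration, here is a Pillow-based sketch of such random degradations; all probabilities and ranges are assumptions, and shadowing is omitted for brevity:

```python
import random
from PIL import Image, ImageFilter

def augment(img: Image.Image) -> Image.Image:
    """Apply a random subset of degradations to a training image."""
    fill = img.getpixel((0, 0))  # pad new pixels with the background color
    if random.random() < 0.5:    # small random rotation
        img = img.rotate(random.uniform(-8, 8), fillcolor=fill)
    if random.random() < 0.3:    # horizontal shear via an affine transform
        shear = random.uniform(-0.2, 0.2)
        img = img.transform(img.size, Image.AFFINE,
                            (1, shear, 0, 0, 1, 0), fillcolor=fill)
    if random.random() < 0.4:    # mild Gaussian blur
        img = img.filter(ImageFilter.GaussianBlur(random.uniform(0.5, 1.5)))
    return img
```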

Training pictures after different types of transformations

Dataset size and distribution

As a first trial, we generated 12 million pictures using this process. A number of empty images, containing no text at all, were also inserted in the set. These images could still contain a colored background or random shapes such as lines, barcodes, etc.

Samples from the training set

Model

At the core of the pipeline stands our OCR. It is composed of a convolutional neural network (CNN), followed by a bidirectional Long Short-Term Memory (LSTM) network and a Connectionist Temporal Classification (CTC) layer.

State of the art: Inception OCR

We developed the CNN using the well-known and widely adopted Inception v3 architecture. Convolutional neural networks are widely regarded as the state of the art in computer vision for many tasks. Using this architecture allowed us to apply transfer learning to improve the learning process: by loading pre-trained Inception v3 weights, we considerably enhanced the performance of the network. Training the model from scratch would probably have caused it to diverge, or to reach a lower accuracy.
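A minimal Keras sketch of this transfer-learning setup follows; the freezing strategy shown is illustrative, not necessarily the exact schedule we used:

```python
import tensorflow as tf

# Load Inception v3 with pre-trained ImageNet weights and drop the
# ImageNet classification head to reuse the trunk as a feature extractor.
base = tf.keras.applications.InceptionV3(
    weights="imagenet",
    include_top=False,
    input_shape=(150, 150, 3),
)
base.trainable = False  # e.g. freeze the trunk during early training

inputs = tf.keras.Input(shape=(150, 150, 3))
features = base(inputs)  # (batch, h, w, channels) visual feature maps
```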

LSTM & CTC

The convolutional network takes the images as input and returns visual features. A bidirectional LSTM network, of the kind usually used in speech recognition, follows the CNN. Fed with a French character set, the LSTM network turns the visual features into character logits, covering accented characters as well. Finally, the CTC layer uses these logits to predict the text present on the image.
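Here is a minimal sketch of this recognition head; the layer sizes, feature-map shape and alphabet size are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CHARS = 100  # size of the French character set (assumption)

# Feature maps from the CNN: collapse the height axis so that each
# feature-map column becomes one timestep for the recurrent layers.
feature_maps = tf.keras.Input(shape=(3, 3, 2048))
x = layers.Reshape((-1, 2048))(feature_maps)
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
# One logit per character, plus one extra class for the CTC "blank".
logits = layers.Dense(NUM_CHARS + 1)(x)
head = tf.keras.Model(feature_maps, logits)

# During training, tf.nn.ctc_loss aligns the logit sequence with the
# target text; at inference, tf.nn.ctc_greedy_decoder (or a beam search)
# turns the logits into the predicted string.
```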

Our model was built with TensorFlow, because this framework made deployment easier and integrates well with Python. It was trained on our generated dataset until the loss stopped decreasing significantly.

This whole OCR pipeline is represented on the following picture:

The OCR pipeline

Optimization

After designing and building the model, we invested a lot of time and effort in optimizing the hyperparameters. Multiple optimizers were tested, such as Adam and Stochastic Gradient Descent (SGD) with or without warm restarts. A grid search was also conducted to find the best hyperparameter combination.
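A skeleton of such a grid search might look like this; the grids and the training helper are hypothetical:

```python
import itertools

def train_and_evaluate(lr, optimizer):
    """Hypothetical helper: trains the OCR and returns validation accuracy."""
    raise NotImplementedError

learning_rates = [1e-2, 1e-3, 1e-4]                # assumed grid
optimizers = ["adam", "sgd", "sgd_warm_restarts"]  # assumed grid

best_score, best_config = float("-inf"), None
for lr, opt in itertools.product(learning_rates, optimizers):
    score = train_and_evaluate(lr, opt)
    if score > best_score:
        best_score, best_config = score, (lr, opt)
```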

The global pipeline of the ODR

The whole ODR pipeline architecture

MSER

If you have been following this post closely, you may have noticed that this pipeline takes as input the picture of a whole prescription, while the CNN was fed with 150×150 single-word images.

In fact, before trying to predict the whole text, boxes containing words must be detected and isolated. Two approaches exist to do so: one using deep networks, as commonly done for object detection, and the other based on a classic computer vision algorithm, Maximally Stable Extremal Regions (MSER). We favored the second one to avoid complicating the whole pipeline with another deep network, and because MSER allows more manual fine-tuning.

MSER detects connected components based on the intensity levels of the image's pixels. To enhance detection, some preprocessing is necessary, such as binarization and morphological transformations. Performed with a well-tuned structuring element, a morphological transformation connects close, similar components, making it easier for MSER to capture the whole group.

The unexpected and tricky part was designing a structuring element that connects the letters of a word without linking two close words. As the spaces separating letters and words differ on every picture, the structuring element must be tuned for each image.
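A minimal OpenCV sketch of this detection step follows; note that the fixed kernel size shown here is precisely the parameter that, in practice, has to be tuned per image:

```python
import cv2

img = cv2.imread("prescription.jpg", cv2.IMREAD_GRAYSCALE)

# Binarize, then close gaps horizontally so the letters of a word merge
# into one component without bridging the space between two words.
_, binary = cv2.threshold(img, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))  # to be tuned
connected = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Detect stable regions; their bounding boxes are word candidates.
mser = cv2.MSER_create()
_, boxes = mser.detectRegions(connected)  # boxes are (x, y, w, h)
```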

How were we able to get drug information from OCR predictions?

But as a whole, what distinguishes Synapse's ODR from current state-of-the-art OCR? Synapse has developed a knowledge graph gathering drug information from different certified international sources. Following the OCR, we query this knowledge graph with the predicted text. In return, it provides a full entity containing information about the matched drug. Thus, the ODR does not return raw text only, like a regular OCR, but a structured object with additional information.
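To give an idea, here is a toy stand-in for this lookup step; the dictionary below merely mimics the knowledge graph, which in reality aggregates several certified sources, and the fuzzy matching shown is a simple illustration, not Synapse's actual matching logic:

```python
import difflib

# Toy stand-in for the knowledge graph: molecule name -> structured entity.
KNOWLEDGE_GRAPH = {
    "PARACETAMOL": {"atc": "N02BE01", "interactions": []},
    "IBUPROFENE": {"atc": "M01AE01", "interactions": []},
}

def lookup(predicted_word: str):
    """Match noisy OCR output against known names, then return the entity."""
    matches = difflib.get_close_matches(
        predicted_word.upper(), KNOWLEDGE_GRAPH, n=1, cutoff=0.8)
    return KNOWLEDGE_GRAPH[matches[0]] if matches else None

print(lookup("Paracetamo1"))  # tolerant to small OCR errors
```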

From these complex objects, the Synapse web app is for example able to highlight a potential drug interaction or a contraindication in the prescription.

Validation

Validation protocol and evaluation metrics

We tested our analysis pipeline on a set of 39 generated and printed prescriptions. For this validation set, we randomly picked font styles, colors and document layouts. To keep it simple, we only considered molecule names (and not drug brand names).

The pipeline was evaluated according to two classical metrics: precision and recall. Precision is a good proxy for the confidence we can have that entities detected by the ODR are actually present on the prescription. It is defined as the number of true positives (i.e. molecule names detected by the ODR that actually appear on the prescription) over the total number of positives (molecule names output by the ODR). Recall quantifies the pipeline's ability to detect the entities on the prescription. It is given by the number of true positives divided by the number of targets (molecule names on the prescription to be detected).

To summarize:

precision = TP / (TP + FP)
recall = TP / (TP + FN)

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives (molecule names present on the prescription but not detected by the pipeline).
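In code, assuming the counts have already been tallied per prescription, these metrics come down to a small helper:

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Compute precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# e.g. 9 molecule names output, 8 of them correct, 2 missed:
print(precision_recall(tp=8, fp=1, fn=2))  # (0.888..., 0.8)
```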

Comparison to a state-of-the-art OCR: Google Vision

We needed a baseline to compare our pipeline with existing solutions. We therefore used the state-of-the-art OCR developed by Google, available through the Google Vision API, and compared both pipelines on our set of generated prescriptions.

The figures below show the precision and recall distributions of both the Google (orange bars) and Synapse (blue bars) pipelines. On this specific medical prescription analysis task, both pipelines provide comparably good performance.

Precision distribution among the set of prescriptions
Recall distribution among the set of prescriptions

Looking at the output of our OCR component, we noticed that generic text was not always well recognized, but the OCR was very efficient at detecting molecule names, as it was trained for this specific task.

Future work

To conclude, we designed an end-to-end ODR pipeline using state-of-the-art deep learning models to read prescriptions and extract drug names. It is a highly valuable feature for Synapse, greatly improving our product's user experience and making prescription analysis as simple as taking a picture with a smartphone.

To go further in the evaluation process, we tested our ODR and the Google Vision pipeline on a set of pictures of anonymized hospital prescriptions. These pictures were not taken with the intent of being analyzed by an automated system. Due to picture quality, performance was not as good as on the synthetic prescriptions for either pipeline. However, Google's OCR seemed to generalize better than ours. Our word detection process can still be improved, and we are confident that further work on text-box detection accuracy will improve the overall reliability of our ODR pipeline.

Synapse Medicine

We are on a mission to make reliable medical information easily accessible to patients and health professionals.