2. Basic keywords aren’t enough. Now what?

How to measure and improve indicators while parsing clinical trial text

Amy Gottsegen
Clinical Trial NLP Challenge
4 min read · Apr 28, 2018


Photo: Erdenebayar

In our last post, we talked about using a rule-based, grammatical approach to extract important information from clinical trial descriptions. We posited that if we could parse sentences into their constituent parts of speech, we could extract information based on predictable relationships, such as those between subjects and objects. Since we’ve already compiled an initial list of “indicator” words and phrases, and built tools to extract the sentences that contain them, the next step in fitting this method into our existing tools is to tag each indicator with the grammatical relationship that should point to the target information.

As we’ve been working on defining these grammatical relationships of interest, we’ve also been evaluating our list of indicators and experimenting with ways to improve it. Here’s a look at some of the strategies on the table:

Better Indicators: Precision

In the last post, we talked about testing the power of each indicator by checking how frequently it occurred in our sample of 270k trial descriptions. The next step, after selecting the indicators with non-negligible rates of occurrence, was to check each indicator’s precision.

A drawback inherent to the indicator-based approach is that each indicator must be evaluated individually by our team. To make this phase more manageable, we chose to narrow the scope to two categories of information identified as most important for patients:

  • Patient burden: this includes the timing and duration of a patient’s involvement in the study
  • Intervention: this includes what type of treatment or observation patients will undergo

Once we’d made this refinement, we created a spreadsheet listing each indicator alongside a random sample of the sentences in which it occurred in our trial description data. Our partners at UCSF then annotated each sentence with a “yes” or “no” to indicate whether the indicator truly captured the information we hoped it would. In the example below, the indicator that uses a regular expression to capture any text of the form “# years” is shown to be a reasonably precise indicator of patient burden.

sample rows of our precision-check spreadsheet
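For the curious, here’s a minimal sketch of what such a pattern might look like (the exact expression we use differs, but the idea is the same): a number, optionally a range, followed by “year” or “years”.

```python
import re

# Illustrative version of the "# years" indicator: a number
# (optionally a range like "18-40") followed by "year" or "years".
YEARS_PATTERN = re.compile(r"\b\d+(?:\s*[-–]\s*\d+)?\s*years?\b", re.IGNORECASE)

sentence = "Patients will attend follow-up visits for 2 years after treatment."
if YEARS_PATTERN.search(sentence):
    print("Candidate patient-burden sentence:", sentence)
```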

This exercise helped us identify edge cases that we had not anticipated when constructing our list of indicators.

Notice the sentence in which the indicator above failed to successfully identify some aspect of patient burden:

“Myeloablative conditioning (adult patients 18–40 years old): Patients receive fludarabine phosphate IV over 30 minutes on days -6 to -4, cyclophosphamide IV over 1 hour on days -5 and -4, and undergo total-body irradiation on days -3 to -1.”

In this case, “# years” indicates eligibility criteria rather than burden. So how do we eliminate this problem from our system?

Solution 1: Co-Indicators

One approach to this problem is the use of co-indicators. Co-indicators are other indicator words that occur near the primary indicator. In some cases, these can strengthen the precision of our tool. Take, for example, the sentence:

“However, Suzuki et al. found 54% of 92 patients undergoing video assisted thoracoscopic excision of subcentimetre nodules, required conversion to a thoracotomy.”

Our current system extracted this sentence based on the study-size indicator “# patients”. In this context, however, the phrase merely describes a past study. Possible co-indicators such as “arm”, “group”, or “randomized” could narrow the pool of extracted sentences, eliminating false positives like this one.
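As a rough sketch (the word lists here are illustrative, not our actual lists), a co-indicator check can be as simple as requiring that a study-design word appear in the same sentence as the primary indicator:

```python
import re

# Primary indicator: "# patients". Co-indicators (illustrative list):
# words suggesting the sentence describes this study's own design.
PRIMARY = re.compile(r"\b\d+\s+patients\b", re.IGNORECASE)
CO_INDICATORS = {"arm", "group", "randomized", "enroll", "enrolled"}

def is_study_size_sentence(sentence: str) -> bool:
    """Keep a "# patients" match only if a co-indicator occurs nearby."""
    if not PRIMARY.search(sentence):
        return False
    words = {w.strip(".,;:()").lower() for w in sentence.split()}
    return bool(words & CO_INDICATORS)
```

The Suzuki et al. sentence above contains “92 patients” but none of the co-indicators, so it would now be filtered out.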

Solution 2: Grammatical Context

Since we’d already implemented a sentence-parsing system using spaCy, another answer we’re trying is to refine each indicator further by specifying the grammatical function of its containing phrase.

In the edge case seen above, “# years” is contained in an adjectival phrase: “adult patients 18–40 years old”. If we can separate occurrences of “# years” within an adjectival phrase as indicators of eligibility criteria, and treat all other occurrences of “# years” as indicators of burden, we hope to improve the precision of this indicator. A similar method of analysis can be applied to the other indicators on our list.
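Here’s a hedged sketch of how that check might look with spaCy (assuming the en_core_web_sm model is installed; the adjectival-ancestor test is a heuristic, not our final rule):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def classify_years_mention(sentence: str) -> str:
    """Label a "# years" match as eligibility criteria when "years"
    sits under an adjective (as in "18-40 years old"), else as burden."""
    doc = nlp(sentence)
    for token in doc:
        if token.lower_ in ("year", "years") and token.i > 0 and doc[token.i - 1].like_num:
            # Walk up the dependency tree looking for an adjectival head.
            if any(ancestor.pos_ == "ADJ" for ancestor in token.ancestors):
                return "eligibility"
            return "burden"
    return "no indicator"

print(classify_years_mention("Adult patients 18-40 years old are eligible."))  # eligibility
print(classify_years_mention("Participants are followed for 2 years."))        # burden
```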

Better Indicators: Recall

We’ve also been discussing how we can improve the recall of our indicators. We don’t just want all of the sentences our indicators catch to contain the information we’re targeting — we also want our indicators to catch all the sentences that contain that information.
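In other words, if we call a caught sentence that really contains the target information a true positive:

```python
# Precision: of the sentences the indicator caught, how many were right?
# Recall: of the sentences containing the target info, how many were caught?
def precision(true_positives: int, false_positives: int) -> float:
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    return true_positives / (true_positives + false_negatives)
```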

Some of our indicators fall within broader categories. For example, we identified the modifier “Vietnamese” as an indicator of eligibility criteria; however, any word that denotes an ethnic identity could be used similarly. How do we capture all of them?

One solution we’re exploring is to construct a gazetteer for each such category. A gazetteer is simply a list of all the words in a category. The gazetteer of weekdays, for instance, would be: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday.
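Checking a sentence against a gazetteer is then a simple set lookup. A minimal sketch using the weekday example:

```python
WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"}

def contains_gazetteer_term(sentence: str, gazetteer: set) -> bool:
    """True if any word in the sentence appears in the gazetteer."""
    return any(word.strip(".,;:") in gazetteer for word in sentence.split())

print(contains_gazetteer_term("Visits occur every Monday.", WEEKDAYS))  # True
```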

It’s true, of course, that many of the categories for which we’d be constructing gazetteers are not quite as straightforward as this example (ethnic identities, for one, can be politically loaded and tend to shift with time). However, we’ve considered creating a best approximation for such categories by scraping online sources of encyclopedic information like Wikipedia.
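One way to do this without scraping HTML directly is the public MediaWiki API, which can list the pages in a category. A sketch (the category name below is illustrative, and any list pulled this way would need curation before use):

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def category_members(category: str) -> list:
    """Fetch page titles in a Wikipedia category as a rough gazetteer."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": "500",
        "format": "json",
    }
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()
    return [page["title"] for page in response.json()["query"]["categorymembers"]]

# e.g. category_members("Days of the week")  # illustrative category name
```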

What’s next

In our next post we’ll detail the work we’ve done using grammar-based rules to identify the target information signaled by our indicator words. Stay tuned!
