Using LLMs for drug repurposing

Ranjani Ramamurthy
Published in llmed.ai
Jul 13, 2023


Much gratitude to Dr. Soheil Meshinchi and Amanda Leonti.

This is the fourth in my series of articles on using LLMs in medicine. This one is about using LLMs for drug repurposing. I am sharing my perspective based on experience with a research application developed and used in the lab of Dr. Soheil Meshinchi to find drugs that could be repurposed for pediatric AML patients. I believe that redesigning this application around LLMs can improve it significantly and broaden its impact.

Pediatric acute myeloid leukemia (AML) is a rare and heterogeneous disease. It is very different from adult AML and has limited therapeutic options. However, the collection and analysis of comprehensive genomics datasets have shown promise in identifying biomarkers and targets. If these targets can be matched with therapies developed for other indications (aka ‘drug repurposing’), it offers the potential for accelerated treatment options for pediatric AML.

Over the last five years, the Meshinchi lab at Fred Hutchinson Cancer Center has developed a purpose-built analytics application (“AML-CT-Search”) to find suitable drugs for repurposing. In this article, I discuss how modern LLM technologies can make this process faster, more accurate, and more scalable, and can expand its scope.

The Drug Repurposing Research Pipeline:

Drug repurposing refers to finding drugs that are commercially available or in development (in clinical trials) and using them for new indications. In pediatric AML, patients with rarer disease sub-types often exhaust their known treatment options. At that point, one viable approach is to try drugs in development under “compassionate use” provisions.

Over the last decade, the Meshinchi lab has collected and curated a deep genomics dataset of pediatric AML with matching de-identified longitudinal clinical data. The drug repurposing research pipeline uses this data alongside publicly available datasets to identify suitable therapeutic candidates and inform the lab’s research roadmap.

There are three steps in the drug repurposing research pipeline.

Step 1: Bioinformatics techniques are used to discover novel biomarkers as well as targets specific to sub-groups of pediatric AML.

Step 2: AML-CT-Search takes these biomarkers and targets as input and matches them to clinical trials of drugs in development that act on the same targets but were developed for other indications. That is, this step identifies the drugs that might be repurposed for pediatric AML.

Step 3: Therapies selected from Step 2 are then analyzed for applicability/likelihood of success in pediatric AML. A prioritized list of therapies then helps guide the experimental and clinical efforts.

The rest of this article is about Step 2, AML-CT-Search.

AML-CT-Search:

AML-CT-Search is built with basic text search methods.

At a high level, the task is simple. Given a set of biomarkers/targets, it finds clinical trials that are testing drugs against the same targets, regardless of disease indication. However, when the application was built, a lot of time was spent manually curating gene/target lists and building regular expressions to search over clinical trial descriptions. Results were also validated manually.

The implementation approach was guided by the maturity of the systems that the analytics team was working with, and the knowledge that the clinical trial information in clinicaltrials.gov is not standardized and would have missing information and errors.

Match AML targets to oncology-specific clinical trials testing targeted therapies

AML-CT-Search uses HGNC to create a master-list of genes and targets and their synonyms. Regular expressions representing this master-list are then used to query over a subset of clinical trials.
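The idea of turning a synonym master-list into regular expressions can be sketched as follows. This is a minimal illustration, not the AML-CT-Search implementation: the `MASTER_LIST` entries are hand-picked examples rather than an actual HGNC export, and the helper names `build_pattern` and `find_targets` are hypothetical.

```python
import re

# Hypothetical master-list: gene symbol -> synonyms.
# Illustrative entries only, not an actual HGNC export.
MASTER_LIST = {
    "FOLR1": ["FOLR1", "folate receptor alpha", "FolRα", "FRα"],
    "CD33": ["CD33", "SIGLEC3", "Siglec-3"],
}

def build_pattern(synonyms):
    """Compile one case-insensitive regex matching any synonym as a whole term."""
    # Longest alternatives first, so multi-word names win over shorter ones.
    alternatives = sorted((re.escape(s) for s in synonyms), key=len, reverse=True)
    # Lookarounds stop "CD33" from matching inside a longer token like "ACD33X".
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)",
                      re.IGNORECASE)

PATTERNS = {symbol: build_pattern(syns) for symbol, syns in MASTER_LIST.items()}

def find_targets(description):
    """Return the sorted gene symbols whose synonyms appear in the free text."""
    return sorted(sym for sym, pat in PATTERNS.items() if pat.search(description))

print(find_targets("An antibody-drug conjugate targeting Folate Receptor Alpha."))
```

Even in this toy form, the weakness described above is visible: every spelling variant has to be anticipated in the master-list, which is why manual curation took so much time.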

The clinical trials database does not require trial sponsors to provide structured fields for the genes or proteins targeted by a therapy under investigation, which is why mining it requires searching over the free-text descriptions of trials.

AML-CT-Search works with a snapshot of the publicly available AACT database, which contains both protocols and results from studies registered in clinicaltrials.gov. Using the AML targets of interest, oncology-specific clinical trials are searched for drugs that could be candidates for repurposing.
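A rough sketch of this matching step is shown below, using an in-memory SQLite database as a stand-in for an AACT snapshot. The table and column names are simplified assumptions for illustration, every row is made up, and `oncology_trials_mentioning` is a hypothetical helper rather than part of AML-CT-Search.

```python
import sqlite3

# Simplified stand-in for an AACT snapshot. Table and column names are
# assumptions for illustration, and every row below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE studies (nct_id TEXT PRIMARY KEY, brief_title TEXT);
    CREATE TABLE conditions (nct_id TEXT, name TEXT);
    CREATE TABLE detailed_descriptions (nct_id TEXT, description TEXT);
""")
conn.executemany("INSERT INTO studies VALUES (?, ?)", [
    ("EX-001", "Antibody-drug conjugate in ovarian cancer"),
    ("EX-002", "Observational study in rheumatoid arthritis"),
    ("EX-003", "Kinase inhibitor in lung carcinoma"),
])
conn.executemany("INSERT INTO conditions VALUES (?, ?)", [
    ("EX-001", "Ovarian Cancer"),
    ("EX-002", "Rheumatoid Arthritis"),
    ("EX-003", "Non-Small Cell Lung Carcinoma"),
])
conn.executemany("INSERT INTO detailed_descriptions VALUES (?, ?)", [
    ("EX-001", "The conjugate binds Folate Receptor Alpha on tumor cells."),
    ("EX-002", "Folate receptor alpha expression is an exploratory endpoint."),
    ("EX-003", "The drug inhibits a receptor tyrosine kinase."),
])

ONCOLOGY_KEYWORDS = ("cancer", "carcinoma", "leukemia", "lymphoma", "tumor")

def oncology_trials_mentioning(conn, term, keywords=ONCOLOGY_KEYWORDS):
    """Return ids of trials with an oncology condition whose description mentions term."""
    cond_clause = " OR ".join("LOWER(c.name) LIKE ?" for _ in keywords)
    sql = f"""
        SELECT DISTINCT s.nct_id
        FROM studies s
        JOIN conditions c ON c.nct_id = s.nct_id
        JOIN detailed_descriptions d ON d.nct_id = s.nct_id
        WHERE ({cond_clause}) AND LOWER(d.description) LIKE ?
        ORDER BY s.nct_id
    """
    params = [f"%{k}%" for k in keywords] + [f"%{term.lower()}%"]
    return [row[0] for row in conn.execute(sql, params)]

print(oncology_trials_mentioning(conn, "folate receptor alpha"))
```

The rheumatoid arthritis row mentions the target but is filtered out by the condition keywords, which mirrors the restriction to oncology trials described above.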

The output from AML-CT-Search is manually validated against other known data sources (e.g., ADC Review, KEGG Drug, DGIdb, OpenTargets).

How can LLMs improve AML-CT-Search:

My experiments using LLMs for information extraction and clinical NLP suggest that we can re-build AML-CT-Search with LLM technologies.

LLMs can be used to structure and normalize information about all clinical trials (from clinicaltrials.gov, EudraCT, ICTRP, etc.). Once that information has been structured, gaps can be filled in from other sources, such as the literature, sponsor web sites, and publicly available datasets. Validation of data elements can also be automated.

This will make the application more accurate and scalable and more broadly applicable for research.

Here is an example from the OpenAI playground. The input is a free-text description of a single clinical trial from clinicaltrials.gov.

Prompt:

“Create a table summarizing clinical trials information. If you are unsure about any of the fields, output N/A”

<.. description from clinicaltrials.gov ..>

Drug | Phase | Status | Sponsor | Gene Target | Gene Synonym | Payload

Output:

STRO-002 | 1 | Open-label | N/A | Folate Receptor Alpha | FolRα | N/A

Screenshot of Try #1

On the first run, you can see that the “Sponsor” and “Payload” fields are empty (“N/A”).
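To work with such output programmatically, the pipe-delimited row can be turned into a record and its empty fields listed. The sketch below assumes the model returns exactly one row with the same number of columns as the header; `parse_row` and `missing_fields` are hypothetical helper names.

```python
HEADER = "Drug | Phase | Status | Sponsor | Gene Target | Gene Synonym | Payload"

def parse_row(header, row):
    """Turn one pipe-delimited output row into a dict; 'N/A' or empty becomes None."""
    keys = [h.strip() for h in header.split("|")]
    values = [v.strip() for v in row.split("|")]
    if len(keys) != len(values):
        raise ValueError(f"expected {len(keys)} columns, got {len(values)}")
    return {k: (None if v == "" or v.upper() == "N/A" else v)
            for k, v in zip(keys, values)}

def missing_fields(record):
    """List the fields the model could not fill in."""
    return [k for k, v in record.items() if v is None]

# The row from the first run, with its original uneven spacing.
first_try = parse_row(
    HEADER, "STRO-002 | 1 | Open-label | N/A| Folate Receptor Alpha |FolRα |N/A")
print(missing_fields(first_try))
```

The list of missing fields is what a pipeline could use to decide which other sources to consult next.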

In the next iteration, I added some information about STRO-002 from the literature to the input. The prompt was identical.

Output:

STRO-002 | 1 | Open-label | Sutro Biopharma | FolRα | Folate Receptor Alpha | 3-aminophenyl hemiasterlin (SC209)

Screenshot of Try #2

This time, both the “Sponsor” and “Payload” fields are filled in.
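Gap filling and a first pass at automated validation can be sketched as a merge of the two runs: keep every value from the first record, fill its empty fields from the second, and flag fields where the two runs disagree. In the outputs above, the “Gene Target” and “Gene Synonym” values are swapped between runs, which is exactly the kind of inconsistency worth flagging for review. The function name `merge_records` is hypothetical.

```python
def merge_records(primary, secondary):
    """Fill None fields in primary from secondary and report conflicting fields."""
    merged, conflicts = {}, []
    for field, first in primary.items():
        second = secondary.get(field)
        merged[field] = first if first is not None else second
        # Two non-empty but different values need a human (or another check).
        if first is not None and second is not None and first != second:
            conflicts.append(field)
    return merged, conflicts

# The two outputs shown above, with "N/A" represented as None.
try_1 = {"Drug": "STRO-002", "Phase": "1", "Status": "Open-label",
         "Sponsor": None, "Gene Target": "Folate Receptor Alpha",
         "Gene Synonym": "FolRα", "Payload": None}
try_2 = {"Drug": "STRO-002", "Phase": "1", "Status": "Open-label",
         "Sponsor": "Sutro Biopharma", "Gene Target": "FolRα",
         "Gene Synonym": "Folate Receptor Alpha",
         "Payload": "3-aminophenyl hemiasterlin (SC209)"}

merged, conflicts = merge_records(try_1, try_2)
print(merged["Sponsor"], conflicts)
```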

Implications beyond this specific application:

What excites me about this approach is the number of meaningful applications it could enable. Structuring and normalizing all the information available in clinical trial registries would be immensely useful in itself. Filling data gaps, addressing data quality, and automating validation will also help. Beyond that, a broader range of people could then use a general-purpose LLM search engine for broader aims.

For example, one can ask queries that combine data from different data sources (trial details, sponsor, payload).

“Find clinical trials targeting FOLR-1 with payload <> and targeting Ovarian Cancer.”
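Once trial information is structured, a query like this reduces to a filter over records. The sketch below assumes a hypothetical `Trial` record with normalized `target`, `payload`, and `indication` fields; the trials and payload names are made up, and the elided payload in the query above is left as a parameter.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    # Hypothetical normalized record; field names are assumptions.
    trial_id: str
    target: str
    payload: Optional[str]
    indication: str

def _same(a, b):
    """Case-insensitive equality that treats a missing value as no match."""
    return a is not None and a.casefold() == b.casefold()

def find_trials(trials, target, indication, payload=None):
    """Return ids of trials matching target and indication, and payload if given."""
    return [t.trial_id for t in trials
            if _same(t.target, target)
            and _same(t.indication, indication)
            and (payload is None or _same(t.payload, payload))]

# Made-up records for illustration only.
TRIALS = [
    Trial("EX-001", "FOLR1", "payload-a", "Ovarian Cancer"),
    Trial("EX-002", "FOLR1", "payload-b", "Ovarian Cancer"),
    Trial("EX-003", "FOLR1", "payload-a", "Endometrial Cancer"),
    Trial("EX-004", "CD33", None, "Acute Myeloid Leukemia"),
]

print(find_trials(TRIALS, "FOLR1", "ovarian cancer", payload="payload-a"))
```

The hard part is not the filter but getting free-text registries into records like these, with targets and payloads normalized to one spelling.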

Now, if we’re being really ambitious, we’d want to answer complex and realistic queries like this one:

“My patient is a 55-year-old female with triple-negative breast cancer, stage 4, and a history of <> & <>. Find me a clinical trial where the protocol does not include any drug that has a side effect of <> and no patients have exited the trial due to complications attributed to liver inflammation.”

In my next article, I’ll explore how far we can go with today’s technology and what the limitations are at the moment.

Thanks for reading!

Ranjani Ramamurthy
llmed.ai

Product Management, MD, Cancer Research, Engineer, Health-Tech advisor, GH Labs, ICGA, Fred-Hutch, LLS, ex-Microsoft, pediatric cancer research advocate.