Predicting Adverse Drug Reactions with Drug-Specific Cohorts

Chiao-Feng Lin
DNAnexus Science Frontiers
Nov 10, 2021

By Vikrant Magadi

Advancements in pharmacogenomics (PGx) have led to the establishment of gene-drug-specific dosing guidelines and the reduction of many adverse drug reactions, or ADRs. However, as my DNAnexus mentor, Chiao-Feng Lin, explained previously, dosing guidelines are only available for 112 drugs, a small fraction of the over 20,000 prescription drug products approved for marketing. Furthermore, while some ADRs are predictable from the drug’s known pharmacology, others, known as idiosyncratic ADRs, are not.

In her research, Chiao has worked to predict whether some individuals are more susceptible to idiosyncratic ADRs using data from the UK Biobank and LightGBM, a decision-tree-based machine learning framework. My task over the summer was to split the sample set she used into drug-specific cohorts, so that individuals who had adverse reactions to certain types of drugs would be considered together. Looking at drug-specific cohorts also allowed for separate parameter tuning of the LightGBM model, and for separate interpretation of the feature importance. I also incorporated different types of genomic features, such as polygenic scores (PGS) for various diseases and rare variant burden for Absorption, Distribution, Metabolism, and Excretion (ADME) genes, as well as non-genomic factors collected from lab test results and questionnaires.

Assembling drug-specific cohorts

In order to assemble cohorts based on categories of drugs, I used the British National Formulary (BNF) drug coding format. BNF codes are highly structured and hierarchical, meaning that I could group drugs together in specific or broad categories. I chose to group drugs together by BNF Paragraph, a fairly broad group (e.g. Antidepressants, Non-Opioid Analgesics) that still provides a clear distinction between categories. Once I had assembled a list of drug codes, I queried the UKB database through the DNAnexus UK Biobank Research Analysis Platform (RAP) to compile a cohort of individuals that had been treated with drugs in each category.
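As a rough illustration of why the hierarchy makes grouping easy, here is a minimal Python sketch. It assumes the usual BNF code layout, in which the first two characters are the chapter, the next two the section, and the next two the paragraph. The function names and the example codes are illustrative only and are not the actual drug list used in this project.

```python
from collections import defaultdict


def bnf_paragraph(code: str) -> str:
    """Return the 6-character prefix (chapter + section + paragraph) of a BNF code."""
    # Some exports separate the levels with dots (e.g. "04.07.01"); drop them first.
    cleaned = code.replace(".", "").strip()
    if len(cleaned) < 6:
        raise ValueError(f"BNF code too short to contain a paragraph: {code!r}")
    return cleaned[:6]


def group_by_paragraph(codes):
    """Group BNF codes by their paragraph prefix."""
    groups = defaultdict(list)
    for code in codes:
        groups[bnf_paragraph(code)].append(code)
    return dict(groups)


# Illustrative codes only: two share paragraph 040301, two share 040701.
example_codes = ["0403010B0", "04.03.01.0F0", "0407010H0", "0407010B0"]
groups = group_by_paragraph(example_codes)
```

Grouping at a broader or narrower level would only require changing the prefix length (for example, four characters for a BNF section).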

Running LightGBM and findings

I ran LightGBM on each drug cohort, using slightly different parameters on each one to improve performance of the model. Despite the separate cohorts and parameters, I observed surprisingly consistent results across cohorts, both in the performance of the model and the resulting feature importance.

Across all five drug categories, including antidepressants and non-opioid analgesics, I observed that non-genomic features consistently showed stronger performance than genomic features. When only considering genomic features, Area Under Curve (AUC) was usually 0.50 or below, indicating that the classification was equal to or worse than random guessing. When considering non-genomic features, I observed an AUC around 0.74. Within non-genomic features, SHAP analysis showed that features relating to psychiatric issues (such as seeing a psychiatrist for anxiety, loneliness, and frequency of unenthusiasm/disinterest) were among the most prominent, as they had the strongest impact on the model outcome.
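For readers less familiar with AUC, the short sketch below shows one way to interpret it: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. This is a plain-Python illustration on made-up labels and scores, not the evaluation code used in this project.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(positives) * len(negatives))


# Made-up examples: a score that carries no information, and one that carries some.
uninformative = roc_auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
informative = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

A model whose scores carry no information lands at 0.5 under this definition, which is why an AUC of 0.50 is read as equivalent to random guessing.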

Figure 1: ROC curve for binary classification on non-opioid analgesics cohort (AUC = 0.75)

Figure 2: SHAP summary plot for binary classification on non-opioid analgesics

HTTPS App for Querying GWAS Results

As another part of my project, I set out to build an HTTPS app to query and visualize GWAS results. A genome-wide association study (GWAS) is a type of genetic study that examines a large set of variants across the entire genome and tests their association with a trait of interest. Regenie is a tool for conducting GWAS that can examine multiple traits simultaneously. For my project, I worked with GWAS results that had been generated with Regenie and annotated using OpenCravat, a tool for variant interpretation and annotation. The annotated results were written to a SQLite database to allow for easier querying. My project was to build a web app that could query this database and display results as tables or graphs. After building this app locally (on my laptop) using Flask, I adapted it as an HTTPS app on the UK Biobank RAP.
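To give a sense of what querying such a database can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and rows are all hypothetical, since the actual schema of the annotated results is not described here.

```python
import sqlite3


def search_results(conn, term):
    """Return rows whose gene, variant, or trait matches the search term exactly."""
    query = """
        SELECT chrom, pos, variant, gene, trait, pvalue
        FROM gwas_results
        WHERE gene = ? OR variant = ? OR trait = ?
        ORDER BY pvalue ASC
    """
    # Placeholders (?) keep user input out of the SQL string itself.
    return conn.execute(query, (term, term, term)).fetchall()


# Build a tiny in-memory database with made-up rows (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gwas_results "
    "(chrom TEXT, pos INTEGER, variant TEXT, gene TEXT, trait TEXT, pvalue REAL)"
)
conn.executemany(
    "INSERT INTO gwas_results VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("1", 1000, "var_1", "GENE_A", "TRAIT_X", 1e-5),
        ("1", 2000, "var_2", "GENE_A", "TRAIT_Y", 1e-9),
        ("2", 3000, "var_3", "GENE_B", "TRAIT_X", 1e-3),
    ],
)

gene_hits = search_results(conn, "GENE_A")
```

Using parameterized queries rather than string formatting matters in a web app, where the search term comes straight from the user.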

Functionality and use cases

My vision for this app is that a user who has conducted GWAS can examine their results in more detail through a graphical user interface, rather than needing to manually parse a large text file or query a database. The app assumes that the user has uploaded their GWAS results in a SQLite database to the DNAnexus UK Biobank RAP. As inputs, the user must provide the ID of the database, as well as the file ID of a supplementary text file. Once the database has been loaded onto the app instance, the user sees a search bar, where they can enter a gene name, variant, or trait (such as an ICD10 code).

Figure 3: Home page where users can submit queries

After the user submits the search term, they are shown a results page with a table containing all results for their query, including information about the location of variants, the gene, and the p-value for an associated trait. The table is sortable and searchable, and the hyperlinks in the “Trait” and “Gene” columns lead to online databases with more information about the ICD10 code or gene of interest.

Figure 4: Search results page with hyperlinks to relevant entries of online databases

Users can also see their results displayed graphically using Bokeh, a Python library that creates interactive graphs on web pages. The user can view either bar graphs of the variants associated with each gene, or a Manhattan-style scatter plot of all variants associated with a certain trait.
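The data preparation behind a Manhattan-style plot is simple enough to sketch in plain Python: each variant's p-value becomes -log10(p) on the y-axis, and chromosomes are laid end to end on the x-axis. The function and the input tuples below are hypothetical, and the plotting call itself is left out, so this shows the transformation rather than the app's actual Bokeh code.

```python
import math


def manhattan_points(variants):
    """Convert (chrom, pos, pvalue) tuples into (x, y, chrom) plot coordinates.

    chrom is an integer chromosome number; pvalue must be greater than 0.
    """
    # Use the largest observed position as each chromosome's width.
    widths = {}
    for chrom, pos, _ in variants:
        widths[chrom] = max(widths.get(chrom, 0), pos)

    # Offset each chromosome by the total width of the chromosomes before it.
    offsets = {}
    running_total = 0
    for chrom in sorted(widths):
        offsets[chrom] = running_total
        running_total += widths[chrom]

    return [
        (offsets[chrom] + pos, -math.log10(pvalue), chrom)
        for chrom, pos, pvalue in variants
    ]


# Made-up variants on two chromosomes.
points = manhattan_points([(1, 100, 1e-2), (1, 500, 1e-8), (2, 50, 1e-5)])
```

The resulting x and y values could then be handed to any scatter plot function, with the chromosome number used to alternate colors.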

Figure 5: Manhattan style scatter plot with interactive functions provided by Bokeh Python library

Design and tools

The main tool I used for creating the app was Flask, a Python web framework. Flask was useful because it combines simplicity with functionality: the main web app is relatively simple Python code, but Flask can also be combined with a variety of external tools and libraries. Of particular use to me was its support for HTML templates and how easily it works with SQLite. I was able to create the layout of the application in HTML and style it with CSS. Using Jinja2, the template engine Flask uses, I could also pass in variables and write Python-like expressions and control blocks directly in the HTML files. Flask could also handle POST and GET requests for processing user queries. The backend of the application used SQLite from within Flask, which allowed me to pass SQL queries as arguments in my main application.
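The request-and-template pattern described above can be sketched in a few lines. This is not the app's actual code: the route, the template, and the in-memory rows are made up, and the template is passed as a string via render_template_string only to keep the example self-contained, whereas a real app would more likely keep it in a file under the "templates" directory.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Jinja2 template: {{ ... }} inserts variables, {% ... %} adds control flow.
PAGE = """
<!doctype html>
<title>GWAS search</title>
<form method="post">
  <input name="term" placeholder="Gene or variant">
  <button type="submit">Search</button>
</form>
{% if term %}
  <p>Results for {{ term }}:</p>
  <ul>
  {% for row in rows %}
    <li>{{ row.variant }} ({{ row.gene }}), p = {{ row.pvalue }}</li>
  {% else %}
    <li>No matches</li>
  {% endfor %}
  </ul>
{% endif %}
"""

# Made-up rows standing in for a database query (hypothetical data).
ROWS = [
    {"variant": "var_1", "gene": "GENE_A", "pvalue": 1e-8},
    {"variant": "var_2", "gene": "GENE_A", "pvalue": 1e-5},
    {"variant": "var_3", "gene": "GENE_B", "pvalue": 1e-3},
]


@app.route("/", methods=["GET", "POST"])
def search():
    # GET shows the empty form; POST carries the submitted search term.
    term = request.form.get("term", "").strip() if request.method == "POST" else ""
    rows = [r for r in ROWS if term in (r["gene"], r["variant"])]
    return render_template_string(PAGE, term=term, rows=rows)


if __name__ == "__main__":
    app.run(debug=True)
```

Running the file starts Flask's development server; submitting "GENE_A" in the form would list the two matching made-up variants.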

I also used a few tools which are not directly supported by Flask but added useful functionality. As mentioned previously, I used Bokeh to create interactive plots within the user’s browser. Conveniently, Bokeh can output its plots to an HTML file, so by placing this file within the “templates” subdirectory, I could get Flask to render the plot as an HTML template. Flask doesn’t have much built-in support for client-side functionality, so in order to make the results tables sortable and searchable, I used DataTables, a plug-in for the jQuery JavaScript library.

Cloud deployment

To deploy my app to the UK Biobank RAP, I used DNAnexus’ standard process for building custom apps. First, I packaged my application in an executable form so it could be run with a shell script. To accomplish this, I used Docker, a containerization tool that can bundle together software in packages called “images.” These images can be run using a single command, which creates a “container” where all processes are run. Docker was particularly useful for me because it allowed me to package all of the dependencies I needed directly into the image, avoiding having to download any libraries to the app instance at runtime.

After packaging my application into a Docker image, I used the DNAnexus SDK to build the app onto the platform. This required creating a JSON metadata file that specified the inputs and outputs of the app, runtime specifications, and the ports that the web server would listen on. The inputs declared in the metadata file are passed to a shell script, which runs the Docker image and starts the web server. So that the DNAnexus app instance could serve the web app running inside the Docker container, I mapped the port that Flask listened on inside the container to the external port of the app instance.
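As a rough illustration of what such a metadata file might contain, the sketch below builds one as a Python dictionary and prints it as JSON. The field names follow the dxapp.json format as I understand it, while every value (app name, input names, script path, port) is hypothetical and should be checked against the DNAnexus documentation; this is not the actual file from the project.

```python
import json

# Hypothetical app metadata; all values are placeholders for illustration.
app_metadata = {
    "name": "gwas_results_browser",
    "dxapi": "1.0.0",
    "version": "0.0.1",
    "inputSpec": [
        # Assumption: the database ID is passed as a plain string input.
        {"name": "database_id", "class": "string"},
        {"name": "supplementary_file", "class": "file"},
    ],
    "outputSpec": [],
    "runSpec": {
        "interpreter": "bash",
        "file": "src/code.sh",   # shell script that starts the Docker container
        "distribution": "Ubuntu",
        "release": "20.04",
        "version": "0",
    },
    # Declares the port the job's web server is reachable on.
    "httpsApp": {"ports": [443], "shared_access": "VIEW"},
}

rendered = json.dumps(app_metadata, indent=2)
print(rendered)
```

Under these assumptions, the shell script could start the container with something like `docker run -p 443:5000 <image>`, where 5000 is the default port of Flask's development server and 443 is the port declared in the metadata.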

This research was performed by Vikrant Magadi, a sophomore in Molecular Engineering at the University of Chicago, as part of his internship with DNAnexus. The project was supervised by Chiao-Feng Lin.

Acknowledgements

Special thanks to my mentor, Chiao-Feng Lin, for hours of explanation and encouragement throughout the summer. Thanks to our Research Lab Team Director, Jason Chin, for introducing me to the GWAS web app project and for patiently explaining all the intricacies of the various tools I used. Thanks to the rest of the Research Lab Team: Chai Fungtammasan, Yih-Chii Hwang, Daniel Quang, and Peter Nguyen for all of their scientific, technical, and personal advice.

Research on the ADR project was conducted using the UK Biobank Resource under application number 46926.
