Predicting Heroin Risk (and Use) with Machine Learning

“Medicine is a science of uncertainty and an art of probability” — William Osler

Adam Gulamhusein
Analytics Vidhya
Aug 13, 2020


Image from Google Images

What is Addiction?

Addiction is a disease of the brain characterized by an individual’s continued efforts to obtain something despite harmful consequences to themselves and to other aspects of their life. There are a variety of addictions: food, gambling, sex, and of course, drugs. When someone thinks of addiction, their mind normally goes to drugs.

The overwhelming impact drug addiction has on an individual, their family, and ultimately the larger collective around them cannot be overstated. An estimated $700 billion is spent annually in the USA on addiction and related costs, and more than 100 people die every day from drug overdoses in the United States. The opioid crisis, specifically, has been a prominent issue due to its economic, social, and political impact.

One issue that has continued to fuel this crisis is the overprescription of opioids by physicians, which translates into addiction for many patients.

“Roughly 21 to 29 percent of patients prescribed opioids for chronic pain misuse them. Between 8 and 12 percent develop an opioid use disorder. An estimated 4 to 6 percent who misuse prescription opioids transition to heroin.” — National Institute on Drug Abuse

What are Opioids?

Opioids are substances that bind to opioid receptors in the brain. Normally, endorphins, which are produced by the pituitary gland and act as the body’s natural opioids, attach to these receptors. These substances typically have pain-relieving and/or euphoric effects. Humans have consumed opioids since prehistoric times by extracting opium, whose main active compound is morphine, from the opium poppy. Opioids include drugs like fentanyl, codeine, oxycodone, and heroin.

Diagram from Queensland Brain Institute illustrates a basic synapse in the brain.

When opioids bind to their receptors, they increase potassium conductance. Potassium flowing out of the postsynaptic neuron pushes its membrane potential further from the firing threshold, which lowers the probability of an action potential. Opioids also inhibit calcium conductance, which limits the release of neurotransmitters from the presynaptic neuron.

The effects of these actions include:

  • Activation of opioid receptors in the spinal cord, which inhibits the transmission of pain signals to the postcentral gyrus (the part of the parietal lobe that processes somatosensory information, including pain) and other parts of the brain
  • Influence on the pain modulation systems used to reduce pain: binding to receptors in the periaqueductal gray (brainstem) prompts inhibition of pain signaling in the spinal cord
  • Reducing the emotional impact of pain by acting in brain regions like the anterior cingulate cortex
  • Increased dopamine signaling in different areas of the brain, especially the nucleus accumbens, which has been linked to the reinforcing qualities of these drugs

The dopamine release caused by opiates is thought to result from disinhibition: opioids inhibit GABAergic neurons, which are themselves inhibitory. This limits the amount of GABA released onto dopamine neurons and therefore increases the amount of dopamine released, which heightens feelings of pleasure.

Image from the CDC shows the different “waves” of opioid use and the reasons for the increase in cases during these waves.

The impact of opioids is remarkably widespread and has been growing exponentially.

Graph from the Social Capital Project illustrates the exponential growth of opioid use.

Current treatment for opioid addiction typically involves medication and therapy.

  • Naloxone is usually used to reverse overdoses; the related opioid antagonist naltrexone can be used preventatively to reduce cravings and relapse
  • Methadone and buprenorphine relieve withdrawal symptoms; although they also stimulate opioid receptors, they produce only a limited high. Researchers have found that these therapies can help deter a person from seeking heroin or other abused opioids
  • Psychosocial therapy, including cognitive-behavioral therapy and behavioral approaches focused on positive reinforcement, can also be combined with drug treatments to treat opioid addiction

The constant growth of opioid use across the United States and an overall increase in drug addiction in many different countries requires change. One area where this change needs to occur is the overprescription of opioids.

My Solution

The overprescription of opioids has led to addiction in thousands of patients. Different patients will react to different treatments for a variety of reasons, including genetics and circumstance. However, genetic information about people with addictions appears to be restricted from the public (which sucks, but is unsurprising), and genetic screening of every potential patient may not be practical for many physicians and hospitals. Data about circumstance may also be biased by who gets surveyed: respondents have to be reached somewhere (e.g. a hospital or rehab center), and their circumstances may differ from those of people with an addiction who never end up receiving treatment.

I chose to analyze personality traits (and demographics) instead. This data is more accessible to the public and has the potential to be more applicable to current practices, since gathering it only requires simple patient surveys, which are ultimately less time-consuming and expensive.

Image of Big Five personality traits that have been used to predict education and health.

Features used in this analysis:

  • Age
  • Sex
  • Education
  • Country
  • Ethnicity
  • Neuroticism
  • Agreeableness
  • Openness
  • Extroversion
  • Conscientiousness
  • Impulsiveness measured by BIS-11
  • SS: sensation seeking, measured by ImpSS

These features were fed into a machine learning model to predict use of many different drugs (I chose to focus on heroin). For each drug, there were seven possible usage classes:

  • CL0: Substance has never been used
  • CL1: Used over a decade ago
  • CL2: Used in the last decade
  • CL3: Used in the last year
  • CL4: Used in the last month
  • CL5: Used in the last week
  • CL6: Used in the last day
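Because the classes are ordered by recency of use, they can be mapped to integers 0 to 6 before modelling or plotting. The sketch below assumes the labels are stored as the strings "CL0" to "CL6" shown above; `USAGE_CLASSES` and `encode_label` are hypothetical helper names, not the author’s code.

```python
# The seven usage classes listed above, assumed to be stored as strings "CL0".."CL6".
USAGE_CLASSES = {
    "CL0": "Never used",
    "CL1": "Used over a decade ago",
    "CL2": "Used in the last decade",
    "CL3": "Used in the last year",
    "CL4": "Used in the last month",
    "CL5": "Used in the last week",
    "CL6": "Used in the last day",
}


def encode_label(label: str) -> int:
    """Map a class label such as 'CL3' to its ordinal integer (3)."""
    if label not in USAGE_CLASSES:
        raise ValueError(f"unknown usage class: {label!r}")
    return int(label[2:])


print(encode_label("CL4"), "=", USAGE_CLASSES["CL4"])
```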

The model chosen was Logistic Regression. K-Nearest Neighbours, Decision Tree, and Support Vector Machine models were also tested, but Logistic Regression proved to be the best choice when accuracy (the percentage of correct classifications) was measured on the testing set.
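A comparison like this can be sketched with scikit-learn by fitting each model on the same train/test split and comparing `.score()`, which returns mean accuracy for classifiers. This is a minimal sketch, not the author’s code: `make_classification` generates synthetic stand-in data for the 12 survey features, and the 80/20 split and default hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit each candidate model on one split and return its test-set accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "K-Nearest Neighbours": KNeighborsClassifier(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Support Vector Machine": SVC(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For classifiers, .score is the fraction of correct classifications.
        scores[name] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in for 12 survey features and 3 usage classes (not real data).
X, y = make_classification(n_samples=500, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
scores = compare_models(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.1%}")
```

A single split can favour one model by chance; averaging over several splits (for example with `cross_val_score`) gives a steadier comparison.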

The Big Five personality traits used by the original creators of the database are also a pragmatic choice, since the Big Five is one of the most prominent frameworks for analyzing personality in psychology. There are also many accessible online surveys that would allow patients to get a general idea of where they fall (ideally, more sophisticated methods should be used in a clinical setting). I did this myself, for fun, to predict my own probability of using heroin and other drugs.

What is Logistic Regression?

Logistic regression is a supervised machine learning model, meaning it is trained on labeled data. It is used for classification: it predicts a binary response variable that indicates the presence or absence of some state.

Diagram of basic Logistic Regression model with a sigmoid curve plotted.

Logistic regression uses a sigmoid function, a mathematical function that maps any real value to a value between 0 and 1 and traces an “S” curve. A threshold is normally set at 0.5: if the output falls below the threshold, Class 0 is predicted, and if it is ≥ 0.5, Class 1 is predicted.
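In logistic regression the sigmoid is applied to z = w·x + b, a weighted sum of the features with learned weights w and bias b, so σ(z) = 1 / (1 + e^(-z)). The sketch below shows only the sigmoid and the 0.5 threshold; `sigmoid` and `predict_class` are illustrative names, and the branch for negative z simply avoids overflow for large inputs.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to the interval [0, 1] along an S-shaped curve."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in math.exp(-z).
    e = math.exp(z)
    return e / (1.0 + e)


def predict_class(z: float, threshold: float = 0.5) -> int:
    """Predict Class 1 when the sigmoid output is >= threshold, else Class 0."""
    return 1 if sigmoid(z) >= threshold else 0


print(sigmoid(0.0))         # 0.5, exactly on the threshold
print(predict_class(-2.0))  # 0
print(predict_class(0.0))   # 1
```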

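Logistic regression is a common model for classification problems. One particular benefit of this method is that it predicts the probability of belonging to a specific class: since inputs are mapped onto the sigmoid curve, the probability of being in one class can be read from their position on the curve. With more than two classes, as in this project, the model is generalized (for example with a one-vs-rest or multinomial formulation) so that it outputs a probability for each class.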
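For more than two classes, one common generalization of the sigmoid is the softmax function, which turns one score per class into probabilities that sum to 1 (scikit-learn’s `LogisticRegression` fits this multinomial form for multiclass problems with its default solver). A minimal NumPy sketch, with made-up scores for three classes:

```python
import numpy as np


def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    exps = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exps / exps.sum()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


probs = softmax([2.0, 0.5, -1.0])  # made-up scores for three usage classes
print(probs.round(3))

# With two classes and scores [z, 0], softmax reduces to the sigmoid.
z = 1.3
print(np.isclose(softmax([z, 0.0])[0], sigmoid(z)))  # True
```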

Implementing machine learning from scratch can be tedious, but Python has machine learning libraries that streamline the process, which is what I used.

Breakdown of the Code and Results

Step 1: Importing necessary libraries

  • Matplotlib was used to plot data
  • scikit-learn (sklearn) was used for the machine learning implementation
  • NumPy was used for data manipulation
  • Pandas was used for data extraction
  • Seaborn and Pylab were also used for plotting data
  • Python’s random module was used to randomly select patient data

Normally, StandardScaler would also be imported to standardize the data, but the data was already normalized when downloaded.
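If you are not sure whether a dataset is already standardized, a quick check of each column’s mean and standard deviation can decide whether `StandardScaler` is needed. This is a hypothetical helper, not part of the original project; the 0.1 tolerance is an arbitrary assumption. In a real pipeline the scaler should be fit on the training set only and then applied to the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def standardize_if_needed(X, tol=0.1):
    """Return X unchanged if every column already has mean ~0 and std ~1;
    otherwise fit a StandardScaler and return the transformed array."""
    X = np.asarray(X, dtype=float)
    means, stds = X.mean(axis=0), X.std(axis=0)
    if np.all(np.abs(means) < tol) and np.all(np.abs(stds - 1.0) < tol):
        return X
    return StandardScaler().fit_transform(X)


raw = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
scaled = standardize_if_needed(raw)
print(scaled.mean(axis=0), scaled.std(axis=0))  # approximately [0, 0] and [1, 1]
```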

Libraries imported for this project, including several sklearn modules that make machine learning implementation much more efficient.

Step 2: Loading the data and creating the machine learning model

  • Data was imported from an Excel file
  • Data was explored and features were identified
  • Features and labels were initialized
  • Data was split into training features/labels and testing features/labels
  • The model was initialized with the maximum number of iterations set to 1000
  • The model was fit on the training features and labels
The ML model was initialized and fit with data that was split after being imported from an Excel file.
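The steps above can be sketched as a single function. This is a minimal sketch under assumptions: the column names are hypothetical (taken from the feature list earlier, not from the real spreadsheet), the 80/20 split and `random_state` are guesses, and a random synthetic DataFrame replaces the Excel file so the example runs on its own. Only `max_iter=1000` comes from the text.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical column names based on the feature list; real headers may differ.
FEATURES = ["Age", "Sex", "Education", "Country", "Ethnicity",
            "Neuroticism", "Agreeableness", "Openness", "Extroversion",
            "Conscientiousness", "Impulsiveness", "SS"]


def fit_drug_model(df, target, seed=42):
    """Split into train/test sets, fit a logistic regression on the training
    set, and return the model with its accuracy on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[target], test_size=0.2, random_state=seed)
    model = LogisticRegression(max_iter=1000)  # max iterations set to 1000
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# With real data: df = pd.read_excel("drug_data.xlsx")  # hypothetical filename
# Demo on random synthetic data instead (not real survey responses):
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(300, len(FEATURES))), columns=FEATURES)
demo["Heroin"] = np.where(demo["SS"] + demo["Impulsiveness"] > 1, "CL3", "CL0")
model, acc = fit_drug_model(demo, "Heroin")
print(f"test accuracy: {acc:.1%}")
```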

Step 3: Model scores for different drugs

The rarer the drug, the higher the model’s accuracy. Part of this is class imbalance: when most respondents have never used a drug, predicting “never used” is correct most of the time. Alcohol, nicotine, and cannabis are used far more frequently and by far more people, which makes usage harder to predict from personality.

*Semer (Semeron) is a fake drug that is used to identify over-claimers.

Scores of the ML model for predicting each drug’s usage class.
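One way to check how much of a high score comes from class imbalance is to compare it with a baseline that always predicts the most common class. The sketch below uses scikit-learn’s `DummyClassifier` for that baseline; it is a suggested check, not something the original analysis reports, and the data is synthetic: a “rare drug” that about 90% of people have never used, with labels deliberately unrelated to the features.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def score_with_baseline(X, y, seed=0):
    """Return (logistic regression accuracy, majority-class baseline accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    return model.score(X_test, y_test), baseline.score(X_test, y_test)


# Synthetic "rare drug": ~90% never used, labels independent of the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))
y = np.where(rng.random(1000) < 0.9, "CL0", "CL2")
model_acc, base_acc = score_with_baseline(X, y)
print(f"model: {model_acc:.1%}, always 'never used': {base_acc:.1%}")
```

Here both numbers land near 90% even though the features carry no information, which is why a per-drug score is easiest to interpret next to its baseline.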

Step 4: Testing randomly generated patients

The imported data was used to generate 100 patients by randomly selecting a value for each feature from the values observed in the dataset.
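The generation step can be sketched with Python’s `random` module, drawing each feature independently from its observed column values and then tallying the model’s predictions. `random_patients` and `count_predictions` are hypothetical names, and the small dataset and model in the demo are synthetic stand-ins. Because each feature is drawn independently, correlations between features (for example impulsiveness and sensation seeking) are not preserved, so generated patients may not resemble real respondents.

```python
import random
from collections import Counter

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def random_patients(df, features, n=100, seed=0):
    """Build n synthetic patients by drawing each feature independently
    from the values observed in that column of the dataset."""
    rng = random.Random(seed)
    columns = {f: df[f].tolist() for f in features}
    rows = [{f: rng.choice(columns[f]) for f in features} for _ in range(n)]
    return pd.DataFrame(rows, columns=features)


def count_predictions(model, patients):
    """Tally how many generated patients fall into each predicted class."""
    return Counter(model.predict(patients))


# Demo with a small synthetic dataset and model (not the real survey data).
np_rng = np.random.default_rng(0)
feats = ["Neuroticism", "Openness", "SS"]
data = pd.DataFrame(np_rng.normal(size=(200, 3)), columns=feats)
labels = np.where(data["SS"] > 1.0, "CL2", "CL0")
model = LogisticRegression(max_iter=1000).fit(data, labels)

patients = random_patients(data, feats, n=100)
counts = count_predictions(model, patients)
print(counts)
```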

The heroin model scored ~82% accuracy and predicted that 98 of the 100 patients had never used heroin, 1 had used it over a decade ago, and 1 had used it in the last decade. Underneath is my own data, with the model’s predicted outcome and probabilities for each class.

According to one estimate, approximately 4.98 million people in the US have tried heroin at least once in their lifetime. This is ~1.8% of the population, which is in line with the model predicting that 2 of the 100 randomly generated patients (2%) had used heroin.

The percentages underneath are the model’s predicted probabilities of me belonging to each class: an 86.2% chance that I have never used heroin and a 0.9% chance that I have used it within the past day.
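Per-class probabilities like these come from a fitted model’s `predict_proba`, whose columns follow the order of `model.classes_`. A minimal sketch: `class_probabilities` is a hypothetical helper, and the training data and the “me” row are made-up values, not the author’s survey scores.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def class_probabilities(model, patient):
    """Return {class label: probability} for a single-row DataFrame."""
    probs = model.predict_proba(patient)[0]
    return {label: float(p) for label, p in zip(model.classes_, probs)}


# Synthetic training data using three of the seven usage classes (not real data).
rng = np.random.default_rng(0)
feats = ["Neuroticism", "Openness", "SS"]
train = pd.DataFrame(rng.normal(size=(300, 3)), columns=feats)
labels = np.where(train["SS"] > 1.2, "CL6",
                  np.where(train["SS"] > 0.6, "CL1", "CL0"))
model = LogisticRegression(max_iter=1000).fit(train, labels)

me = pd.DataFrame([{"Neuroticism": 0.3, "Openness": 1.1, "SS": -0.2}])  # made-up scores
probs = class_probabilities(model, me)
for label, p in probs.items():
    print(f"{label}: {p:.1%}")
```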

Step 5: Plotting correlation matrix

Plotting a correlation matrix that shows how strongly each feature correlates with the outcome and with the other features.
The outcome column is the furthest to the right, and the strength of each feature’s correlation with it gives a rough sense of that feature’s importance. A value of 1 represents a perfect positive correlation and -1 a perfect negative correlation.
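A correlation matrix like this can be computed with pandas’ `DataFrame.corr()` (Pearson by default), which requires the outcome to be numeric, for example the class number 0 to 6. Note that it measures pairwise linear association, not the weights the logistic regression actually learned (those are in `model.coef_`). A minimal sketch on synthetic data, where `outcome_correlations` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def outcome_correlations(df, outcome):
    """Return the Pearson correlation matrix and the features ranked by the
    absolute value of their correlation with the outcome column."""
    corr = df.corr()  # pairwise Pearson correlations between all columns
    ranked = corr[outcome].drop(outcome).abs().sort_values(ascending=False)
    return corr, ranked


# Synthetic demo: "SS" drives the outcome, "Agreeableness" is pure noise.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"SS": rng.normal(size=500),
                     "Agreeableness": rng.normal(size=500)})
demo["Heroin"] = 2 * demo["SS"] + rng.normal(scale=0.5, size=500)
corr, ranked = outcome_correlations(demo, "Heroin")
print(ranked)
```

The matrix can then be drawn as a heatmap, e.g. `seaborn.heatmap(corr, annot=True, vmin=-1, vmax=1)`.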

Why does this matter?

My journey into artificial intelligence started a couple of years ago, but I didn’t learn how to apply what I was learning until recently. That, coupled with the fact that I built this between 12 AM and 5 AM, definitely leaves room for improvement. Still, the model achieved between 81% and 89% accuracy for heroin use classification. Not bad.

Models like this could change prescription policies and physicians’ prescribing tendencies. Patients who are more susceptible should be considered more strongly for alternative treatments, though those who are less susceptible should still be considered for them too. If prioritization is needed, perhaps it should go to patients who are more likely to develop an addiction.

Treatment for addiction is often generalized. American Addiction Centers puts treatment dropout rates at 70%-80%. Personalized treatment and prevention offer a chance to slow the growth of the opioid crisis and revolutionize treatment for addiction.


TEDx Speaker | HYRS Alum (Neurosurgical RA) | TKS Student | SHAD Alum | 2021 Calgary Brain Bee Winner