Artificial Intelligence Can Now Use Instagram to Predict Substance Abuse Risk

A recently published paper developed a method for predicting a person’s risk of substance abuse from their Instagram account (link). Let’s take a look at how the authors did it.

Jerry Wei
Health Data Science
4 min read · Jun 25, 2019

Photo by Alessandro Zambon on Unsplash

Substance Abuse. The authors start from the scale of the problem: substance abuse is a worldwide issue that affects people regardless of socioeconomic status. Here’s a quote from the paper:

“Altogether with lifestyle choices and metabolic risk factors, the use of alcohol, tobacco, and drugs are among the top ten causes of preventable deaths in the United States. The misuse of prescription and illicit drugs causes over 100 deaths from overdose alone each day, dethroning motor vehicle accidents as the leading cause of injury deaths.”

The current opioid epidemic in the United States has only made the problem worse, and the authors hoped to use machine learning to help address it.

Instagram Data. The authors chose Instagram because of its popularity among young users and its rapid growth. They note that the images and text users post can give an idea of each user’s personality. To take advantage of this, they recruited thousands of participants through crowdsourcing (with their consent, of course) to share their Instagram data and take part in the study. Specifically, they looked at each user’s posted photos, captions, and comments (the average user had 183.5 posts). The authors used the NIDA-Modified ASSIST, a substance use screening questionnaire, to assign “high-risk” and “low-risk” labels to participants. Let’s look at some extreme examples to get a better sense of what kinds of signals a model would be looking for.

  • If a user posts a picture of themselves drinking alcohol, the machine learning model would probably mark that as high-risk for substance abuse.
  • If a user comments and says “I hate drugs”, the machine learning model would probably mark that as low-risk for substance abuse.
  • If a user posts an image with the caption “Currently abusing substances”, the machine learning model would probably mark that as high-risk for substance abuse. (OK, that’s unrealistic, but you get the point.)

Substance use risk distribution among Instagram users in the dataset. Image credits to the authors of the original paper.

Machine Learning Model. Because images and text are structured differently, the authors used a separate architecture for each. For the images that users posted, they used a ResNet-18 model that was pre-trained on the ImageNet dataset and then fine-tuned on their own Instagram dataset to maximize accuracy. For the text (captions and comments), the authors first applied Word2Vec, a method that maps words to vector representations, and then fed the resulting vectors into a Long Short-Term Memory (LSTM) network. A final layer combines the outputs of the image and text networks into a single prediction.

The architecture that was used: CNNs extract features from images and LSTMs extract features from text. A final layer combines those features into a final prediction. Image credits to the authors of the original paper.
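To make the “final layer” idea concrete, here is a minimal sketch of one common way to combine two feature vectors: concatenate them and pass the result through a single linear layer with a sigmoid. This is an assumption for illustration; the article does not say exactly how the paper fuses the two networks. The function name `fuse_and_predict`, the feature sizes, and the random weights are all hypothetical, and the feature vectors are stand-ins for what the fine-tuned ResNet-18 and the LSTM would produce.

```python
import math
import random


def fuse_and_predict(image_features, text_features, weights, bias):
    """Concatenate two feature vectors, then apply one linear layer + sigmoid."""
    combined = list(image_features) + list(text_features)
    if len(combined) != len(weights):
        raise ValueError("weights must match the combined feature length")
    logit = sum(w * x for w, x in zip(weights, combined)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # probability of the "high-risk" class


# Hypothetical stand-ins: in the paper, these vectors would come from the
# image network (ResNet-18) and the text network (Word2Vec + LSTM).
rng = random.Random(0)
image_features = [rng.uniform(-1, 1) for _ in range(4)]
text_features = [rng.uniform(-1, 1) for _ in range(3)]
weights = [rng.uniform(-1, 1) for _ in range(7)]  # untrained, random weights

p_high_risk = fuse_and_predict(image_features, text_features, weights, bias=0.0)
label = "high-risk" if p_high_risk >= 0.5 else "low-risk"
print(f"P(high-risk) = {p_high_risk:.3f} -> {label}")
```

In a real system the weights would be learned during training rather than drawn at random, so the printed probability here carries no meaning beyond showing the data flow.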

Results. The authors found that their AI could identify alcohol risk from Instagram content significantly better than random chance. They reported an F1 score (the harmonic mean of precision and recall, a standard metric for this type of classification problem) of 72.4%, compared with an F1 score of 50% from guessing. However, the results were less promising when the AI tried to identify risk for tobacco, prescription drugs, and illegal drugs: it did not achieve F1 scores above 40% for those classes.
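For readers who haven’t seen the F1 score before, the sketch below computes it from scratch on a toy set of labels. The labels are made up for illustration and have nothing to do with the paper’s data; `f1_score` here is a small helper written for this example, not a library function.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)  # of those flagged high-risk, how many were?
    recall = tp / (tp + fn)     # of those truly high-risk, how many were found?
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = high-risk, 0 = low-risk), invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

print(f"F1 = {f1_score(y_true, y_pred):.2f}")  # prints "F1 = 0.75"
```

Here the model finds 3 of the 4 high-risk users (recall 0.75) and 3 of its 4 high-risk predictions are correct (precision 0.75), so the F1 score is also 0.75.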

Conclusion. The paper was the first to show that machine learning could be used to identify potential substance abuse behavior using social media. The authors note that their method is completely automatic, an improvement on previous research that required manual analysis to predict substance abuse risk. They attribute the model’s weaker performance on classes other than alcohol to an imbalanced dataset (relatively few participants were high-risk for those classes).
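A quick, made-up example shows why an imbalanced dataset can drag the F1 score down even when a model looks accurate overall. The counts below are hypothetical and are not taken from the paper: imagine 100 users of whom only 5 are high-risk, and a model that flags just 2 users, one of them correctly.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Equivalent form of the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Hypothetical counts: 5 high-risk users out of 100.
# The model catches 1, misses 4, and raises 1 false alarm.
acc, f1 = accuracy_and_f1(tp=1, fp=1, fn=4, tn=94)
print(f"accuracy = {acc:.2f}, F1 = {f1:.2f}")  # prints "accuracy = 0.95, F1 = 0.29"
```

The model is right 95% of the time simply because most users are low-risk, but the F1 score exposes that it misses most of the high-risk users.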

I’ve listed some resources below that may be of interest.
