Analyzing Prescription Rates in Medicaid Drug Utilization Data

Findings from an analysis of a large medical dataset with a focus on the number of prescriptions by state and year.

Dmitriy
More Python
Feb 7, 2019

Key insights:

  • Raw Medicaid data is a mess.
  • Graphs can be deceiving.
  • Hydrocodone is the most prescribed drug in this dataset.

I wanted to gain a better understanding of drug utilization rates in the U.S., so I analyzed prescription rates in a large subset of Medicaid drug data. To begin, I explored and visualized all of the available Medicaid data from 1991 to 2017. I downloaded the Drug Utilization Data for each year, picked out the features I wanted to visualize, and combined everything into one data frame with over 75 million rows and 5 columns. This brings us to the first key insight.
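The combining step can be sketched roughly as follows with pandas. This is a minimal illustration, not the code used for this post: the column names are hypothetical stand-ins for the five features, and two tiny in-memory CSVs take the place of the yearly downloads.

```python
import io

import pandas as pd

# Hypothetical column names; the real yearly files may label these differently.
COLUMNS = [
    "state",
    "year",
    "product_name",
    "number_of_prescriptions",
    "units_reimbursed",
]


def combine_years(sources, columns=COLUMNS):
    """Read one CSV per year, keep only the selected columns, and stack them."""
    frames = [pd.read_csv(src, usecols=columns) for src in sources]
    return pd.concat(frames, ignore_index=True)


# Made-up rows standing in for two yearly files (with one unused extra column).
csv_2016 = io.StringIO(
    "state,year,product_name,number_of_prescriptions,units_reimbursed,extra\n"
    "OH,2016,DRUG A,120,3600,x\n"
    "FL,2016,DRUG B,80,1600,y\n"
)
csv_2017 = io.StringIO(
    "state,year,product_name,number_of_prescriptions,units_reimbursed,extra\n"
    "OH,2017,DRUG A,110,3300,z\n"
)

combined = combine_years([csv_2016, csv_2017])
print(combined.shape)
```

In practice the list passed to `combine_years` would hold file paths, one per year. Selecting columns at read time with `usecols` keeps memory use down, which matters when the result runs to tens of millions of rows.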

Medicaid Data is Dirty

Medicaid data is really messy. Plotting the number of prescriptions by year and state showed that something was clearly wrong in South Dakota in 2007.

Figure 1. Stacked bar chart of the number of prescriptions by state and year, with a major outlier for South Dakota (SD) in 2007.

Diving deeper into the data, I discovered that South Dakota and many other states had all kinds of implausible entries for the number of prescriptions. Repeated entries of 990,000, and numbers well above those of comparable states for the same year, pointed to data entry errors. Since I have a lot of data and the purpose of this project is to get a general intuition for it, I chose a cutoff for dubious prescription counts and dropped every row above it. The resulting plot looked much more accurate.

Figure 2. Stacked bar chart of the number of prescriptions by state and year.
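A cutoff filter of this kind might look like the sketch below. The cutoff value, the column name, and the sample rows are all assumptions made for illustration; the actual threshold used for Figure 2 is not shown here.

```python
import pandas as pd


def drop_dubious_counts(df, cutoff, column="number_of_prescriptions"):
    """Keep only rows whose prescription count is at or below the cutoff.

    Rows with a missing count are dropped as well, because a comparison
    against NaN evaluates to False.
    """
    return df[df[column] <= cutoff].reset_index(drop=True)


# Made-up rows, including one of the suspicious 990,000 entries.
records = pd.DataFrame(
    {
        "state": ["SD", "SD", "OH", "FL"],
        "year": [2007, 2007, 2007, 2007],
        "number_of_prescriptions": [990_000, 450, 1_200, 980],
    }
)

CUTOFF = 100_000  # illustrative value only, not the cutoff used in this post
cleaned = drop_dubious_counts(records, CUTOFF)
print(cleaned)
```

A hard cutoff is blunt: it can also remove legitimately large counts. For getting a general feel for the data, that trade-off is usually acceptable.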

The most populous states had the highest numbers of prescriptions, except Florida. Florida's numbers were comparable to those of less populous states like Ohio and Illinois. I immediately thought this was interesting: Florida has nearly double the population of Ohio, yet Ohio has more prescriptions. This brings us to our next key insight.

Graphs Can be Deceiving

Fig. 2 seems to show a low number of prescriptions in Florida relative to other highly populated states. To investigate whether this is true, I normalized the number of prescriptions by the census population of each state for each year. We now get a completely different and more accurate picture of prescription rates.

Figure 3. Stacked bar chart of the number of prescriptions normalized to the population of each state and year.
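The normalization amounts to dividing each state's yearly total by its population for that year. Here is a minimal sketch of that arithmetic, assuming a census table with hypothetical `state`, `year`, and `population` columns; all numbers are made up.

```python
import pandas as pd


def prescriptions_per_capita(rx, population):
    """Sum prescriptions per state and year, then divide by population."""
    totals = rx.groupby(["state", "year"], as_index=False)[
        "number_of_prescriptions"
    ].sum()
    # A left join keeps every state/year total; a missing population
    # yields NaN rather than silently dropping the row.
    merged = totals.merge(population, on=["state", "year"], how="left")
    merged["rx_per_capita"] = (
        merged["number_of_prescriptions"] / merged["population"]
    )
    return merged


# Made-up prescription counts and populations for two states.
rx = pd.DataFrame(
    {
        "state": ["FL", "FL", "OH"],
        "year": [2016, 2016, 2016],
        "number_of_prescriptions": [600, 400, 900],
    }
)
population = pd.DataFrame(
    {
        "state": ["FL", "OH"],
        "year": [2016, 2016],
        "population": [2000, 1000],
    }
)

per_capita = prescriptions_per_capita(rx, population)
print(per_capita[["state", "year", "rx_per_capita"]])
```

In this toy data the larger state has the higher raw total (1,000 versus 900) but the lower per-capita figure, which is the kind of reversal that raw counts can hide.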

The figure above also reveals more inconsistencies in the data: bars without a continuous spectrum of color are missing data for certain years. Nevertheless, Fig. 3 gives a good intuition for general rates of prescriptions per capita, and perhaps even for population health in general. The latter is plausible because Colorado and Utah consistently rank among the “healthiest states”, while states at the opposite end of the graph consistently rank among the “unhealthiest states” in the nation.

Visualizing the Top 100 Most Prescribed Drugs

The most compelling insight from this data, and the one that let me answer my initial question (what drugs are people on?), came from an interactive heatmap I built in Python with Bokeh.

Figure 4. Interactive heatmap of the most prescribed drugs in the Nation by state and year.

The bright horizontal lines in the heatmap show the most prescribed drugs by state and year, while the bright vertical lines show the states with the most prescriptions. Out of curiosity, I also made a table of just the top 10.

Figure 5. Top 10 most prescribed drugs. The left column is drug name, right is the number of prescriptions.
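A ranking like this can be produced by summing prescriptions per drug and keeping the largest totals. The sketch below assumes hypothetical `product_name` and `number_of_prescriptions` columns and uses placeholder drug names with made-up counts.

```python
import pandas as pd


def top_drugs(df, n=10):
    """Return the n drugs with the largest total number of prescriptions."""
    return (
        df.groupby("product_name")["number_of_prescriptions"]
        .sum()
        .nlargest(n)
    )


# Placeholder names and made-up counts, spread across states and years.
sample = pd.DataFrame(
    {
        "product_name": ["DRUG A", "DRUG B", "DRUG A", "DRUG C", "DRUG B"],
        "state": ["OH", "OH", "FL", "FL", "CA"],
        "year": [2015, 2015, 2016, 2016, 2017],
        "number_of_prescriptions": [100, 300, 150, 50, 200],
    }
)

ranking = top_drugs(sample, n=2)
print(ranking)
```

One caveat for real data: the same drug can appear under several spellings or brand names, so totals may need name cleanup before the ranking is meaningful.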

The interactive heatmap is a great tool for asking many other questions about this dataset, just by hovering over the brightly lit squares. For example, we can ask: what’s up with that bright orange square in the top left? Why were there so many Promethazine prescriptions in California in 2015? Overall, it’s an outstanding tool, and I highly recommend using it in your own projects and investigative research.

Code
