EDA on Medicare spending in the United States using SQL and Power BI

Shruti Patkar
Learning SQL
Published in
5 min readDec 15, 2022

Hello Everyone,

In this article, we will do EDA on a dataset related to healthcare.

Before beginning with the topic let’s quickly understand the meaning of EDA. So, EDA stands for Exploratory Data Analysis which is an approach of analyzing datasets to summarize their main characteristics often using statistical graphics and data visualization methods.

I have selected MySQL for data analysis and Power BI for data visualization in this project.

To begin with the data analysis let us have a better understanding of the topic first.

Introduction

Photo on HealthITAnalytics

The healthcare dataset which I have selected for EDA is on ‘Medicare spending per beneficiary/patient’ in the United States. It is basically a quality measure, calculated and published by CMS (Centers for Medicare & Medicaid Services), which is a federal agency within the United States Department of Health and Human Services. More information and the dataset are publicly available on the official website of CMS. The link to the dataset is given below:

https://data.cms.gov/provider-data/dataset/rs6n-9qwg

Now let’s understand the meaning of the MSPB- Medicare Spending Per Beneficiary:

• It is a measure that shows whether Medicare spends more, less, or about the same for an episode of patient care at a specific hospital compared to all hospitals nationally

• It is the ratio of the amount Medicare spent per patient for an episode of patient care at the hospital divided by the median amount Medicare spent across all hospitals nationally

What does the ratio indicate?

· MSPB ratio less than 1: Hospital spends less than the average amount across all inpatient hospitals nationally

· MSPB ratio equal to 1: Hospital spends about the same as the average amount across all inpatient hospitals nationally

· MSPB ratio greater than 1: Hospital spends greater than the average amount across all inpatient hospitals nationally

Advantages of MSPB measure:

· To evaluate the hospital’s efficiency relative to the efficiency of the national median hospital

· To reward hospitals that can provide efficient care at a lower cost

· To understand and evaluate the states based on the Medicare spending

· To encourage hospitals to provide Medicare services at a lower cost.

Data Analysis:

Step 1: Understand the dataset

To view the dataset in MySQL Workbench, we will run the following query

Visual from the dataset. Image by author

We can see here, that there are total 12 columns in the dataset as follows:

Facility ID, Hospital, Address, City, State, ZIP Code, County, Phone Number, Score, Footnote, Start Date, and End Date.

As we are familiar with the dataset and the column names, we will now proceed further to the next step.

Step 2: Look for blank values

We will run the following query to check if there are any blank values in the dataset

Image by author

In the above query, I have selected the Hospital column to check for blank values, to check null values in another column we just need to mention that column’s name in the WHERE clause.

After running the above query, we can see that there are no blank values in the dataset. So finally, we can proceed with our EDA

Step 3: Exploratory Data Analysis

As we have seen in the Introduction, the dataset talks about the hospitals in the United States with respect to their MSPB ratio. In step 1 we understood that there are few hospitals whose MSPB score is not available, so to avoid any discrepancies and errors in our analysis we will only select those hospitals whose score is given in the dataset.

Now we will ask questions to our dataset for exploring the different insights.

Q1: How many hospitals and states are covered?

No of hospitals & states. Image by author

Q2: How many cities and counties are covered?

No of cities & county. Image by author

Q3: How many hospitals from each state?

No of hospitals from each state. Image by author

Q4: What is the highest MSPB ratio score?

Highest MSPB ratio score. Image by author

Q5: List of top 10 hospitals having the highest MSPB ratio score.

Hospitals having highest MSPB ratio score. Image by author

Q6: What is the lowest MSPB ratio score?

Lowest MSPB ratio score. Image by author

Q7: List of bottom 10 hospitals having the lowest MSPB ratio score

Hospitals having lowest MSPB ratio score. Image by author

Q8: How many hospitals are having MSPB ratio score of ‘1’?

No of hospitals- MSPB ratio: 1. Image by author

Q9: How many hospitals are having MSPB ratio score ‘greater than 1’?

No of hospitals- MSPB ratio ‘greater than 1’. Image by author

Q10: How many hospitals are having MSPB ratio score ‘smaller than 1’?

No of hospitals- MSPB ratio ‘smaller than 1’. Image by author

These are the common questions that serve the aim of our EDA, apart from these one can also ask other questions and frame SQL queries accordingly. Before ending the article, I would like to share the screenshot of the dashboard which I have created in Power BI.

Power BI Dashboard. Image by author

The complete project is available on my GitHub profile, link is given below:

https://github.com/shrutipatkar25/EDA-using-SQL-and-Power-BI

Hope you enjoyed the entire journey of EDA!

As this was my first article any suggestions and recommendations are always welcome:)

Thank You.

Go from SELECT * to interview-worthy project. Get our free 5-page guide.

--

--

Shruti Patkar
Learning SQL

Data Enthusiast in Healthcare | SQL | Excel | Power BI | R