SQL for Healthcare: Querying a Hospital Patients’ Database

Maria-Goretti Anike
5 min readNov 18, 2023

--

Image from Dribbble

Hey there everyone. It’s Maria, your favourite data explorer 😁.

So, sometime ago, I got to watch Data Wizardry’s YouTube video on SQL Basics with Healthcare Data. As a data analyst interested in healthcare, this inspired me to also query a hospital database to acquire key insights on the patients. As such, this article aims to answer questions concerning these patients with details provided in the database, while making use of beginner-friendly queries. As always, let’s get to it.

To start with, we’ll display all the tables that will be used for this task. The datasets used for this exercise were gotten from Synthea.

We have 9 tables — Patients, Encounters, Conditions, Immunizations, Procedures, Observations, Allergies, Careplans & Medications.

Now to start answering the questions.

  1. Write a query that does the following:
  • Lists out number of patients in descending order
  • Does not include Quincy
  • Must have at least 20 patients from that city

2. To find out the Top 10 patients with the most visits to the hospital. For this, I made use of the OFFSET and FETCH NEXT statements. If you’d like to learn how to retrieve these high values without using the LIMIT statement, check out this article.

3. To find out the number of times patients from each city visited the hospital.

4. To find out the cities with up to 50 counts of emergency cases.

As we can see, Boston is topping all these charts 😅.

5. Now to find out the number of patients from Boston who came in 2020.

That should be all for the cities. Let’s dive into the other tables.

6. To find out the top recurring conditions.

From a list of 129 different conditions, Viral sinusitis is the most recurring.

7. To find out the month with the highest number of ambulatory cases since 2010.

Since 2010, April has been the month with the most number of ambulatory cases, while February has the least.

8. Likewise for emergency cases since 2010.

Since 2010, December has been the month with the most number of emergency cases, while June has the least.

So far, we’ve made use of Joins. Time to make use of some Unions.

9. To check the number of diabetic patients by race, we’ll make use of unions.

10. Now to find out the Top 15 years with the most immunizations.

As we can see, 2018 is topping with an immunization count of 1492.

11. From this general count, we’ll like to know the number of patients that were administered the IPV (Inactivated Polio Vaccine) in 2020.

The number of IPV clients for 2020 is 11.

12. To find out the number of times each medical procedure was carried out in the hospital.

The most carried out procedure is the Medication Reconciliation.

As we know, most times in Healthcare Analytics, we make use of codes for conditions, e.g. in our dataset, ‘8867–4’ is the code for ‘Heart Rate’, ‘8480–6’ for ‘Systolic Blood Pressure’, and ‘18262–6’ for ‘Low Density Lipoprotein Cholesterol’. So, we’ll like to find out the Top 15 conditions and their respective codes.

13. Top 15 conditions and their codes.

14. What if we want to find out the 20th–29th condition?

Also, as we see from our Allergies and Careplans tables, allergies and careplans also have codes assigned to them.

Now to get the counts of our remaining tables — Allergies, Careplans, and Medications.

With these tables, one can query a patient’s allergy, careplan and medication.

I also queried the highest, second and third highest patient healthcare expenses. For more details on this, read this article.

That’ll be all for today. I know we made use of some really easy queries, so if you’d like me to make use of some more Intermediate or Advanced statements, e.g. ROW_NUMBER, CASE WHEN, IF, etc., do let me know in the comments section, so I’ll write another article on it. For those who don’t follow me on LinkedIn or Twitter, I have a 4-week long Data Analyst Interview Questions series, where I answer some important Excel, SQL, Power BI and Python Interview questions. If this is something you’d be interested in, also let me know in the comments section, so I’ll bring them over to Medium. This is also a reminder for you to query query always query never stop querying on SQL. You can practise on Stratascratch, DataLemur, HackerRank, and other fun places.

Thank you so much 🤗. You could go through my code for this project on my Github page here. You could also connect with me on Twitter and LinkedIn, where I write all about Healthcare, Marketing, and Product Analytics.

For now, bye, and have a lovely SQL weekend! 🤗

Bye from the Emergency Ward 😄

--

--

Maria-Goretti Anike

Hey yo there 😄! I'm Maria, your favourite Content Writer, Data Explorer & ardent SQL devotee. I write all about B2B, B2C, SaaS and Marketing/Product Analytics.