Are YOU gonna attend your medical appointment? If not, who is to blame for this?

Data were taken from:


Data affects every aspect of our lives. How we use it in the medical sector might create the most powerful impact on human beings. Here is a simple example of how we can use data to increase attendance at medical appointments.

Let’s imagine that you had a medical appointment but, for some reason, you didn’t show up. Who is to blame for this? A Kaggle dataset records the details of about 110K medical appointments in Brazil from 2015 to 2016. Besides the appointment details, the dataset includes preexisting conditions such as hypertension or diabetes, patients’ demographic information like age and gender, the neighbourhood, and whether each patient has a scholarship.

The results of the study might answer many questions, such as the following:

  • Which factors do we need to know to predict whether a patient will show up for their scheduled appointment?
  • Is age range associated with attendance rate?
  • Which gender is more likely to show up for their appointments?
  • Are hypertension, diabetes, alcoholism, handicap, or receiving SMS related to the likelihood of patients showing up?
  • Is it possible to predict whether a patient will show up for their scheduled appointment?

However, let’s convert this list into 3 comprehensive questions which might help us to get an insight into how we can improve the scheduling process.

Part 1: How does having hypertension, diabetes, alcoholism, handicap, or receiving SMS affect the likelihood of patients showing up?

I want to run bivariate and multivariate analyses to explore the relationship between the features and the target, which is whether a patient attends their appointment. Figure 1 shows the age distribution. The mean age of patients who didn’t attend their appointment is 34, whereas the mean age of patients who attended is 37, so attendees are on average 3 years older. This suggests that older patients may be less likely to miss an appointment. However, we need further analysis to be sure of this.

Figure 1
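The comparison of mean ages can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the project's actual code: the records are hypothetical toy values (chosen so the group means mirror the 34 and 37 quoted above), and the `no_show` encoding (1 = missed the appointment) is an assumption.

```python
from statistics import mean

# Hypothetical toy records: (age, no_show), where no_show = 1 means
# the patient missed the appointment. Not taken from the real dataset.
records = [(25, 1), (40, 0), (31, 1), (52, 0), (46, 1), (19, 0), (37, 0)]

def mean_age_by_attendance(rows):
    """Return {no_show_flag: mean age} for every group present in rows."""
    groups = {}
    for age, no_show in rows:
        groups.setdefault(no_show, []).append(age)
    return {flag: mean(ages) for flag, ages in groups.items()}

age_means = mean_age_by_attendance(records)
age_gap = age_means[0] - age_means[1]  # attended minus missed
print(age_means, age_gap)
```

A gap in group means like this is only descriptive; it says nothing yet about whether age drives attendance.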

Let’s investigate how other factors, such as alcoholism, affect the likelihood of showing up. To understand this, I calculated the no-show ratio for each subset of each feature; most features are flag variables that indicate whether the condition is present. A higher value means that the subset is more likely to miss their appointment. Interestingly, patients who received an SMS are the least likely to attend their appointment. Patients in the ‘ITARARÉ’ neighbourhood are the second least likely to attend.

Figure 2
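A per-subset no-show ratio can be computed along these lines. Treat it as a sketch under stated assumptions: the rows are made-up toy data, the column names (`sms_received`, `alcoholism`, `no_show`) are illustrative rather than the dataset's exact headers, and `no_show_ratio` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical toy rows; column names are assumptions for illustration.
rows = [
    {"sms_received": 1, "alcoholism": 0, "no_show": 1},
    {"sms_received": 1, "alcoholism": 0, "no_show": 0},
    {"sms_received": 0, "alcoholism": 1, "no_show": 0},
    {"sms_received": 0, "alcoholism": 0, "no_show": 0},
    {"sms_received": 1, "alcoholism": 1, "no_show": 1},
    {"sms_received": 0, "alcoholism": 0, "no_show": 1},
]

def no_show_ratio(rows, feature, target="no_show"):
    """Share of missed appointments for each value of `feature`."""
    missed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[feature]] += 1
        missed[row[feature]] += row[target]
    return {value: missed[value] / total[value] for value in total}

ratios = {f: no_show_ratio(rows, f) for f in ("sms_received", "alcoholism")}
for feature, by_value in ratios.items():
    print(feature, by_value)
```

In this toy sample the `sms_received = 1` subset has the higher ratio, which is the pattern the article describes; the numbers themselves carry no meaning.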

Let’s examine the table in Figure 3 to understand the relationship between the no-show ratio and the factors described above more clearly. From the table, we can conclude that patients who are male or who have a handicap, hypertension, diabetes, or alcoholism are more likely to show up for their appointment. This is demonstrated by a positive value in the ‘diff’ column.

However, having a scholarship and receiving an SMS seem to indicate a lower probability of attending the appointment. This finding is very surprising because we would expect an SMS to increase the likelihood of showing up.

Receiving an SMS appears to be an important factor in whether or not patients attend their appointment, as this feature has the highest ‘abs_diff’ score. The ‘ITARARÉ’ neighbourhood is the second most important factor. Let’s use different analytical approaches, such as correlation and single-variable logistic regression, to investigate the linear relationship between these factors and attendance.

Figure 3
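A table with ‘diff’ and ‘abs_diff’ columns could be built roughly as follows. The exact definition of ‘diff’ is not spelled out in the article, so this sketch assumes it is the no-show ratio without the flag minus the no-show ratio with the flag, which makes a positive value mean "more likely to show up". The input ratios are invented round numbers, and `build_diff_table` is a hypothetical helper.

```python
# Hypothetical inputs: (feature, no-show ratio without flag, no-show ratio with flag).
# The numbers are made up for illustration only.
ratio_rows = [
    ("hypertension", 0.20, 0.16),
    ("scholarship", 0.20, 0.25),
    ("sms_received", 0.15, 0.30),
]

def build_diff_table(ratio_rows):
    """Add diff and abs_diff, then rank features by abs_diff (largest first).

    Assumption: diff = ratio_without - ratio_with, so diff > 0 means patients
    with the flag miss fewer appointments.
    """
    table = []
    for feature, ratio_without, ratio_with in ratio_rows:
        diff = round(ratio_without - ratio_with, 2)
        table.append({"feature": feature, "diff": diff, "abs_diff": abs(diff)})
    return sorted(table, key=lambda row: row["abs_diff"], reverse=True)

diff_table = build_diff_table(ratio_rows)
for row in diff_table:
    print(row)
```

Sorting by `abs_diff` ranks features by how far they move the ratio in either direction, while the sign of `diff` keeps the direction visible.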

Part 2: Which factor is the most important for us to predict if a patient will show up for their scheduled appointment?

In the Pearson correlation results, positive values indicate a lower probability of attending, whereas negative values indicate a higher one. Age has the most negative correlation coefficient, which means that older patients are more likely to show up for their appointment. I believe this is a great result because people are more at risk of getting sick as they get older, so it is good to know they are not missing their appointments. Hypertension is the second strongest indicator of attendance.

Let’s look at the positive correlations. The “no_show” variable has a correlation of 1 with itself, so we can ignore it. The next one is SMS_received, which we already saw in Part 1. Surprisingly, patients who received an SMS about their appointment were more likely to miss it. We would expect SMS reminders to encourage people to show up. I believe that some of these messages might have been missed-appointment notices rather than reminders. In addition, patients who held scholarships were also more likely to miss their appointment. Again, this was an unexpected result.
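To make the sign convention concrete, here is a small hand-rolled Pearson correlation against the target. It is a sketch on hypothetical toy columns, with `no_show = 1` assumed to mean a missed appointment; the real analysis presumably used a library routine on the full dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical toy columns; no_show = 1 means the appointment was missed.
columns = {
    "age": [25, 40, 31, 52, 46, 19, 37, 60],
    "sms_received": [1, 0, 1, 0, 1, 0, 0, 1],
    "no_show": [1, 0, 1, 0, 1, 0, 0, 0],
}

target = columns["no_show"]
correlations = {name: pearson(values, target) for name, values in columns.items()}

# Most negative first: features at the top go with attending,
# features at the bottom go with missing the appointment.
ranked = sorted(correlations.items(), key=lambda item: item[1])
for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```

The target itself always ends up last with a coefficient of 1, which is why it can be dropped from the ranking.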

To be sure about the results, I also fitted single-variable logistic regressions, one per factor, and compared their coefficients. The results corresponded with the Pearson correlation.
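A single-variable logistic regression can be illustrated without any libraries. The following sketch fits one model per flag feature with plain gradient descent; the data are hypothetical, `fit_logistic_1d` is an illustrative helper, and the learning rate and step count are arbitrary choices that happen to converge on this toy sample. A positive slope means the flag raises the odds of a no-show, matching the sign convention of the correlations.

```python
from math import exp

def fit_logistic_1d(xs, ys, lr=0.5, steps=5000):
    """Fit p(y=1) = sigmoid(b0 + b1 * x) by gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            grad0 += p - y
            grad1 += (p - y) * x
        b0 -= lr * grad0 / n
        b1 -= lr * grad1 / n
    return b0, b1

# Hypothetical toy data; no_show = 1 means the appointment was missed.
no_show = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
features = {
    "sms_received": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "hypertension": [0, 0, 1, 0, 1, 0, 1, 1, 1, 0],
}

# One model per feature; keep only the slope for comparison.
slopes = {name: fit_logistic_1d(xs, no_show)[1] for name, xs in features.items()}
for name, slope in slopes.items():
    print(f"{name}: {slope:+.3f}")
```

For a binary feature the fitted slope is the log odds ratio between the two groups, so its sign should agree with the sign of the corresponding Pearson coefficient.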

Bonus: Is it possible to predict whether a patient will show up for their scheduled appointment?

The most valuable part of this data project is the predictions it allows us to make. I won’t go into the details here, but if you want to dig deeper, please visit the project’s GitHub page. I trained a preliminary XGBoost model, an algorithm that is popular both in industry and in Kaggle competitions. The model reaches 0.72 test accuracy, which I believe can be improved with additional feature extraction and parameter tuning.

According to the final model, having hypertension, having a scholarship, and living in ITARARÉ are the most important features.


In this article, we have looked at how factors affect patient attendance according to the Kaggle no show appointment dataset.

  • It seems that receiving an SMS is associated with people missing their appointments. I believe we need to examine this outcome further because it does not seem to make sense. If the result turns out to be genuine, it would imply that we need to stop sending SMS reminders out to patients. :)
  • Having a condition like hypertension or diabetes makes a patient more likely to attend their appointment. This is a reassuring result, as these patients are vulnerable and require regular check-ups.
  • As expected, an increase in age correlates with an increase in the likelihood of a patient attending their appointment. Surprisingly, however, having a scholarship makes a patient less likely to attend. Is education killing responsibility?

In conclusion, several of these results are surprising and seem to indicate that receiving an SMS can have an adverse effect on medical appointment attendance, at least among certain groups. More research will need to be completed to ascertain whether or not receiving an SMS is the driving factor behind attendance within these cases.

To see more about this analysis and experiment, see the link to my Github available here.




Esra Arı
