Einstein Prediction Builder: Predict Medical Appointment No-Shows

Published in Salesforce Einstein Platform, Dec 21, 2019

by Ed Sandoval, Senior Einstein Product Manager, Salesforce

With 15 million appointments missed every year, the United Kingdom National Health Service (NHS) is urging patients to cancel rather than just not show up. The NHS estimates that the cost of these missed medical appointments amounts to £216M a year (the equivalent of the annual salary of 2,325 full-time General Practitioners).

A similar picture emerges in the United States, where “the cost of missed appointments is estimated at $150B per year”, according to Health Care Innovation.

Fortunately, Einstein Prediction Builder (EPB) is well suited to generating predictions for this type of scenario. Provided that sufficient historical medical appointment records are stored in Salesforce, EPB allows a system administrator to apply AI to the problem.

Want to get a sense of how easy it is? Read on. Spoiler alert: all the AI & ML scenes happen in the background (out of sight), so we can focus on the problem of predicting medical appointment No Shows!

The rest of this post shows how to set up Einstein Prediction Builder to predict the likelihood of an appointment being a “No Show”.

As you will see, the “hardest” part of the whole process is simply to load the data into a custom object in Salesforce!

Use the “Import Data” Trailhead module as a guide to your options for bringing data into Salesforce. In a matter of minutes, you could be loading data from a CSV file into a custom object.

Once you do that, it is just “Point, Click, Predict”.

The Medical Appointment “No Show” dataset

For this exercise, I will use a slightly modified version of the “Kaggle Medical Appointment No Show” dataset (*). This dataset contains details about 110,338 medical appointments for 62,191 patients in Espírito Santo, Brazil. Each record stores basic information about the appointment such as:

The patient: ID, pre-existing conditions (Diabetes, Hypertension), Neighborhood, Number of Previous No Shows (*), Age Bracket (*)
The appointment: Scheduled Date, Appointment Date, Within 24 Hours? (*)
The outcome: Was the appointment a “No Show”?

Features marked (*) were added to the original dataset as they made intuitive sense as potential predictors. In the context of medical appointments, it could be reasonable to assume that:

Individuals with a high number of previous No-Shows are more likely to No Show in future appointments
Appointments booked “Within 24 hours” are less likely to be No Shows
Individuals in different age groups exhibit different No-Show behaviors. For example, under 5s, teenagers, working age people and retired populations will all have different No-Show patterns

As you will see in Section 2 below, Einstein Prediction Builder actually confirmed these assumptions! These 3 features were amongst the top 5 predictors in the final model!
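The post does not show how the three features marked (*) were derived. The following is only a minimal Python sketch of one way to compute them before loading the data, assuming each appointment is a dict with the hypothetical keys `patient_id`, `age`, `scheduled`, `appointment` and `no_show`. The age cut-offs are illustrative and are not the ones used for this dataset.

```python
from datetime import datetime, timedelta


def age_bracket(age):
    """Map an age in years to a coarse bracket (cut-offs are illustrative)."""
    if age < 5:
        return "Under 5"
    if age < 13:
        return "Child"
    if age < 20:
        return "Teenager"
    if age < 65:
        return "Working age"
    return "Retired"


def add_derived_features(appointments):
    """Return copies of the appointments with the three derived fields added."""
    no_shows_so_far = {}  # patient_id -> number of earlier no-shows
    enriched = []
    # Process in chronological order so "previous" only looks backwards.
    for appt in sorted(appointments, key=lambda a: a["appointment"]):
        row = dict(appt)
        # Count only earlier appointments, so the outcome of the current
        # appointment never leaks into its own feature.
        row["previous_no_shows"] = no_shows_so_far.get(appt["patient_id"], 0)
        row["within_24_hours"] = (
            appt["appointment"] - appt["scheduled"] <= timedelta(hours=24)
        )
        row["age_bracket"] = age_bracket(appt["age"])
        if appt["no_show"]:
            no_shows_so_far[appt["patient_id"]] = row["previous_no_shows"] + 1
        enriched.append(row)
    return enriched


# Made-up sample records, for illustration only.
sample = [
    {"patient_id": "P1", "age": 34, "scheduled": datetime(2016, 4, 29, 9, 0),
     "appointment": datetime(2016, 5, 2, 10, 0), "no_show": True},
    {"patient_id": "P1", "age": 34, "scheduled": datetime(2016, 5, 10, 8, 0),
     "appointment": datetime(2016, 5, 10, 15, 0), "no_show": False},
    {"patient_id": "P2", "age": 3, "scheduled": datetime(2016, 5, 9, 8, 0),
     "appointment": datetime(2016, 5, 12, 9, 0), "no_show": False},
]
features = add_derived_features(sample)
```

Counting only earlier appointments matters here: if the current outcome were included in "Previous No Shows", the field would become one of the "data leakers" discussed later in this post.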

But back to the dataset… The image below shows a small sample of appointment records loaded into a “Medical Appointments” custom object in Salesforce.

1. Create the Prediction

With the data loaded in Salesforce, the next step is to launch Prediction Builder. From Setup, type “Prediction Builder”.

Click on “New Prediction”.

You will be guided through a series of screens to capture the information needed to build the prediction.

Name your prediction

First, give your prediction a name…

For this example, I called the prediction “MyNoShowPrediction”.

Click Next.

Select an object to predict

Second, choose the Medical Appointment custom object.

Notice the “Check Data” button, which gives you a sense of the number of appointments stored in the Medical Appointment object.

Select the field to predict and the training set

Next, you need to specify the “field to predict”. In this case, the field “No Show” is a boolean field that stores whether an appointment was a “No Show”.

Since this dataset contains medical appointments that took place between April 2016 and June 2016, let’s assume that the model will learn from appointments up to May 25, 2016, and that predictions will be generated for appointments taking place from May 26, 2016 onwards.

You will notice that the “Check Data” section shows that:

There will be 75,143 records in the training set, 5,749 of them “No Shows”
Predictions will be generated for 35,185 records
These are appointments from May 26th 2016 onwards
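Prediction Builder applies this split through its point-and-click filters, so no code is needed. Purely to illustrate the logic, here is a small Python sketch of a date-based split. It assumes the cut-off day of May 25, 2016 is included in the training set (the text leaves this ambiguous) and uses hypothetical field names and made-up records.

```python
from datetime import date

# Last appointment day used for training (assumed to be inclusive).
CUTOFF = date(2016, 5, 25)


def split_by_appointment_day(records, cutoff=CUTOFF):
    """Split records into a training set (known outcome) and a scoring set."""
    training = [r for r in records if r["appointment_day"] <= cutoff]
    scoring = [r for r in records if r["appointment_day"] > cutoff]
    return training, scoring


# Made-up records; "no_show" is None where the outcome is not yet known.
records = [
    {"id": "A1", "appointment_day": date(2016, 5, 20), "no_show": False},
    {"id": "A2", "appointment_day": date(2016, 5, 25), "no_show": True},
    {"id": "A3", "appointment_day": date(2016, 5, 26), "no_show": None},
    {"id": "A4", "appointment_day": date(2016, 6, 3), "no_show": None},
]
training, scoring = split_by_appointment_day(records)
print(len(training), "training records,", len(scoring), "records to score")
```

Splitting on the appointment date, rather than at random, mirrors how the prediction will be used: the model learns from the past and scores appointments that have not happened yet.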

Select the fields Einstein should base your prediction on

Einstein now asks which fields it should base your prediction on.

Most fields available in this dataset can be used to create your prediction.

For example, the fields included to create this prediction are:

Appointment Day, Day of the Week, Scheduled Day
Patient Age Bracket, Neighborhood, Previous No Shows
Patient existing medical conditions (Diabetes, Hypertension)

However, there could be several reasons you may want to exclude a field. Some of these include:

Fields potentially introducing bias.
Fields that could give the answer away (potential “data leakers”)
System fields that are known to have no impact on the prediction (e.g. OwnerId as all records in this dataset are owned by the same user)

You can always check this post for great practical tips on fields to use in your predictions.

Name the field that stores your results

Finally, you need to specify a field to store your prediction score.

When the “Field to Predict” is a boolean (as in our case), the prediction will be a number representing the likelihood of an appointment being a “No Show”.

Click Next and “Build Prediction”.

Now, it is time for Einstein to automate the ML/AI modeling part of the process. Over the next 24 hours, the medical appointment data will be automatically retrieved, analyzed and transformed into useful predictive features. These, in turn, will be fed into a number of machine learning algorithms to find the model that best fits your data.

2. Review & enable the prediction

The next step is for you to decide if the prediction can be enabled. The prediction scorecard surfaces key information you will need to make a decision:

The Prediction Quality. A number close to 50 means that the model is no better than “tossing a coin”. In this case, a score of 71 is rated Good. For the ML practitioners and data scientists reading this, the score is the Area Under the ROC Curve (AUROC), a common metric used to compare the predictive performance of different models.
The top predictors offer some insights into the data that the model considers when generating a score. In this case, the top predictors were:

* Appointment Scheduled Within24hrs
* Patient Previous “No Shows”
* Patient Age Bracket

A full list of the predictors, along with their correlation, can be obtained in the “Details” tab.
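For readers who want some intuition for the Prediction Quality number, AUROC can be read as the probability that a randomly chosen “No Show” receives a higher score than a randomly chosen attended appointment. The Python sketch below computes it by comparing every such pair. It is only an illustration on made-up scores, not how Prediction Builder calculates the metric internally, and it assumes the scorecard value is the AUROC multiplied by 100.

```python
def auroc(scores, labels):
    """Fraction of (no-show, show) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores (0-100) and outcomes (True = the patient did not show up).
scores = [90, 80, 70, 60, 50, 40]
no_show = [True, False, True, False, False, True]
quality = round(100 * auroc(scores, no_show))
print("Prediction quality on this toy sample:", quality)
```

A model that assigns every appointment the same score gets 0.5 on this measure, which is the “tossing a coin” baseline, while a model that ranks every “No Show” above every attended appointment gets 1.0.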

Once you are satisfied that the prediction is acceptable, you can just enable it. This initiates the scoring process: the 35,185 medical appointment records will be updated with a likelihood of “No Show”. The process then runs automatically every hour, adjusting the scores to reflect any changes made to the appointment records in the last hour.

For a more in-depth description of the prediction scorecard and other useful tips before enabling your prediction, I encourage you to read:

3. View your predictions

Once the scoring process has finished, you can add the score to a List View or Report in Salesforce.

In the figure above, you can see that the records with the highest scores were mostly “No Shows”… But wait, how do I know if the predictions are good? How can I turn the predicted score into a “No Show” prediction? Moreover, how can I evaluate the predictions against what actually happened?

You can find answers to these questions and get other great insights from this blog post.
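As a rough illustration of the idea, one common approach is to pick a score threshold, treat every appointment at or above it as a predicted “No Show”, and then compare the predictions with the actual outcomes. The Python sketch below does this on made-up numbers. The threshold of 60 is an arbitrary assumption, not a value recommended by Prediction Builder.

```python
def evaluate(scores, actual_no_shows, threshold):
    """Compare thresholded scores with actual outcomes."""
    tp = fp = fn = tn = 0
    for score, actual in zip(scores, actual_no_shows):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1  # predicted No Show, patient did not show up
        elif predicted and not actual:
            fp += 1  # predicted No Show, patient attended
        elif actual:
            fn += 1  # missed No Show
        else:
            tn += 1  # correctly predicted attendance
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Made-up scores and outcomes (True = the patient did not show up).
scores = [85, 72, 64, 40, 35, 10]
actual = [True, True, False, True, False, False]
result = evaluate(scores, actual, threshold=60)
print(result)
```

Raising the threshold usually flags fewer appointments but with higher precision, while lowering it catches more “No Shows” at the cost of more false alarms. The right balance depends on how expensive each reminder or intervention is.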

4. So what’s next?

Assuming the scores are good indicators for “No Shows”, you’ve got a way to identify potential “No Shows”!

The question is “What do you want to do about it?” How do you use this information to prevent future “No Shows”? Some “No Show” prevention measures include:

Increasing awareness of the problem:
* Create targeted “No Show” Awareness campaigns.
* Use actual “No Show” rates by Age Range and Geo (Neighborhood/County/State) to identify hot spots
* Use the predicted “No Shows” to highlight the potential impact to the community
Proactively targeting those likely to “No Show” at their next appointment by:
* Sending Email Appointment Reminders 1–5 days prior to the appointment day
* Sending SMS Appointment Reminders 24 hours before the appointment day

Make sure you highlight the impact of missed medical appointments in your communications.
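The reminder rules above could be automated in many ways inside Salesforce. As a language-neutral illustration only, here is a small Python sketch of the decision logic. The function name, the threshold of 60 and the approximation of “24 hours before” as “one day before” are all assumptions made for this example.

```python
from datetime import date


def reminders_due(appointment_day, score, today, threshold=60):
    """Return the reminder channels due today for one appointment."""
    if score < threshold:
        return []  # not considered at risk of a No Show
    days_until = (appointment_day - today).days
    due = []
    if 1 <= days_until <= 5:
        due.append("email")  # email reminders 1-5 days before
    if days_until == 1:
        due.append("sms")  # SMS reminder the day before
    return due


appointment = date(2016, 6, 3)
print(reminders_due(appointment, score=80, today=date(2016, 6, 1)))
print(reminders_due(appointment, score=80, today=date(2016, 6, 2)))
print(reminders_due(appointment, score=20, today=date(2016, 6, 2)))
```

Only appointments whose score reaches the threshold receive reminders in this sketch, which keeps the outreach focused on the patients most likely to miss their appointment.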