DATA SCIENCE PROJECT: PREDICTING THE NUMBER OF SURGERIES

Cem ÖZÇELİK
8 min read · Jun 11, 2022

--

This article was written by Alparslan Mesri and Cem ÖZÇELİK.

Photograph via Pexels, by Vidal Balielo Jr.

It is very important for a health institution that its patients leave satisfied. Patient satisfaction depends on several factors, such as the quality of the service received, the attitude of the staff, and the facilities and equipment the institution offers its visitors. Beyond providing these, a health institution can also take planning measures that let it serve more patients while still giving each of them the best possible service. For example, correctly planning appointments, the time spent with each patient, and the precautions and preparations for the post-operative care of patients who need surgery will all increase patient satisfaction.

In our study, we focused on preventing the planning-related disruptions that arise in post-operative processes, in order to ensure the satisfaction of patients who need an operation in a hospital. In this context, we asked: “How many days before the operation should a patient be given the date of surgery?” With an answer to this question, patients can plan their own lives, and the management of the healthcare institution can plan so that the patient gets the best service in the post-operative period.

In this article, we study the surgeries performed in a health institution using a sample data set, and try to answer the question: how many days in advance should the surgery date be given so that the health institution can provide the best service to these patients?

The data set we used in the study will be shared at the end of the study. Without further ado, let’s get to our work. Let’s start by importing our libraries first.

Next, we import the dataset.

Overview of the dataset

Take a look at the descriptive statistics of the data set.

Descriptive Statistics

As can be seen, our dataset includes the Actual column (the number of operations performed on the date of the operation) and the number of operations recorded on each of the 28 days before that date, in columns T-1 through T-28. The values in the data set increase cumulatively.
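Since the data set itself is not reproduced here, the following is a minimal sketch that builds a synthetic table with the shape described above. The column names T-28 through T-1 and Actual follow the text; the Date column, the Poisson booking rate, the number of rows, and the dates are assumptions made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's data: one row per surgery day, with the
# count as seen 28, 27, ..., 1 days ahead and the final "Actual" count.
rng = np.random.default_rng(0)
n_days = 200
lead_cols = [f"T-{k}" for k in range(28, 0, -1)]  # T-28 ... T-1

# New bookings per day are non-negative, so their running total never decreases.
new_bookings = rng.poisson(lam=4, size=(n_days, 29))
cumulative = new_bookings.cumsum(axis=1)

df = pd.DataFrame(cumulative, columns=lead_cols + ["Actual"])
df.insert(0, "Date", pd.bdate_range("2021-01-04", periods=n_days))  # assumed dates

# Overview and descriptive statistics for a few representative columns.
print(df.head())
print(df[["T-28", "T-7", "T-3", "T-1", "Actual"]].describe())

# Check the cumulative property: each column is >= the column before it.
steps = df[lead_cols + ["Actual"]].diff(axis=1).iloc[:, 1:]
is_cumulative = bool((steps >= 0).all().all())
print("Counts are cumulative:", is_cumulative)
```

With a real file, the synthetic block would be replaced by a single read call for whatever format the data is stored in.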

Our next step is to look at the auto-correlation values between the days. Since our problem can be treated as a time series problem, we use auto-correlation to measure the connection between the value at one time point and the value at earlier time points. In brief, the auto-correlation between today and the day before is typically close to 1.00, and the correlation gradually decreases as we move further back in time.
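One way to inspect this relationship is to correlate every T-x column with Actual. The sketch below does this on synthetic cumulative counts, which are an assumption and not the article's data; only the column names come from the text.

```python
import numpy as np
import pandas as pd

# Synthetic cumulative counts (assumed data, same layout as described in the text).
rng = np.random.default_rng(1)
lead_cols = [f"T-{k}" for k in range(28, 0, -1)]
counts = rng.poisson(lam=4, size=(300, 29)).cumsum(axis=1)
df = pd.DataFrame(counts, columns=lead_cols + ["Actual"])

# Pearson correlation of each earlier snapshot with the final count.
corr_with_actual = df[lead_cols].corrwith(df["Actual"])

# Snapshots closest to the surgery date should appear at the top.
print(corr_with_actual.sort_values(ascending=False).head())
best_snapshot = corr_with_actual.idxmax()
print("Snapshot most correlated with Actual:", best_snapshot)
```

The snapshot with the highest correlation is the natural candidate for the aggregations and models that follow.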

Now let’s aggregate by day of the week to answer the question: is there a connection between the day of the week and the number of surgeries performed in the health institution? Here, we aggregate on the T-1 time point, which has the highest auto-correlation with the Actual value.

Mean & Std. Dev by DOW

The first 5 rows in this image show the mean values and the next 5 rows show the standard deviations.
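An aggregation like this can be sketched with a pandas groupby. The Date and DOW column names and the synthetic T-1 values are assumptions for illustration; the output layout here (one row per weekday, two columns) also differs from the stacked layout described above.

```python
import numpy as np
import pandas as pd

# Synthetic T-1 counts on business days only (assumed data).
rng = np.random.default_rng(2)
dates = pd.bdate_range("2021-01-04", periods=100)  # Monday to Friday
df = pd.DataFrame({"Date": dates, "T-1": rng.poisson(lam=100, size=len(dates))})

# Derive the day of the week (DOW) from the date.
df["DOW"] = df["Date"].dt.day_name()

# Mean and standard deviation of T-1 per weekday, in calendar order.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
dow_stats = df.groupby("DOW")["T-1"].agg(["mean", "std"]).reindex(weekday_order)
print(dow_stats)
```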

Now that we have seen the statistical values of our dataset, let’s get more descriptive information about our dataset and the problem we are dealing with by doing data visualization.

Distribution by DOW

As can be seen, Thursday has the most surgeries of any day of the week, and Friday has the fewest. Let’s also examine the differences in the number of surgeries between days using a box plot.

The output of the above piece of code is shown in the graph below.

Looking at the number of surgeries performed on each day of the week, we can conclude that the averages differ. Since Friday and Monday are the last and first days of the working week, a plausible interpretation is that the noise level is more prominent on these two days.

After visualizing the variables of the data set, let’s test whether the group means of the numerical variables differ. We will use the One-Way ANOVA (analysis of variance) test for this.

As the output of this code, we get the following image.

Statistical Information by Timestamp

Now that we have examined the mean and standard deviation of the T-x variables in a table, we can perform our ANOVA test. Before doing so, let’s set up our null and alternative hypotheses, H0 and H1:

H0 (Null Hypothesis): The total number of surgeries does not differ according to the day of the week.
H1 (Alternative Hypothesis): The total number of surgeries varies according to the day of the week.

The result table of the ANOVA test we performed is as follows. Below the table, you can see the results of the ANOVA test.

Results of One-Way ANOVA Test
  • By looking at the results of the ANOVA test, we reject the H0 hypothesis, because the p-value from the test is below 0.05. Rejection at the 99% confidence level additionally requires the p-value to be below 0.01.
  • Looking at the test result, we can say that there is a statistically significant difference between the days of the week.
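The decision rule above can be sketched with scipy.stats.f_oneway. The weekday means, the spread, and the sample sizes below are invented for illustration (Thursday highest and Friday lowest only to echo the chart), so the printed numbers say nothing about the real data.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(3)
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
# Hypothetical weekday means, not taken from the article's data.
true_means = {"Monday": 100, "Tuesday": 110, "Wednesday": 112,
              "Thursday": 125, "Friday": 85}

rows = [
    {"DOW": day, "T-1": value}
    for day in weekday_order
    for value in rng.normal(loc=true_means[day], scale=10, size=40)
]
df = pd.DataFrame(rows)

# One sample per weekday, passed to the one-way ANOVA.
groups = [df.loc[df["DOW"] == day, "T-1"].to_numpy() for day in weekday_order]
f_stat, p_value = stats.f_oneway(*groups)

alpha = 0.05  # significance level for a 95% confidence decision
reject_h0 = bool(p_value < alpha)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}, reject H0: {reject_h0}")
```

Setting alpha to 0.01 instead would give the stricter 99% confidence decision.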

Next, we use the Tukey test to check, for every pair of weekdays, whether the number of operations performed on one day differs significantly from the number performed on the other.

Result Of TUKEY Test

The Tukey test results show us that there is a difference between the number of operations performed on any weekday and the number of operations performed on another weekday.
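A pairwise comparison of this kind can be sketched with scipy.stats.tukey_hsd (available in SciPy 1.8 and later). This is one possible tool, and the text does not say which library produced the table above. The samples are synthetic, with made-up weekday means.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
true_means = [100, 110, 112, 125, 85]  # hypothetical, for illustration only

# One synthetic sample of daily surgery counts per weekday.
samples = [rng.normal(loc=m, scale=10, size=40) for m in true_means]

# Tukey's HSD compares every pair of groups while controlling the
# family-wise error rate; result.pvalue is a 5 x 5 matrix.
result = stats.tukey_hsd(*samples)

for i in range(len(weekday_order)):
    for j in range(i + 1, len(weekday_order)):
        p = result.pvalue[i, j]
        verdict = "different" if p < 0.05 else "not clearly different"
        print(f"{weekday_order[i]:<9} vs {weekday_order[j]:<9} p = {p:.4f} ({verdict})")
```

Reading the table pair by pair matters: an overall ANOVA rejection does not by itself imply that every single pair of days differs.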

Now we will build a linear regression model to answer the question: “How many days before the surgery can we give the most appropriate surgery appointment to a patient who comes to the hospital for surgery?”

First, we clarify the parameters of our model that we will build on our dataset:

  • The dependent variable to be used in the Linear Regression model is the “Actual” variable. Our independent variables are the columns T-1 through T-28.

We already know the business problem we’re dealing with here: we get the result we want by finding the time point that has the highest correlation with the Actual variable. We can now move on to building our model.

One point to pay attention to when setting up the model is that we need to remove the outliers from our data set. (We saw the outlier values in the box plot.)
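The text does not state which outlier rule was applied. As an assumption, the sketch below uses the 1.5 × IQR rule that box plot whiskers are conventionally based on, applied to synthetic Actual values with two injected outliers.

```python
import numpy as np
import pandas as pd

# Synthetic "Actual" counts with two obviously extreme values injected (assumed data).
rng = np.random.default_rng(5)
df = pd.DataFrame({"Actual": rng.normal(loc=110, scale=10, size=200).round()})
df.loc[[10, 50], "Actual"] = [300, 5]

# Box plot whisker rule: keep values within 1.5 * IQR of the quartiles.
q1, q3 = df["Actual"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

clean = df[df["Actual"].between(lower, upper)]
print(f"Kept {len(clean)} of {len(df)} rows; bounds = [{lower:.1f}, {upper:.1f}]")
```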

Then let’s start building our model.

The Table Of OLS

By looking at the OLS table, we can derive the equation for our regression model:

Actual = 1.0908 × (T-1) + 0.1453 × (T-2) - 0.1908 × (T-3) + 0*

*NOTE: Since the model does not contain a constant, the constant term is 0 and has no standard error.
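A regression without a constant can be sketched with NumPy's least-squares solver; this is a swapped-in tool, not necessarily the one that produced the OLS table. The data is synthetic, so the fitted coefficients will not match the ones in the equation above.

```python
import numpy as np
import pandas as pd

# Synthetic cumulative counts (assumed data, same column layout as in the text).
rng = np.random.default_rng(6)
lead_cols = [f"T-{k}" for k in range(28, 0, -1)]
counts = rng.poisson(lam=4, size=(300, 29)).cumsum(axis=1)
df = pd.DataFrame(counts, columns=lead_cols + ["Actual"])

features = ["T-1", "T-2", "T-3"]
X = df[features].to_numpy(dtype=float)  # no column of ones, so no intercept
y = df["Actual"].to_numpy(dtype=float)

# Ordinary least squares: minimise ||X @ coef - y||^2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

for name, value in zip(features, coef):
    print(f"{name}: {value:+.4f}")
rmse = float(np.sqrt(np.mean((y - fitted) ** 2)))
print(f"In-sample RMSE: {rmse:.3f}")
```

Leaving out the column of ones is what forces the fitted line through the origin; adding it back would estimate a constant as well.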

We fit the model:

The output image of the fitted model is as follows.

Result of Fitted Regression Model

We calculate performance metrics such as RMS and RMSE, along with other regression metrics, for the model we fit.
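As a small worked example of such metrics, the helper below computes MSE, RMSE, and MAE from actual and predicted values. The function name regression_metrics and the four sample values are hypothetical, and which exact metrics the comparison used is not stated in the text.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = float(np.mean(errors ** 2))
    return {
        "MSE": mse,                              # mean squared error
        "RMSE": mse ** 0.5,                      # root of the MSE, in surgery units
        "MAE": float(np.mean(np.abs(errors))),   # mean absolute error
    }

# Errors are 2, -5, -3, 0, so MSE = 38 / 4 = 9.5 and MAE = 10 / 4 = 2.5.
metrics = regression_metrics([100, 120, 90, 110], [98, 125, 93, 110])
print(metrics)
```

RMSE penalises large misses more heavily than MAE, which is useful when a badly underestimated day is costlier than several small misses.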

We set up our model for the T-1, T-2, and T-3 time points and obtained its performance metrics. Now we set up another model to measure how well the number of surgeries known 3 days before the surgery date agrees with the actual number.

OLS Table for Predicting Model for 3 days before the Surgery Date

We built a model that predicts the number of surgeries from the data available 3 days before the surgery date and obtained its performance metrics. Next, we build another model to measure how well the number known 7 days before the surgery date agrees with the actual number, so that we can find out which appointment lead time is the most appropriate for the health institution.

We set up our model for 7 days before the operation date and obtained its performance metrics. Now we compare the models we have built in order to make a clear decision on the most appropriate appointment date for the surgery.

Comparison of the models

Considering this table, we can say that:

  • The health institution examined in the data set can estimate the expected number of surgeries 3 days before the surgery day with the least estimation error.
  • Technically, the result we obtained is the day, among the last 3 days, that has the highest correlation with the day of the surgery. The surgery appointment can therefore be given by looking at the situation 3 days before the surgery.
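The mechanics of such a comparison can be sketched as follows. As a simplifying assumption, each model here uses a single snapshot (T-1, T-3, or T-7) without a constant and is scored on a held-out part of synthetic data; the article's models may use different feature sets. On this synthetic data the error simply grows with the lead time, so the sketch shows how to build the comparison rather than reproducing the table above.

```python
import numpy as np
import pandas as pd

# Synthetic cumulative counts (assumed data).
rng = np.random.default_rng(7)
lead_cols = [f"T-{k}" for k in range(28, 0, -1)]
counts = rng.poisson(lam=4, size=(300, 29)).cumsum(axis=1)
df = pd.DataFrame(counts, columns=lead_cols + ["Actual"]).astype(float)

# Simple chronological split: first 200 rows to fit, last 100 rows to score.
train, test = df.iloc[:200], df.iloc[200:]

def holdout_rmse(lead):
    """Fit Actual ~ T-lead without an intercept on train, return RMSE on test."""
    col = f"T-{lead}"
    x_train = train[col].to_numpy()
    y_train = train["Actual"].to_numpy()
    slope = (x_train @ y_train) / (x_train @ x_train)  # closed-form least squares
    errors = test["Actual"].to_numpy() - slope * test[col].to_numpy()
    return float(np.sqrt(np.mean(errors ** 2)))

rmse_by_lead = {lead: holdout_rmse(lead) for lead in (1, 3, 7)}
comparison = pd.Series(rmse_by_lead, name="RMSE").rename_axis("days_before")
print(comparison)
```

In practice the choice of lead time balances this error against how much notice patients and staff need, which is why a lead time with a slightly higher error can still be the preferred one.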

We have our result. Finally, we look at how closely the model we built follows the pattern in the time series graph.

Overview of the prediction model

We have come to the end of our work. I hope it was an enjoyable reading session for you. See you in our next article.
