Reducing Healthcare Costs With Data Analytics Part 2

Ben Rogojan
7 min read · Dec 25, 2018


It starts with good data and ends with action

In 2018, the US is estimated to have spent 3.5 trillion dollars on healthcare, a cost that is growing at nearly 5% a year. Rapidly growing pharmaceutical and administrative costs are just some of the factors driving this growth.

This leaves the question: how are we going to reduce healthcare costs? We can’t ignore the problem. Otherwise, we will be spending far more than $15 for a Tylenol in 20 years.

The first part starts with recognizing some of the largest problems in healthcare. We pointed these out in an infographic here (fraud, waste, and administrative costs). There is nearly a trillion dollars of spending that could be improved if the correct processes and policies are put into place. However, simply knowing what the problems are is not enough. We need to know where and why they are occurring. This brings us to the first step in starting a healthcare analytics project.

Develop Accessible and Accurate Data Systems

In order to find out where the problems are, you need data that accurately represents what is occurring in your hospital or healthcare system. Relying purely on anecdotal evidence is no longer sufficient. It is a fine place to start, and it can guide data scientists and engineers toward developing the correct data warehouses and dashboards. All of this requires quality data that is accessible to the correct users.

Developing a data system (ETLs, data warehouses, etc.) that is both accurate and accessible to the right people is a necessary step in confirming what the biggest cost-saving opportunities are.

This means thinking about what data you have, what data you need, who needs to access it, and what questions you are hoping to answer. Not all of these can be answered right away, so it would be unwise to start by developing data warehouses that are heavily adulterated from their original formats. That is why most data teams develop systems with the same basic pattern (example below).

Typically you start with raw flat files that get ingested into a raw database. That data is run through a QA process to double-check that it is clean and makes sense. Once the data is QAed, it is processed into a set of staging tables with minimal business logic added. Typically, this is more of a mapping phase: anything that needs to be standardized is standardized, duplicated data is cleaned up, and data normalization may occur. The goal here is to limit the amount of business logic and focus on creating an accurate picture of what the data represents, whether that is procedures performed on patients, supplies used by a hospital, or computers and devices used by employees. This data is an abstract depiction of what is occurring at the healthcare provider.

Once the data is loaded successfully and QAed again (yes, QA should happen any time data has an opportunity to be adulterated), it can be loaded into the base tables that everyone who should have access to them can reach.
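As a rough illustration, here is a minimal sketch of that raw → QA → staging → base flow in Python with pandas. The connection string, file name, schemas, table names, and columns (claim_id, service_date, billed_amount, procedure_code) are hypothetical placeholders, not a prescribed design.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string, file, and table names -- adjust to your environment.
engine = create_engine("postgresql://user:password@localhost/healthcare_dw")
RAW_FILE = "claims_extract.csv"

# 1. Ingest the raw flat file into a raw table, unchanged.
raw = pd.read_csv(RAW_FILE, dtype=str)
raw.to_sql("claims", engine, schema="raw", if_exists="replace", index=False)

# 2. QA: basic sanity checks before any business logic is applied.
assert raw["claim_id"].notna().all(), "claims are missing an ID"
print(f"{raw.duplicated(subset=['claim_id']).sum()} duplicate claim rows found")

# 3. Staging: standardize, de-duplicate, and map -- minimal business logic.
staging = (
    raw.drop_duplicates(subset=["claim_id"])
       .assign(
           service_date=lambda df: pd.to_datetime(df["service_date"], errors="coerce"),
           billed_amount=lambda df: pd.to_numeric(df["billed_amount"], errors="coerce"),
           procedure_code=lambda df: df["procedure_code"].str.strip().str.upper(),
       )
)
staging.to_sql("claims", engine, schema="staging", if_exists="replace", index=False)

# 4. QA again, then load the base table that analysts and data scientists query.
assert staging["service_date"].notna().mean() > 0.99, "too many unparseable service dates"
staging.to_sql("claims", engine, schema="base", if_exists="replace", index=False)
```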

This base data layer gives your data scientists the ability to create analytical layers that sit on top of it and populate various aggregate and metric tables.
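For example, an analytical layer can be as simple as a job that reads from the base tables and writes aggregate metric tables. This is only a sketch under the same hypothetical schema as above (a base.claims table with provider_id, procedure_code, and billed_amount columns).

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/healthcare_dw")

# Read from the base layer (hypothetical base.claims table).
claims = pd.read_sql(
    "SELECT provider_id, procedure_code, billed_amount FROM base.claims", engine
)

# Aggregate into a metric table: claim volume and average billed amount
# per provider and procedure, ready for dashboards and reports.
metrics = (
    claims.groupby(["provider_id", "procedure_code"], as_index=False)
          .agg(claim_count=("billed_amount", "size"),
               avg_billed=("billed_amount", "mean"))
)
metrics.to_sql(
    "billed_by_provider_procedure", engine, schema="analytics",
    if_exists="replace", index=False
)
```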

This basic outline is tried and true. More than likely, your data warehouse team is already supporting a system similar to the one we described. Typically the problem lies in accessibility and silos. The data might be produced for various teams, like finance or IT, but it might all exist in separate systems with different keys and structures. This makes it difficult to extract value.

The basic pipeline will take data from a raw format, usually CSV, XML, TSV, positional, or one of a number of other formats, and process it into raw database tables. From there, mapping and normalization may occur while the data is being loaded into a staging database. Finally, after all the data has been QAed, it will be loaded into the base tables, where data scientists and analysts can pull it for their own use.

At this point, there needs to be a much larger initiative to provide the correct accessibility as well as clear connections between the data systems. This is why using different keys for the same data across systems is a bad idea (see the sketch below). Going into this would require a whole other set of blog posts and discussions, so we will leave it at the basic data system.
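To make the keys point concrete, here is a toy illustration of why mismatched keys hurt: if finance and IT identify the same providers differently, someone has to build and maintain a crosswalk before the two silos can be joined at all. The tables and IDs below are made up.

```python
import pandas as pd

# Hypothetical: finance and IT track the same providers under different keys.
finance = pd.DataFrame({"fin_provider_id": ["F001", "F002"],
                        "total_spend": [125000, 98000]})
it_assets = pd.DataFrame({"it_provider_id": ["IT-17", "IT-42"],
                          "device_count": [240, 185]})

# A crosswalk table has to exist (and be maintained) just to bridge the keys.
crosswalk = pd.DataFrame({"fin_provider_id": ["F001", "F002"],
                          "it_provider_id": ["IT-17", "IT-42"]})

# Only after the extra join can the silos be combined for analysis.
combined = (finance.merge(crosswalk, on="fin_provider_id")
                   .merge(it_assets, on="it_provider_id"))
print(combined)
```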

Define Problems Clearly

As data engineers and data scientists we often learn about the subject matter we are working on through osmosis. We have to understand what the data represents to a point. However, it is difficult for us to know all the nuances of a specific subject and to understand all the context around the problems we are trying to solve.

That is why it is important to try to clearly state the questions you are trying to answer as well as provide as much context as possible. It gives data professionals a better picture of the problem they are trying to solve. The better they understand the problem, the clearer the solution becomes.
Otherwise, they could spend hours going in circles or answering the wrong questions. This is actually pretty common, because what a stakeholder says is sometimes understood completely differently by a data professional, especially if the data professional doesn’t know the why behind the request. They then have to come up with their own why to drive the analysis, which means they could provide the wrong support or answers at the wrong granularity.

Create Concise Metrics

With all the modern buzzwords floating around, it can be tempting to create algorithms and metrics that require so much adulteration of the original data that they lose the value it could offer.

Metrics that are concisely stated are also more easily understood by everyone. This allows managers to make better decisions because they understand what the metrics abstractly represent, rather than struggling to work out why some random ratio or calculated value means they should target customer x or y.

This starts with a well-defined population. Whether that population is procedures, people, or transactions, it represents a specific group of entities. It is not always wise to look at an entire population first: larger populations are more likely to have confounding factors that are harder to eliminate with simpler metrics. Developing metrics focused on specific populations to start also makes it easier to notice patterns in larger groups.
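As a small illustration of a population-first metric, here is a sketch with made-up data: define the population (a single procedure code) before computing the metric (average billed amount per facility).

```python
import pandas as pd

# Made-up claims data -- in practice this would come from the base layer.
claims = pd.DataFrame({
    "procedure_code": ["47562", "47562", "47562", "99213", "99213"],
    "facility": ["A", "A", "B", "A", "B"],
    "billed_amount": [12800, 13950, 21400, 160, 210],
})

# Define the population first: a single procedure code.
population = claims[claims["procedure_code"] == "47562"]

# A concise, easily explained metric: average billed amount per facility
# for that specific population.
metric = population.groupby("facility", as_index=False)["billed_amount"].mean()
print(metric)
```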

Review The Outcome

Analyzing the outcomes and trends in the metrics that are developed can help drive new insights and policies. This requires that the outcomes are actually studied, not just when the results are initially released but on a regular cadence. Typically, the first few times the metrics are examined there can be immediate benefits as policies are changed, entities are contacted (as in the case of fraud), and, hopefully, cost savings are found.

After the initial release of the dashboards, there needs to be a plan for how often the results will be reviewed. Reviewing too often will cause unnecessary actions to be taken before the previous ones have impacted the outcomes, and reviewing too rarely (or not at all) will waste both the dashboard and the time spent developing it.

Make a plan with the necessary team members (usually a director or manager, a few subject matter experts for context, and one of the data team members). Having this mix provides the right amount of perspective while informing the director and the data team member of anything the subject matter team needs. The data team might also need to update the metrics based on the findings.

Present The Results Simply

Data, programming, and algorithms can all get very complicated. As data scientists and engineers we focus on the problem for so long we can begin to think everyone understands the problem as well as we do. Yet, this is not usually the case.

That is why we need to take a step back from our work and attempt to view our research, models, and analytics from the perspective of a teammate who hasn’t stared at the same problem for the past three months. It can be tempting to put all the graphs and logic that were analyzed over the past three months into a presentation or report. But this is likely to confuse a stakeholder who hasn’t been involved in your healthcare analytics project.

Usually, all that is required is a few key numbers to drive decisions. When you present too many numbers, you risk burying the lede. It is important to focus on the key facts and figures first and, if support is needed, to provide it in a follow-up. This can be hard to do because it feels like we as engineers didn’t do a lot of work when we only show such a small slice of it. However, the key here is impact, not showing off.

Quality data and analytics help target and track savings opportunities in healthcare. When you start with accurate data and then develop concise metrics, your team has the chance to make informed decisions on policies that can positively impact patients’ lives, and at the end of the day that should be the goal.

We hope this helps you in your healthcare analytics project. Please feel free to reach out to our team if we can go in depth on any point! Our consulting team would be happy to help.
