Healthcare Service Solutions Analysis
Problem statement
The objective of this analysis is to explore and provide insights into various aspects of the Healthcare Service Solutions industry.
This analysis aims to answer the following questions through exploration:
- The top 20 revenue-generating material codes
- The top 5 revenue-generating material groups
- Revenue ranking by each customer type
- Time series for 2023 or 2022
In addition, this analysis aims to examine the following:
- Year-to-date (YTD) performance by team (business unit)
- The performance of the same period last year and this year
- Percentage growth
- Revenue peaks
Tools
- Python
- Tableau
Python Libraries
- plotly
- sqlite3
- pandas
Part 1: Preparation
I imported the Python libraries that will be used in this project.
import plotly.express as px
import sqlite3
import pandas as pd
I connected to the database, retrieved the data into pandas DataFrames, and then disconnected from the database.
db_path = 'C:/stuff/nh python/data_for_python.sqlite'
connection = sqlite3.connect(db_path)
Lab_Dataset_MASTER_PRICE = pd.read_sql(
"SELECT * FROM Lab_Dataset_MASTER_PRICE", connection)
Lab_Dataset_Transaction = pd.read_sql(
"SELECT * FROM Lab_Dataset_Transaction", connection)
# no data was written, so simply close the connection
connection.close()
Get the list of the columns in the data.
list(Lab_Dataset_MASTER_PRICE.columns)
list(Lab_Dataset_Transaction.columns)
Using the info() function provides summary information about each DataFrame.
Lab_Dataset_MASTER_PRICE.info()
Lab_Dataset_Transaction.info()
I discovered that two column names in the “Lab_Dataset_Transaction” table have naming issues: `"Customer Code` contains a stray leading quotation mark and `_Material Code` a stray leading underscore. Therefore, I renamed them to the appropriate names.
Lab_Dataset_Transaction.rename(columns={
'_Material Code': 'Material Code', '"Customer Code': 'Customer Code'}, inplace=True)
list(Lab_Dataset_Transaction.columns)
Before filtering the data, I decided to closely examine the Team column of the “Lab_Dataset_Transaction” table.
Lab_Dataset_Transaction["Team"].unique()
As mentioned previously, there are “#N/A” values appearing in the Team column of the “Lab_Dataset_Transaction” table.
# identify all '#N/A' rows and count how many there are
na_transac = Lab_Dataset_Transaction["Team"] == "#N/A"
Lab_Dataset_Transaction[na_transac]
len(Lab_Dataset_Transaction[na_transac])
Then I removed all rows with “#N/A” in the Team column and rechecked the unique values.
Lab_Dataset_Transaction = Lab_Dataset_Transaction[Lab_Dataset_Transaction['Team'] != "#N/A"]
Lab_Dataset_Transaction["Team"].unique()
I also noticed that some customer codes are stored in scientific notation, such as “2.900000326E9”. To keep the codes consistent, I converted them to plain notation, such as “2900000326”.
import re  # needed for the regular-expression check below

def scientific_format_to_normal(value):
    # convert strings such as "2.900000326E9" into plain notation such as "2900000326"
    if isinstance(value, str) and re.match(r"^[0-9]+\.[0-9]+[Ee][+-]?[0-9]+$", value):
        return "{:.0f}".format(float(value))
    return value

Lab_Dataset_Transaction["Customer Code"] = Lab_Dataset_Transaction["Customer Code"].apply(
    scientific_format_to_normal)
Next, I used the pd.merge()
function to perform a left join between the `Lab_Dataset_Transaction` table and the `Lab_Dataset_MASTER_PRICE` table, based on the Material Code.
master_transac = pd.merge(left=Lab_Dataset_Transaction,
right=Lab_Dataset_MASTER_PRICE, how='left', on="Material Code")
master_transac
list(master_transac.columns)
After joining the `Lab_Dataset_Transaction` table and the `Lab_Dataset_MASTER_PRICE` table, I obtained a table that contains both price and quantity. Therefore, I decided to create a Revenue column by multiplying Price by Qty. However, before performing the calculation, I needed to convert the Price and Qty columns to a numeric data type so they can be used in the calculation.
# check the data type
print(master_transac["Price"].dtype)
print(master_transac["Qty"].dtype)
# change datatype to numeric
master_transac["Price"] = pd.to_numeric(master_transac["Price"])
print(master_transac["Price"].dtype)
master_transac["Qty"] = pd.to_numeric(master_transac["Qty"])
print(master_transac["Qty"].dtype)
#create a Revenue column
master_transac["revenue"] = master_transac["Price"]*master_transac["Qty"]
master_transac
Next, I imported the Master_Customer_Group dataset.
master_custormer_group = pd.read_excel(
r"C:/stuff/R/nh_project/Master_Customer_Group.xlsx")
master_custormer_group
Next, I performed a join between the `master_transac` and `master_custormer_group` tables to obtain the `Customer Type` column, using the pd.merge() function.
To accomplish this, I extracted the first two characters from the Customer Code column and created a new column called “code”. Then, I executed a left join between the `master_transac` and `master_custormer_group` tables using the “code” column from `master_transac` and the Customer Code column from `master_custormer_group`.
master_transac["code"] = master_transac["Customer Code"].str[:2]
master_transac
master_transac_cus = pd.merge(left=master_transac,
right=master_custormer_group, how="left", left_on="code", right_on="Customer Code")
However, I observed that after extracting the first two characters, this approach worked well only for customer codes starting with two-character prefixes such as 27, 29, and OC. Unfortunately, it did not work for customer codes starting with the single character C. As a result, the customer codes starting with C had ‘NaN’ values for the customer type. To address this, I decided to change all ‘NaN’ values to the customer type associated with the customer group that starts with the letter C. With this fix, all the customer types are displayed correctly. Next, I decided to delete the “code” and “Customer Code_y” columns from the table, as we will no longer be using them, and to rename the “Customer Code_x” column to “Customer Code”.
master_transac_cus
# fill the missing customer types (codes starting with "C") with "Type B"
master_transac_cus["Customer Type"] = master_transac_cus["Customer Type"].fillna("Type B")
master_transac_cus
del master_transac_cus["Customer Code_y"]
del master_transac_cus["code"]
master_transac_cus
master_transac_cus = master_transac_cus.rename(
columns={"Customer Code_x": "Customer Code"})
master_transac_cus
The next step is to join the previous data with the `master_test_group` data based on the Material Code to obtain a `Material Group` column.
# import Master_Test_Group data
master_test_group = pd.read_excel(
r"C:\\stuff\\R\\nh_project\\Master_Test_Group.xlsx")
master_test_group
# join master_transac_cus and master_test_group by material code
master_transac_cus_test = pd.merge(left=master_transac_cus,
right=master_test_group, how="left", on="Material Code")
master_transac_cus_test
The picture below summarizes my thought process, to make the preparation steps easier to understand.
Part 2: Exploration
This exploration aims to answer the following questions:
- The top 20 revenue-generating material codes
- The top 5 revenue-generating material groups
- Revenue ranking by each customer type
- Time series for 2023 or 2022
Question1: The top 20 revenue-generating material codes
To answer this question, I created a dataset that contains the total revenue for each distinct material code, sorted it in descending order, and kept only the top 20.
agg_top20 = master_transac_cus_test.groupby("Material Code")["revenue"].sum()
agg_top20 = agg_top20.sort_values(ascending=False)
top20 = agg_top20.head(20).reset_index()
Next, I visualized the top 20 revenue-generating material codes with a bar chart.
fig = px.bar(top20, x="Material Code", y="revenue",
color="Material Code", template="seaborn", title="<b>The top 20 revenue-generating material codes</b>")
fig.update_yaxes(title="Revenue")
fig.show()
Question 2: The top 5 revenue-generating material groups
To answer this question, I created a dataset that contains the total revenue for each distinct material group, sorted it in descending order, and kept only the top 5.
agg_top5 = master_transac_cus_test.groupby("Material Group")["revenue"].sum()
agg_top5 = agg_top5.sort_values(ascending=False)
top5 = agg_top5.head(5).reset_index()
Next, I visualized the top 5 revenue-generating material groups with a bar chart.
fig1 = px.bar(top5, x="Material Group", y="revenue",
color="Material Group", template="seaborn", title="<b>The top 5 revenue-generating material groups</b>")
fig1.update_yaxes(title="Revenue")
fig1.show()
Question 3: Revenue ranking by each customer type
To answer this question, I created a dataset that contains the total revenue for each distinct customer type, sorted in descending order.
rev_cus = master_transac_cus_test.groupby("Customer Type")["revenue"].sum()
rev_cus = rev_cus.sort_values(ascending=False)
revbycus = rev_cus.reset_index()
Next, I visualized the revenue by customer type with a bar chart.
fig2 = px.bar(revbycus, x="Customer Type", y="revenue",
color="Customer Type", template="seaborn", title="<b>Revenue ranking by each customer group<b>")
fig2.update_yaxes(title="Revenue")
fig2.show()
Question 4: Time series for 2023 or 2022
To answer this question, I created a dataset that contains the total revenue for each month of 2022. However, the ‘Year/Month’ column was not in a date format, so I first converted it to a datetime type. After that, I visualized the monthly revenue time series for 2022.
master_transac_cus_test["Year/Month"] = pd.to_datetime(
master_transac_cus_test["Year/Month"], format="%d-%b-%Y")
print(master_transac_cus_test.dtypes["Year/Month"])
data_2022 = master_transac_cus_test[master_transac_cus_test["Year/Month"].dt.year == 2022]
data_2022
rev_2022 = data_2022.groupby("Year/Month")["revenue"].sum()
rev_2022 = rev_2022.reset_index()
rev_2022
fig3 = px.line(rev_2022, x="Year/Month", y="revenue", template="seaborn",
markers=True, title="<b>Time series for 2022</b>")
fig3.update_xaxes(title="Month", tickformat="%b", dtick="M1")
fig3.update_yaxes(title="Revenue")
fig3.show()
Part 3: Data Analysis
This analysis aims to examine the following:
- Year-to-date (YTD) performance by team (business unit)
- The performance of the same period last year (month, quarter, or year)
- Percentage growth
- Revenue peaks
1. Year-to-date (YTD) performance by team (business unit)
For the YTD analysis, I used only the first five months of the 2022 data, because the 2023 data also covers only the first five months. To compare YTD performance between the two years, both periods need to be set up in the same way.
I created a new dataset that contains data only from the first five months of 2023, then calculated the revenue for each team by grouping the data by team, and did the same for 2022. After that, I concatenated the two datasets vertically.
data_2023 = master_transac_cus_test[master_transac_cus_test["Year/Month"].dt.year == 2023]
data_2023 = data_2023.groupby("Team")["revenue"].sum().reset_index()
data_2023["Year"] = 2023
data_2023
data_2022_ytd = master_transac_cus_test[(master_transac_cus_test["Year/Month"].dt.year == 2022)
& master_transac_cus_test["Year/Month"].dt.month.isin(range(1, 6))]
data_2022_ytd = data_2022_ytd.groupby("Team")["revenue"].sum().reset_index()
data_2022_ytd["Year"] = 2022
data_2022_ytd
data_22_23 = pd.concat([data_2022_ytd, data_2023], ignore_index=True)
Next, I export this dataset into an XLSX file in order to visualize it further in Tableau.
#export the data
data_22_23.to_excel(r'C:\stuff\nh python\data_22_23.xlsx', index=False)
BIO
- The YTD revenue for the BIO business unit was approximately 40.72 MB, which is lower than that of 2022 by about 0.55%.
NLS
- The YTD revenue for the NLS business unit was approximately 26,758 MB, which is lower than that of 2022 by about 14.71%.
NNG
- The YTD revenue for the NNG business unit was approximately 1.67 MB, which is lower than that of 2022 by about 24.68%.
PATHO
- The YTD revenue for the PATHO business unit was approximately 766.42 MB, which is lower than that of 2022 by about 21.57%.
WELLNESS
- The YTD revenue for the WELLNESS business unit was approximately 215.07 MB, which is lower than that of 2022 by about 22.79%.
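As a quick cross-check of the figures above, the year-over-year change can also be computed directly in pandas from the `data_22_23` dataset built earlier. This is only a minimal sketch; the figures reported in this section come from the Tableau dashboard.
# put the 2022 and 2023 YTD revenue side by side, one row per team
ytd_pivot = data_22_23.pivot(index="Team", columns="Year", values="revenue")
# percentage change of 2023 YTD revenue versus the same five months of 2022
ytd_pivot["pct_change"] = (ytd_pivot[2023] - ytd_pivot[2022]) / ytd_pivot[2022] * 100
ytd_pivot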
2. The performance of the same period last year and this year.
For the analysis of the performance of the same period last year and this year, I created a new dataset that contains the Year/Month, sum of revenue, and Team.
Next, I will export this dataset to an XLSX file to further visualize it in Tableau. In Tableau, I will apply additional filters by selecting only the periods that I find interesting, which are April and May, in order to compare the performance of the same period last year and this year.
sameperiod = master_transac_cus_test.groupby(
["Year/Month", "Team"])["revenue"].sum().reset_index()
sameperiod
#export the data
sameperiod.to_excel(r'C:\stuff\nh python\sameperiod.xlsx', index=False)
I found that only April 2023 shows positive revenue growth, and only in the BIO business unit, while the remaining months show negative revenue growth in every business unit. This suggests the need for further analysis and potential adjustments to business strategies.
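To double-check this reading outside Tableau, the April and May figures can also be compared in pandas. A minimal sketch, assuming the `sameperiod` dataset built above:
# keep only April and May of each year
aprmay = sameperiod[sameperiod["Year/Month"].dt.month.isin([4, 5])].copy()
aprmay["Year"] = aprmay["Year/Month"].dt.year
aprmay["Month"] = aprmay["Year/Month"].dt.month
# one row per team and month, with the two years side by side
aprmay_pivot = aprmay.pivot_table(
    index=["Team", "Month"], columns="Year", values="revenue", aggfunc="sum")
aprmay_pivot["growth_pct"] = (aprmay_pivot[2023] - aprmay_pivot[2022]) / aprmay_pivot[2022] * 100
aprmay_pivot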
3. Percentage growth
I used Tableau to visualize the revenue growth for each business unit between 2022 and 2023. The highlighted finding is that the BIO business unit is the only one that experienced a revenue decline of less than 10%.
Next, I will analyze the revenue growth by business unit by applying the Boston Consulting Group’s product portfolio matrix (BCG Matrix) to assess the potential of our business units. The BCG Matrix is a valuable tool in long-term strategic planning as it helps businesses evaluate the growth potential of their products.
I created a matrix similar to the BCG matrix, applying its concepts to my data. I used only the first five months of both years for this analysis.
# keep only January-May of 2022 and 2023
BCG22_23 = master_transac_cus_test[
    master_transac_cus_test["Year/Month"].dt.year.isin([2022, 2023])
    & master_transac_cus_test["Year/Month"].dt.month.isin(range(1, 6))]
BCG22_23
revforbcg = BCG22_23.groupby(
[BCG22_23["Year/Month"].dt.year, "Team"])["revenue"].sum().reset_index()
# transform('first') picks each team's earliest-year (2022) revenue,
# because the groupby above orders the rows by year before team
earliest_year_revenue = revforbcg.groupby('Team')['revenue'].transform('first')
# create a revenue-growth column: percentage change versus each team's 2022 revenue
revforbcg['RevenueGrowth2'] = (
    (revforbcg['revenue'] - earliest_year_revenue) / earliest_year_revenue) * 100
revforbcg
# create a market-share column: revenue plus the team's 2022 revenue,
# expressed as a percentage of total revenue across both years
totalrev = revforbcg['revenue'].sum()
revforbcg['marketshare2'] = (
    (revforbcg['revenue'] + earliest_year_revenue) / totalrev) * 100
revforbcg
# keep only the 2023 rows, since the 2022 rows served only as the growth baseline
revforbcg2 = revforbcg.loc[revforbcg["Year/Month"]
== 2023, ["RevenueGrowth2", "marketshare2"]]
revforbcg2
revforbcg3 = revforbcg[revforbcg["Year/Month"] == 2023]
revforbcg3
fig_bcg = px.scatter(revforbcg3, x="marketshare2",
y="RevenueGrowth2", text="Team", color="Team", template="seaborn")
fig_bcg.update_traces(marker_size=15, textposition="bottom right")
fig_bcg.update_xaxes(title="Market share (%)")
fig_bcg.update_yaxes(title="Revenue growth (%)")
fig_bcg.show()
Next, to make it easier to distinguish between the quadrants, I assigned each quadrant its corresponding BCG-matrix element (question mark, star, dog, cash cow). This helps to visually represent and differentiate the quadrants more clearly. A small classification sketch follows this list.
- Question marks are products that exhibit high revenue growth but have a low market share. These products are growing rapidly but require a huge amount of resources from the firm. Typically, these products generate moderate revenue consistently. Question marks have the potential to become stars or dogs, depending on the strategies implemented. In our analysis, the business unit representing question marks is the BIO. The strategy for these products might involve conducting extensive market research to gain a deeper understanding of their potential. Additionally, it is crucial to consider whether these products have the potential to become stars or dogs, as this will influence investment decisions. It is necessary to carefully evaluate whether to invest more resources into these products or consider phasing them out.
- Stars are products that have both a high market share and high revenue growth. Based on our analysis, we don’t have any business units that fit the star category.
- Dogs are products that have both a low market share and low revenue growth. These products typically require serious consideration regarding whether to continue investing in them or to make significant changes. The strategy for these products, if the firm decides to keep them, could involve rebranding or finding innovative ways to generate revenue. Based on our analysis, the products identified as dogs are PATHO, WELLNESS, and NNG. These products have low market shares and low revenue growth, indicating the need for strategic interventions such as rebranding or introducing new features to increase their revenue potential.
- Cash cows are products that have a high market share but low revenue growth; they consistently generate revenue for the firm. In our analysis, the business unit classified as a cash cow in the BCG matrix is NLS. A strategy that might create more revenue for NLS is to create more opportunities in the market, possibly through increased promotion.
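To make the quadrant labels above easier to reproduce, here is a minimal classification sketch over `revforbcg3`. The cut-offs (a revenue decline of less than 10% counted as relatively high growth, a market share above 50% counted as high share) are my own assumptions chosen to mirror how I read the scatter plot; they are not part of the BCG methodology itself.
# assumed cut-offs for this dataset: growth above -10% and market share above 50%
GROWTH_CUTOFF = -10
SHARE_CUTOFF = 50

def bcg_quadrant(row):
    high_growth = row["RevenueGrowth2"] > GROWTH_CUTOFF
    high_share = row["marketshare2"] > SHARE_CUTOFF
    if high_growth and high_share:
        return "Star"
    if high_growth:
        return "Question mark"
    if high_share:
        return "Cash cow"
    return "Dog"

bcg_labels = revforbcg3.copy()
bcg_labels["quadrant"] = bcg_labels.apply(bcg_quadrant, axis=1)
bcg_labels[["Team", "marketshare2", "RevenueGrowth2", "quadrant"]]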
4. Peak Revenue
To visualize the peak revenue, I utilized Tableau for data visualization, presenting the data on a monthly basis.
As observed above, the revenue peaked in February 2022, coinciding with the period when the COVID-19 pandemic was prevalent. The increased revenue during that time can be attributed to the nature of our product, which is related to labs and health. The heightened demand for healthcare services and related products during the pandemic likely contributed to the surge in revenue.
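The same peak can be confirmed in pandas from the prepared data. A minimal sketch, assuming the `master_transac_cus_test` table from Part 1 (the chart itself comes from Tableau):
# total revenue per calendar month across the whole dataset
monthly_rev = master_transac_cus_test.groupby(
    master_transac_cus_test["Year/Month"].dt.to_period("M"))["revenue"].sum()
# the month with the highest revenue and its value
print(monthly_rev.idxmax(), monthly_rev.max())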
Part 4: Summary
The analysis we’ve done so far is missing information about the cost of the products, which means our findings might not be accurate or could be misleading. Without this knowledge, we can’t fully investigate net profit and are limited to only looking at revenue. This makes it difficult to draw a definite and accurate conclusion or gain valuable insights.
Key Takeaways
For the short term, we should prioritize the NLS business unit, as it contributes around 90% of our total revenue. It is crucial to maintain the efficiency of revenue generation from NLS and explore opportunities to further increase our earnings. The reason behind this high revenue is that NLS consists of 20 material groups, with biochemistry, hematology, and immunology accounting for nearly 60% of the total revenue (a quick pandas cross-check of these shares follows the list below).
- Biochemistry contributed 36.87% of the total revenue in 2022. This is because biochemistry encompasses a wide range of analyses that cover various samples, such as fasting blood sugar and lipid panel tests. These tests are commonly performed in clinics and hospitals. As a large lab, we are responsible for testing samples from numerous customers, including both small and large clinics and hospitals. This unique advantage allows us to capture a significant portion of the biochemistry lab market.
- Immunology contributes to 14.73% of the total revenue in 2022. The immunology lab specializes in tests for antibodies and antigens. The reason it may not contribute as much as biochemistry is because immunology tests are more specific and targeted, compared to the broader range of tests covered by biochemistry.
- Hematology contributes to 10.24% of the total revenue in 2022. The hematology lab specializes in blood testing. The reason it may not contribute as much as biochemistry is because hematology tests focus on specific aspects of blood analysis, whereas biochemistry covers a broader range of tests.
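The revenue shares quoted above can be reproduced from the prepared data. A minimal sketch, assuming the `data_2022` subset and the `Material Group` column created in Part 1 (the exact figures come from the Tableau workbook):
# each material group's share of total 2022 revenue, in percent
group_share_2022 = (data_2022.groupby("Material Group")["revenue"].sum()
                    / data_2022["revenue"].sum() * 100)
group_share_2022.sort_values(ascending=False).head(10)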
For the long term, it is advisable to focus on other business units that currently have a smaller revenue share, particularly the NNG unit specializing in genetic and oncology testing. This type of laboratory is relatively new, expensive, and not yet widely known in our society. Therefore, it may cater to a different audience compared to our other business units. Our target customers for the NNG unit would likely be individuals who prioritize health as a form of wealth and recognize the value of investing in good health. As society becomes increasingly health-conscious over time, the NNG unit has the potential to become a popular trend. The COVID-19 pandemic has heightened awareness of the importance of good health, prompting many individuals to prioritize their well-being. To generate more revenue for our other business units, we should consider expanding our labs strategically. Since our business relies on physical locations, ensuring easy accessibility to our labs can help increase revenue. Additionally, to ensure long-term success, we can apply the concepts of the BCG matrix that we previously analyzed. This involves continuously investigating and refining our strategic approaches based on the growth potential and market position of each business unit.