Exploring relationships between the different attributes of CardioGood Fitness customers

A descriptive analytics project to create a customer profile for each CardioGood Fitness treadmill product line

Jake Tolentino
9 min read · Apr 22, 2020

Introduction

Once the community quarantine is lifted, many healthy individuals may still opt to do moderate- to vigorous-intensity physical activity at home, using equipment such as a treadmill, instead of going to fitness centres. How can a treadmill retailer like CardioGood Fitness use its customer data to build a customer profile for each product and generate insights and recommendations that help the company target new customers?

I used data on individuals who purchased a treadmill at a CardioGood Fitness retail store to identify differences between the customers of each product and to explore relationships between the different customer attributes. The data is stored in the CardioGoodFitness.csv file, which is available from Kaggle. Link of the dataset:

The data covers customers of the treadmill products of a retail store called CardioGood Fitness. It contains the following variables:

  1. Product — the model number of the treadmill
  2. Age — in number of years, of the customer
  3. Gender — of the customer
  4. Education — in number of years, of the customer
  5. Marital Status — of the customer
  6. Usage — Average number of times the customer wants to use the treadmill every week
  7. Fitness — Self rated fitness score of the customer (5 — very fit, 1 — very unfit)
  8. Income — of the customer
  9. Miles- expected to run

For my analysis, I will use Python, with pandas for data manipulation and the seaborn and matplotlib packages for visualization.

Part 1: Uni-variate Analysis

First, I created a new variable, ‘Miles Per Usage’, by dividing the expected total miles by the expected usage per week. This gives the expected miles per use for each customer.
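The derivation above can be sketched in pandas as follows. This is a minimal illustration, not the exact notebook code: the sample rows are made up, and the column names `Usage`, `Miles`, and `MilesPerUsage` are assumed to match the CSV.

```python
import pandas as pd

# Hypothetical sample rows; in the real analysis the frame would come from
# pd.read_csv("CardioGoodFitness.csv"). Column names are assumptions.
df = pd.DataFrame({
    "Product": ["TM195", "TM498", "TM798"],
    "Usage": [3, 4, 5],        # expected uses per week
    "Miles": [75, 106, 200],   # expected miles
})

# Expected miles per treadmill session = expected miles / expected uses
df["MilesPerUsage"] = df["Miles"] / df["Usage"]

print(df)
```

Because the division is element-wise, each customer gets their own value, which can then be summarized like any other numeric column.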

Then I did a univariate analysis to understand the overall profiles of customers who purchased at least one treadmill at CardioGood Fitness in the prior three months.

Figure 1 below combines bar plots of each of the three categorical variables. The Product bar plot suggests that TM195 is the most popular treadmill product with 80 purchases, followed by TM498 with 60 purchases. TM798 is the least popular product with 40 purchases.

Figure 1

The Gender bar plot suggests that CardioGood Fitness treadmills are more popular among males than females. One hundred four males bought a treadmill as compared to 76 females. Also, the Marital Status bar plot suggests that CardioGood Fitness treadmills are more popular among partnered customers than single customers. One hundred seven partnered customers bought a treadmill as compared to 73 single customers.
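The frequencies behind bar plots like these can be tabulated with `value_counts`. The sketch below uses a handful of made-up rows, and the column name `MaritalStatus` is an assumption about how the CSV labels that field.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data set.
df = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM195", "TM498", "TM498", "TM798"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "MaritalStatus": ["Partnered", "Single", "Partnered",
                      "Partnered", "Single", "Partnered"],
})

# One frequency table per categorical variable, sorted by count (descending)
counts = {col: df[col].value_counts()
          for col in ["Product", "Gender", "MaritalStatus"]}

for col, table in counts.items():
    print(table, end="\n\n")
```

Each table could then be passed to a plotting call (for example seaborn's `countplot` on the raw column) to draw the bars.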

Figure 2 below combines histograms and box plots of each of the seven numeric variables, including the derived Miles Per Usage. First, the Education boxplot suggests that the middle 50% of buyers have 14 to 16 years of education. Sixteen years of education, which is probably the equivalent of a college degree, is the mode shown in the Education histogram.

The Age boxplot indicates that while the age range of treadmill buyers is quite wide, from 18 to 50, the middle 50% of the buyers are between 24 and 33 years old. We also see that 25% of all buyers fall between 24 and 26 years old. Similarly, while buyers’ income can range between approximately $30,000 and $80,000, the middle 50% falls between $44,774 and $58,820, as shown in the Income boxplot.

Next, the Usage histogram reveals a right skew. The middle 50% of the buyers plan to use the treadmills 3 to 4 times a week, as shown in the Usage boxplot, with three times a week as the mode shown in the Usage histogram. On the other hand, a left skew is seen in the Fitness histogram. The middle 50% of the buyers rated their fitness levels between 3 and 4, as seen in the Fitness boxplot, with three as the mode seen in the Fitness histogram. Finally, the Miles boxplot suggests that the middle 50% of buyers expect to run between 66 miles and 116.5 miles, while the Miles Per Usage boxplot puts the middle 50% at 23.5 to 33 miles per use.

Figure 2
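The "middle 50%" figures quoted above are the first and third quartiles that a boxplot draws as the edges of its box. A small sketch of how those numbers, the mode, and the direction of skew could be read off with pandas; the values below are invented for illustration and are not the real data.

```python
import pandas as pd

# Hypothetical ages and weekly usage values (not taken from the data set)
ages = pd.Series([18, 22, 24, 25, 26, 28, 33, 35, 50], name="Age")
usage = pd.Series([2, 3, 3, 3, 4, 4, 5], name="Usage")

# Quartiles: the box of a boxplot spans q1 to q3 (the middle 50%)
q1, median, q3 = ages.quantile([0.25, 0.50, 0.75])
iqr = q3 - q1

# Mode: the tallest bar in the histogram
usage_mode = usage.mode().iloc[0]

# Sample skewness: a positive value indicates a right (positive) skew
usage_skew = usage.skew()

print(f"Age: Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Usage: mode={usage_mode}, skew={usage_skew:.2f}")
```

Calling `df.describe()` on the full frame would give the same quartiles for every numeric column at once.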

Part 2: Bi-Variate Analysis

With an understanding of the overall profiles of customers who purchased at least one treadmill at CardioGood Fitness in the prior three months, I conducted further analyses to explore how these profiles differ across treadmill products TM195, TM498, and TM798.

Firstly, Figure 3 shows that while the gender split across TM195 and TM498 buyers is quite even, there is a considerable preference for TM798 among males.

Figure 3. Product Buyers by Gender

On the other hand, no significant difference in the marital status split between partnered and single is observed between products (Figure 4).

Figure 4. Product Buyers by Marital Status
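Splits like the ones in Figures 3 and 4 come down to a cross-tabulation of two categorical columns. A minimal sketch with `pd.crosstab`, using made-up rows and assumed column names; the same call with the marital status column in place of `Gender` would give the second table.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data set.
df = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498",
                "TM798", "TM798", "TM798"],
    "Gender": ["Male", "Female", "Male", "Female",
               "Male", "Male", "Female"],
})

# Raw counts of buyers per product and gender
counts = pd.crosstab(df["Product"], df["Gender"])

# Row-normalized shares, so each product's row sums to 1
shares = pd.crosstab(df["Product"], df["Gender"], normalize="index")

print(counts, end="\n\n")
print(shares.round(2))
```

Normalizing by row makes products with different sales volumes directly comparable, which is what matters when judging whether a split is "even".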

As Figure 5 shows, TM195 and TM498 appeal to a broad range of ages. Their buyers range between approximately 18–19 years old and 45–47 years old. However, TM798 appeals to a noticeably narrower age range: its buyers only range between 22 years old and 38 years old.

Figure 5. Age Distribution by Product

It is clear from Figure 6 that TM798 buyers appear to be more highly educated than buyers of TM498 and TM195. While TM195 and TM498 buyers have between 12 and 18 years of education, TM798 buyers have between 14 and 21 years of education.

Figure 6. Education Distribution by Product

A similar pattern is seen in Figure 7. While the buyers of TM195 and TM498 have incomes ranging from approximately $30,000 to $68,000, the buyers of TM798 have incomes ranging from approximately $49,000 to $105,000. The buyers of TM798 have a median income of $76,568.50, almost 1.5 times that of the buyers of TM195 and TM498.

Figure 7. Income Distribution by Product
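Per-product comparisons like this one can be computed with a `groupby`. The sketch below uses invented incomes chosen only to make the arithmetic easy to follow, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical incomes, three buyers per product (not the real data)
df = pd.DataFrame({
    "Product": ["TM195"] * 3 + ["TM498"] * 3 + ["TM798"] * 3,
    "Income": [40000, 46000, 52000,
               45000, 50000, 55000,
               60000, 75000, 90000],
})

# Range and median of income within each product
summary = df.groupby("Product")["Income"].agg(["min", "median", "max"])

# How many times larger the TM798 median is than each other product's median
ratio = summary.loc["TM798", "median"] / summary.loc[["TM195", "TM498"], "median"]

print(summary, end="\n\n")
print(ratio.round(2))
```

The same pattern works for Age, Education, Fitness, Usage, and Miles by swapping the column name.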

Buyers of TM798 rate their fitness levels more highly than buyers of TM195 and TM498. As seen in Figure 8, buyers of TM798 have a median fitness rating of 5, the highest possible rating, while buyers of TM195 and TM498 have a median fitness rating of 3.

Figure 8. Fitness Distribution by Product

As Figure 9 shows, buyers of TM798 plan to be heavier users of their treadmills than buyers of TM195 and TM498. They expect to use their treadmills 3 to 6 times a week, with a median usage of 5 times a week. On the other hand, buyers of TM195 and TM498 expect to use their treadmills 2 to 5 times and 3 to 4 times a week, respectively, with a median usage of 3 times a week for both.

Figure 9. Usage Distribution by Product

Correspondingly, Figure 10 suggests that buyers of TM798 expect to log more miles on their treadmills than buyers of TM195 and TM498. They expect to log between 75 miles and 300 miles on their treadmills, with a median of 160 miles. This is almost two times the median of the expected miles logged by buyers of TM195 and TM498.

Figure 10. Miles Distribution by Product

Even when zooming in on the miles per usage (Figure 11), buyers of TM798 also expect to log a higher range, between 23 and 33 miles, than buyers of TM498 and TM195. Therefore, buyers of TM798 are not only looking to run more times per week, and hence more miles; they are also looking to run more miles each time.

Figure 11. Miles per Usage Distribution by Product

Figure 12 explores the correlations between all the variables in the dataset. We see a strong positive correlation of 0.79 between Fitness and Miles, also verified by scatterplot in Figure 13. This means the fitter a customer is, the more miles he or she is likely to run on the treadmill. Also, the correlation between Fitness and Usage is 0.67, while the correlation between Fitness and MilesPerUsage is 0.56. This suggests that the higher number of miles expected to be logged is more likely to be an outcome of running more times per week than running more miles each time.

There is also a noticeable positive correlation of 0.63 between Income and Education. This means the more years of education a customer has, the higher his or her income is likely to be. This correlation is even stronger than that of Age, which only has a positive correlation of 0.51 with Income.

Finally, there is a strong positive correlation between TM798 purchases and Fitness (0.73), Income (0.71), Miles (0.66), Usage (0.65), and Education (0.58). On the other hand, there is a negative correlation between either TM498 purchases or TM195 purchases and the same variables, with a slightly stronger negative correlation seen with the latter.

Figure 12. Correlation Plot Between Variables
Figure 13. Scatterplot of Miles Against Fitness
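Correlating a product with numeric variables requires turning the Product column into 0/1 indicator columns first. One way this could be done, shown here on made-up rows with assumed column names, is `pd.get_dummies` followed by `DataFrame.corr`, which computes Pearson correlations by default.

```python
import pandas as pd

# Hypothetical sample rows (not the real data)
df = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Fitness": [2, 3, 3, 3, 5, 5],
    "Usage": [2, 3, 3, 4, 5, 6],
    "Miles": [50, 75, 85, 100, 160, 200],
})

# One indicator column per product: 1.0 if the row is that product, else 0.0
dummies = pd.get_dummies(df["Product"], dtype=float)

# Replace the text column with its indicators, then correlate everything
encoded = pd.concat([df.drop(columns="Product"), dummies], axis=1)
corr = encoded.corr()  # Pearson correlation matrix

print(corr.round(2))
```

The correlation between an indicator column and a numeric column is a point-biserial correlation: it is positive when buyers of that product tend to have higher values of the variable than everyone else. The resulting matrix can be passed to seaborn's `heatmap` to draw a plot like Figure 12.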

Conclusion

TM195 is likely an entry-level, mainstream product catering to the masses. Its customer base is evenly split between males and females, and their ages range widely from 18 years old to 47 years old. Their median income is the lowest across the three treadmill products at $46,647, and their years of education range between 12 and 18. They have a median fitness of 3, and there is a wide variation in the number of times they plan to use the treadmill per week, ranging from 2 times to 5 times a week. On average, they plan to run the least each time they use the treadmill.

TM498 is likely a small upgrade from TM195 in terms of functionality and price. Again, its customer base is relatively evenly split between males and females. Their ages range between 19 and 45, and their median income is slightly higher at $49,460. Years of education range similarly between 12 and 18, and the median fitness is also 3. There is less variation in the number of times they intend to use the treadmill per week, ranging only from 3 to 4 times a week. On average, they plan to run slightly more each time they use the treadmill.

TM798 is likely a top-end treadmill with full, advanced functionality catering to fitness and running enthusiasts.

TM798 has a distinct customer profile that separates it clearly from those of TM195 and TM498. Its customer base is dominated by males, with a female-to-male ratio of about 1:5. The median Fitness is 5 (the highest possible rating), the median number of times they plan to use the treadmill per week is 5, and the median number of miles they intend to log is 160 miles, which is almost twice the median of TM195 and TM498 buyers. These enthusiasts also earn more than TM195 and TM498 buyers. They have a median income of $76,568.50, almost 1.5 times that of the buyers of TM195 and TM498, and they have more years of education, ranging between 14 and 21 years.

Interestingly, partnered customers are more likely than single customers to buy a treadmill across all three products. The ratio of partnered customers to single customers is approximately 1.4:1 across all products.

How are you going to stay physically active once the community quarantine is lifted?

To see more about this analysis, see the link to my GitHub available here.

Jake Tolentino

Data scientist committed to addressing complex societal challenges using statistics, information and decision systems, and social sciences.