Beyond numbers: Numbeo’s insight into the world’s most livable countries (Python, K-Means with Silhouette method)

Mochamad Kautzar Ichramsyah
Published in CodeX
23 min read, Jan 3, 2024

Introduction to Numbeo’s quality of life index

As explained on this page https://www.numbeo.com/quality-of-life/indices_explained.jsp :

The quality of life index estimates a city or country’s overall quality of life.

Numbeo measures the quality of life index using several factors: the purchasing power index, safety index, healthcare index, cost of living index, property price to income ratio, traffic commute time index, pollution index, and climate index.

It’s important to note that the quality of life index is based on data and surveys collected by Numbeo. This article uses the 2023 index at the country level. For further explanation of the quality of life index and the factors behind it:

  1. Quality of life index (higher is better). It’s a composite value that Numbeo computes from the factors below using a formula. Because the index is calculated from these factors, a correlation between the index and one of them reflects a built-in dependency (causation), not just a coincidence. https://www.numbeo.com/quality-of-life/indices_explained.jsp
  2. Purchasing power index (higher is better) and cost of living index (lower is better) https://www.numbeo.com/cost-of-living/cpi_explained.jsp
  3. Pollution index (lower is better) https://www.numbeo.com/pollution/indices_explained.jsp
  4. Property price to income ratio (lower is better) https://www.numbeo.com/property-investment/indicators_explained.jsp
  5. Safety index (higher is better) https://www.numbeo.com/crime/indices_explained.jsp
  6. Healthcare index (higher is better) https://www.numbeo.com/health-care/indices_explained.jsp
  7. Traffic commute time index (lower is better) https://www.numbeo.com/traffic/indices_explained.jsp
  8. Climate index (higher is better) https://www.numbeo.com/climate/indices_explained.jsp

For the data source, I am using the data on this page https://www.numbeo.com/quality-of-life/rankings_by_country.jsp?title=2023-mid and to do the data analysis I am using Python. Without further ado, let’s go!

Quality of life index and rankings based on Numbeo’s calculation

Output 1: Read the data and show its information

Based on Output 1, we have 84 countries listed, including the various factors mentioned in the previous chapter.

Output 2: Show the top 5 and bottom 5 countries

Based on Output 2, the top 5 countries with the highest quality_of_life_index are Luxembourg and the Netherlands (both 200.1), Iceland (191.1), Denmark (190.6), and Finland (188.1). The bottom 5 countries with the lowest quality_of_life_index are Nigeria (49.5), Bangladesh (69.5), Venezuela (74.4), Sri Lanka (76.5), and Iran (77.6).
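
The code behind Output 2 is not reproduced in this text. A minimal sketch of the same idea, using only the standard library and only the ten values quoted above (the real dataset has 84 rows), might look like this. The helper name `top_n` is hypothetical, not taken from the original notebook.

```python
# Only the ten values quoted in the paragraph above, not the full 84-row dataset.
quality_of_life = {
    "Luxembourg": 200.1, "Netherlands": 200.1, "Iceland": 191.1,
    "Denmark": 190.6, "Finland": 188.1, "Iran": 77.6, "Sri Lanka": 76.5,
    "Venezuela": 74.4, "Bangladesh": 69.5, "Nigeria": 49.5,
}


def top_n(scores, n=5, highest=True):
    """Return the n (country, value) pairs with the highest or lowest values."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=highest)
    return ordered[:n]


print(top_n(quality_of_life))                 # five highest values
print(top_n(quality_of_life, highest=False))  # five lowest values
```

For indices where lower is better (for example the pollution index), the same helper can be called with `highest=False` to get the "top" countries.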

In case you want to know a specific country, you can use this code:

Output 3: Looking for a specific country rank

Based on Output 3, we can see that Indonesia ranks 74th of the 84 countries listed, with a quality of life index of 92.0.
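
The lookup code itself is not included in this text. As a rough sketch, assuming the table is held as a list of rows with a `rank`, a `country`, and a `quality_of_life_index` field (these field names and the `find_country` helper are assumptions), a lookup that returns nothing for a missing country could be written as:

```python
# Three rows quoted in this article; the real table has 84 rows.
rows = [
    {"rank": 41, "country": "Uruguay", "quality_of_life_index": 136.2},
    {"rank": 42, "country": "Hungary", "quality_of_life_index": 131.6},
    {"rank": 74, "country": "Indonesia", "quality_of_life_index": 92.0},
]


def find_country(rows, name):
    """Return the row for `name` (case-insensitive), or None if it is absent."""
    wanted = name.strip().lower()
    for row in rows:
        if row["country"].lower() == wanted:
            return row
    return None


print(find_country(rows, "indonesia"))  # the Indonesia row
print(find_country(rows, "Atlantis"))   # None: not in the dataset
```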

Output 4: Null result if not available in the dataset
Output 5: Boxplot of the quality of life index
Output 6: The statistical measurements of the quality of life index

Based on Output 5 and Output 6,

  1. The boxplot shows no outlier values in the quality_of_life_index.
  2. The average and median (50th percentile) quality_of_life_index are 134.27 and 131.5 respectively, which are close to each other.
  3. The middle 50% of quality_of_life_index values (between the 25th and 75th percentiles, shown by the blue box of the boxplot) range from 106.47 to 164.57.
  4. The highest and lowest quality_of_life_index are 200.1 and 35.34 respectively (corresponding to Luxembourg and the Netherlands at the top and Nigeria at the bottom).
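
The "no outliers" reading can be checked by hand from the quartiles. The sketch below assumes the boxplot uses the common 1.5 × IQR whisker rule (the default in most plotting libraries, but not stated in the article); `iqr_fences` is a hypothetical helper name.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes as reported in the list above.
low, high = iqr_fences(106.47, 164.57)
minimum, maximum = 35.34, 200.1

has_outliers = minimum < low or maximum > high
print(round(low, 2), round(high, 2), has_outliers)  # 19.32 251.72 False
```

Both extremes fall inside the fences, which agrees with a boxplot that shows no outlier points.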
Output 7: The first 5 countries higher and lower than the average quality of life index

Based on Output 7, Uruguay (ranked 41, quality_of_life_index 136.2) and Hungary (ranked 42, quality_of_life_index 131.6) are the two countries on either side of the average quality_of_life_index.

After extracting all the information above, we can’t say that the countries with a lower quality of life index than the average are bad. That’s why we need to explore the various factors that build the quality of life index.

Understanding the data behind Numbeo’s quality of life rankings

The power of the purse: Exploring quality living through the purchasing power index

In this part, we focus on exploring the purchasing_power_index (higher is better) feature in the dataset as the base variable to check the distribution and its impact on quality_of_life_index.

Output 8: Show the top 5 and bottom 5 countries based on `purchasing_power_index`

Based on Output 8, we can see that the country list has changed. The top 5 countries with the highest purchasing_power_index are Luxembourg (133.2), Qatar (120.2), UAE (118.0), the United States (117.7), and Switzerland (110.8). The bottom 5 countries with the lowest purchasing_power_index are Nigeria (9.4), Venezuela (11.3), Sri Lanka (74.4), Egypt (17.4), and Lebanon (19.4).

  1. Luxembourg appears in the top 5 countries for both the quality_of_life_index and the purchasing_power_index.
  2. Nigeria, Venezuela, and Sri Lanka appear in the bottom 5 countries for both the quality_of_life_index and the purchasing_power_index.

Based on those facts, I think there is some correlation between the quality_of_life_index and the purchasing_power_index.

Output 9: Visualize the scatterplot between quality_of_life_index and purchasing_power_index

Based on Output 9, there is a strong positive linear correlation (0.87) between the quality_of_life_index and the purchasing_power_index. It means that a higher purchasing_power_index leads to a higher quality_of_life_index (causation).

We can confirm the causation because we know that Numbeo calculates the quality_of_life_index using the purchasing_power_index as one of the factors.
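
For readers who want to see what the correlation figure measures, here is a minimal sketch of the Pearson correlation coefficient written with the standard library only. It uses just the three countries for which this article quotes both values (Luxembourg, Nigeria, Venezuela), so the result will not equal the 0.87 computed from all 84 countries; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Luxembourg, Nigeria, Venezuela (values quoted in this article).
purchasing_power = [133.2, 9.4, 11.3]
quality_of_life = [200.1, 49.5, 74.4]

r = pearson_r(purchasing_power, quality_of_life)
print(round(r, 2))  # close to 1 for this tiny, hand-picked subset
```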

Output 10: Boxplot of the purchasing_power_index
Output 11: The statistical measurements of the purchasing_power_index

Based on Output 10 and Output 11,

  1. The boxplot shows no outlier values in the purchasing_power_index.
  2. The average and median (50th percentile) purchasing_power_index are 59.14 and 52.15 respectively, which are not too different.
  3. The middle 50% of purchasing_power_index values (between the 25th and 75th percentiles, shown by the sky-blue box of the boxplot) range from 32.95 to 86.25.
  4. The highest and lowest purchasing_power_index are 133.2 and 9.4 respectively (matching the values shown earlier for Luxembourg and Nigeria).

Safe haven: A deep dive into safety Index and its impact on quality of life

In this part, we focus on exploring the safety_index (higher is better) feature in the dataset as the base variable to check the distribution and its impact on quality_of_life_index and the correlation with purchasing_power_index.

Output 12: Show the top 5 and bottom 5 countries based on `safety_index`

Based on Output 12, we can see that the country list has changed. The top 5 countries with the highest safety_index are Qatar (85.7), UAE (85.4), Taiwan (83.9), Oman (80.4), and Hong Kong (78.3). Then, the bottom 5 countries with the lowest safety_index are Venezuela (17.9), South Africa (24.5), Peru (32.5), Brazil (33.9), and Nigeria (34.2).

  1. Qatar and UAE appear in the top 5 countries for both the purchasing_power_index and the safety_index.
  2. Venezuela and Nigeria appear in the bottom 5 countries for the quality_of_life_index, purchasing_power_index, and safety_index.
  3. The top 5 countries all come from Asia. In contrast, the bottom 5 countries come from Africa and South America.

Based on those facts, I think there is some correlation between the quality_of_life_index, purchasing_power_index, and safety_index.

Output 13: Visualize the scatterplot between quality_of_life_index and safety_index
Output 14: Visualize the scatterplot between purchasing_power_index and safety_index

Based on Output 13 and Output 14, the safety_index has a moderate positive linear correlation with both the quality_of_life_index (0.57) and the purchasing_power_index (0.51). A higher safety_index leads to a higher quality_of_life_index (causation). The safety_index is also positively correlated with the purchasing_power_index. We can’t claim causation here, but a plausible explanation is that a country, city, or area with a high purchasing_power_index tends to have less crime than areas with a low purchasing_power_index, which indirectly raises its safety_index.

Output 15: Boxplot of the safety_index with outliers labeling
Output 16: The statistical measurements of the safety index

Based on Output 15 and Output 16,

  1. The boxplot shows outlier values in the safety_index, labeled as the two countries with the lowest safety_index: Venezuela and South Africa.
  2. The average and median (50th percentile) safety_index are 59.6 and 59.55 respectively, which are very close to each other.
  3. The middle 50% of safety_index values (between the 25th and 75th percentiles, shown by the red box of the boxplot) range from 52.85 to 71.2.
  4. The highest and lowest safety_index are 85.7 and 17.9 respectively (matching the values shown earlier for Qatar and Venezuela).
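
The outlier labels can be reproduced from the quartiles above. This is a sketch under the assumption that the boxplot uses the usual 1.5 × IQR whisker rule; it only includes the ten safety_index values quoted in this section, and `label_outliers` is a hypothetical helper name.

```python
# Top 5 and bottom 5 safety_index values quoted in this section.
safety = {
    "Qatar": 85.7, "UAE": 85.4, "Taiwan": 83.9, "Oman": 80.4, "Hong Kong": 78.3,
    "Nigeria": 34.2, "Brazil": 33.9, "Peru": 32.5, "South Africa": 24.5,
    "Venezuela": 17.9,
}


def label_outliers(values, q1, q3, k=1.5):
    """Return the names whose value lies outside the k * IQR fences, lowest first."""
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [name for name, value in values.items() if value < low or value > high]
    return sorted(outliers, key=values.get)


# Quartiles as reported in the list above (25th and 75th percentiles).
print(label_outliers(safety, q1=52.85, q3=71.2))  # ['Venezuela', 'South Africa']
```

The lower fence works out to 52.85 - 1.5 × 18.35 = 25.325, so Venezuela (17.9) and South Africa (24.5) fall below it while Peru (32.5) does not.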

Preserving wellness: Navigating the healthcare index landscape

In this part, we focus on exploring the health_care_index (higher is better) feature in the dataset as the base variable to check the distribution and its impact on quality_of_life_index and the correlation with purchasing_power_index and safety_index.

Output 17: Show the top 5 and bottom 5 countries based on `health_care_index`

Based on Output 17, we can see that the country list has changed. The top 5 countries with the highest health_care_index are Taiwan (85.9), South Korea (83.0), Japan (79.6), France (78.8), and the Netherlands (78.6). Then, the bottom 5 countries with the lowest health_care_index are Venezuela (39.2), Bangladesh (42.0), Morocco (45.4), Azerbaijan (47.5), and Belarus (47.7).

  1. The top 3 countries by health_care_index are all in Asia.
  2. Venezuela is the only country that appears in the bottom 5 for the quality_of_life_index, purchasing_power_index, safety_index, and health_care_index.
Output 18: Visualize the scatterplot between quality_of_life_index and health_care_index
Output 19: Visualize the scatterplot between purchasing_power_index and health_care_index
Output 20: Visualize the scatterplot between safety_index and health_care_index

Based on Output 18, 19, and 20, the health_care_index has a moderate positive linear correlation with the quality_of_life_index (0.62), the purchasing_power_index (0.58), and the safety_index (0.41). A higher health_care_index leads to a higher quality_of_life_index (causation). The health_care_index is also positively correlated with the purchasing_power_index and the safety_index, but we can’t say that this is causation.

Output 21: Boxplot of the health_care_index
Output 22: The statistical measurements of the health_care_index

Based on Output 21 and Output 22,

  1. The boxplot shows no outlier values in the health_care_index.
  2. The average and median (50th percentile) health_care_index are 64.75 and 65.75 respectively, which are close to each other.
  3. The middle 50% of health_care_index values (between the 25th and 75th percentiles, shown by the green box of the boxplot) range from 57.87 to 72.37.
  4. The highest and lowest health_care_index are 85.9 and 39.2 respectively (matching the values shown earlier for Taiwan and Venezuela).

Balancing the budget: Demystifying the cost of living index

In this part, we focus on exploring the cost_of_living_index (lower is better) feature in the dataset as the base variable to check the distribution and its impact on quality_of_life_index and the correlation with purchasing_power_index, safety_index, and health_care_index.

Output 23: Show the top 5 and bottom 5 countries based on `cost_of_living_index`

Based on Output 23, we can see that the country list has changed. The top 5 countries with the lowest cost_of_living_index (lower is better) are Pakistan (17.6), Egypt (21.7), India (22.9), Nigeria (23.2), and Bangladesh (26.2). The bottom 5 countries with the highest cost_of_living_index are Switzerland (117.3), Iceland (87.7), Singapore (85.9), Norway (82.2), and Denmark (79.2).

  1. The top 5 countries by cost_of_living_index come from Asia and Africa. In contrast, the bottom 5 countries come from Europe, plus Singapore as the most expensive place to live in Asia.
  2. A low cost of living is not always a good thing. We have to check the correlation with the other factors to know whether it is good or not.
Output 24: Visualize the scatterplot between quality_of_life_index and cost_of_living_index
Output 25: Visualize the scatterplot between purchasing_power_index and cost_of_living_index
Output 26: Visualize the scatterplot between safety_index and cost_of_living_index
Output 27: Visualize the scatterplot between health_care_index and cost_of_living_index

Based on Output 24, 25, 26, and 27,

  1. There is a strong positive linear correlation between the cost_of_living_index and both the quality_of_life_index (0.75) and the purchasing_power_index (0.76). Countries with a higher cost_of_living_index tend to have a higher quality_of_life_index, even though a lower cost of living is better on its own, so this correlation by itself does not show that a higher cost of living raises the quality of life. The link with the purchasing_power_index makes economic sense: where demand (purchasing power) is strong, prices (cost of living) tend to be high too.
  2. There is a moderate positive linear correlation between the cost_of_living_index and both the safety_index (0.44) and the health_care_index (0.53).
Output 28: Boxplot of the cost_of_living_index with outliers labeling
Output 29: The statistical measurements of the cost_of_living_index

Based on Output 28 and Output 29,

  1. The boxplot shows one outlier value in the cost_of_living_index: Switzerland.
  2. The average and median (50th percentile) cost_of_living_index are 49.69 and 47.85 respectively, which are close to each other.
  3. The middle 50% of cost_of_living_index values (between the 25th and 75th percentiles, shown by the yellow box of the boxplot) range from 34.45 to 61.65.
  4. The highest and lowest cost_of_living_index are 117.3 and 17.6 respectively (matching the values shown earlier for Switzerland and Pakistan).

Home sweet home: Understanding property price to income ratio

In this part, we focus on exploring the property_price_to_income_ratio (lower is better) feature in the dataset as the base variable to check the distribution and its impact on quality_of_life_index and the correlation with purchasing_power_index, safety_index, health_care_index, and cost_of_living_index.

Output 30: Show the top 5 and bottom 5 countries based on `property_price_to_income_ratio`

Based on Output 30, we can see that the country list has changed. The top 5 countries with the lowest property_price_to_income_ratio (lower is better) are Saudi Arabia (2.9), Oman (3.2), UAE (3.3), South Africa (3.3), and the United States (4.2). The bottom 5 countries with the highest property_price_to_income_ratio are Hong Kong (42.1), Sri Lanka (35.3), China (33.0), Philippines (31.0), and Thailand (25.8).

  1. The top 3 countries by property_price_to_income_ratio are all in the Middle East.
  2. The bottom 5 countries are also in Asia, so the picture within Asia is split: the property_price_to_income_ratio is favorable in the Middle East but unfavorable in East, South, and South-East Asia.
  3. A low property price to income ratio is not always a good thing. We have to check the correlation with the other factors to know whether it is good or not.
Output 31: Visualize the scatterplot between quality_of_life_index and property_price_to_income_ratio
Output 32: Visualize the scatterplot between purchasing_power_index and property_price_to_income_ratio
Output 33: Visualize the scatterplot between safety_index and property_price_to_income_ratio
Output 34: Visualize the scatterplot between health_care_index and property_price_to_income_ratio
Output 35: Visualize the scatterplot between cost_of_living_index and property_price_to_income_ratio

Based on Output 31, 32, 33, 34, and 35,

  1. There is a moderate negative linear correlation between the property_price_to_income_ratio and both the quality_of_life_index (-0.62) and the purchasing_power_index (-0.52). A lower property_price_to_income_ratio leads to a higher quality_of_life_index (causation). The negative correlation with the purchasing_power_index is also easy to understand: with higher purchasing power, our ability to buy property is higher too (a lower ratio).
  2. There is a weak negative linear correlation (-0.29) between the property_price_to_income_ratio and the cost_of_living_index.
  3. There is no linear correlation between the property_price_to_income_ratio and the safety_index (-0.08) or the health_care_index (-0.07).
Output 36: Boxplot of the property_price_to_income_ratio with outliers labeling
Output 37: The statistical measurements of the property_price_to_income_ratio

Based on Output 36 and Output 37,

  1. The boxplot shows outlier values in the property_price_to_income_ratio. The highest one is Hong Kong, followed by Sri Lanka, China, the Philippines, Thailand, Lebanon, and Vietnam.
  2. The average and median (50th percentile) property_price_to_income_ratio are 13.25 and 11.7 respectively. The gap is fairly large because the outliers pull the average up.
  3. The middle 50% of property_price_to_income_ratio values (between the 25th and 75th percentiles, shown by the orange box of the boxplot) range from 9.05 to 14.95.
  4. The highest and lowest property_price_to_income_ratio are 42.1 and 2.9 respectively (matching the values shown earlier for Hong Kong and Saudi Arabia).

Navigating the daily grind: Analyzing the traffic commute time index

In this part, we focus on exploring the traffic_commute_time_index (lower is better) feature in the dataset as the base variable to check the distribution and its impact on quality_of_life_index and the correlation with purchasing_power_index, safety_index, health_care_index, cost_of_living_index, and property_price_to_income_ratio.

Output 38: Show the top 5 and bottom 5 countries based on `traffic_commute_time_index`

Based on Output 38, we can see that the country list has changed. The top 5 countries with the lowest traffic_commute_time_index (lower is better) are Estonia (22.2), Iceland and Oman (both 22.3), Cyprus (22.8), and the Netherlands (23.8). The bottom 5 countries with the highest traffic_commute_time_index are Nigeria (62.8), Bangladesh (57.4), Sri Lanka (56.4), Kenya (51.6), and Peru (49.1).

  1. Two of the top 3 countries by traffic_commute_time_index (Estonia and Iceland) are in Northern Europe; Oman, tied with Iceland, is in the Middle East.
  2. The bottom 4 countries by traffic_commute_time_index all come from Africa and South Asia.
Output 39: Visualize the scatterplot between quality_of_life_index and traffic_commute_time_index
Output 40: Visualize the scatterplot between purchasing_power_index and traffic_commute_time_index
Output 41: Visualize the scatterplot between safety_index and traffic_commute_time_index
Output 42: Visualize the scatterplot between health_care_index and traffic_commute_time_index
Output 43: Visualize the scatterplot between cost_of_living_index and traffic_commute_time_index
Output 44: Visualize the scatterplot between property_price_to_income_ratio and traffic_commute_time_index

Based on Output 39, 40, 41, 42, 43, and 44,

  1. There is a strong negative linear correlation (-0.72) between the traffic_commute_time_index and the quality_of_life_index. A lower traffic_commute_time_index leads to a higher quality_of_life_index (causation).
  2. There is a moderate negative linear correlation between the traffic_commute_time_index and the purchasing_power_index (-0.50), safety_index (-0.51), and cost_of_living_index (-0.48).
  3. There is a weak negative linear correlation (-0.26) between the traffic_commute_time_index and the health_care_index.
  4. There is a moderate positive linear correlation (0.46) between the traffic_commute_time_index and the property_price_to_income_ratio.
Output 45: Boxplot of the traffic_commute_time_index with outliers labeling
Output 46: The statistical measurements of the traffic_commute_time_index

Based on Output 45 and Output 46,

  1. The boxplot shows outlier values in the traffic_commute_time_index. The highest one is Nigeria, followed by Bangladesh and Sri Lanka.
  2. The average and median (50th percentile) traffic_commute_time_index are 35.32 and 35.1 respectively, with no significant difference.
  3. The middle 50% of traffic_commute_time_index values (between the 25th and 75th percentiles, shown by the grey box of the boxplot) range from 29.2 to 39.42.
  4. The highest and lowest traffic_commute_time_index are 62.8 and 22.2 respectively (matching the values shown earlier for Nigeria and Estonia).

Breathing easy: Insights into the pollution index

In this part, we focus on exploring the pollution_index (lower is better) feature in the dataset as the base variable to check the distribution and its impact on quality_of_life_index and the correlation with purchasing_power_index, safety_index, health_care_index, cost_of_living_index, property_price_to_income_ratio, and traffic_commute_time_index.

Output 47: Show the top 5 and bottom 5 countries based on `pollution_index`

Based on Output 47, we can see that the country list has changed. The top 5 countries with the lowest pollution_index (lower is better) are Finland (11.8), Iceland (15.7), Estonia (17.1), Sweden (17.9), and Norway (18.0). The bottom 5 countries with the highest pollution_index are Lebanon (89.4), Nigeria (88.2), Bangladesh (85.2), Vietnam (84.2), and Peru (83.0).

  1. The top 5 countries by pollution_index are all in Northern Europe. It must be nice to live in a region with the lowest pollution :)
  2. Three of the bottom 5 countries by pollution_index are in Asia.
Output 48: Visualize the scatterplot between quality_of_life_index and pollution_index
Output 49: Visualize the scatterplot between purchasing_power_index and pollution_index
Output 50: Visualize the scatterplot between safety_index and pollution_index
Output 51: Visualize the scatterplot between health_care_index and pollution_index
Output 52: Visualize the scatterplot between cost_of_living_index and pollution_index
Output 53: Visualize the scatterplot between property_price_to_income_ratio and pollution_index
Output 54: Visualize the scatterplot between traffic_commute_time_index and pollution_index

Based on Output 48, 49, 50, 51, 52, 53, and 54,

  1. There is a strong negative linear correlation (-0.89) between the pollution_index and the quality_of_life_index, and a strong negative correlation with the cost_of_living_index as well. A lower pollution_index leads to a higher quality_of_life_index (causation). At the same time, a high pollution_index usually occurs in countries with a low cost_of_living_index.
  2. There is a moderate negative linear correlation between the pollution_index and both the purchasing_power_index (-0.67) and the health_care_index (-0.54).
  3. There is a weak negative linear correlation (-0.38) between the pollution_index and the safety_index.
  4. There is a moderate positive linear correlation between the pollution_index and both the traffic_commute_time_index (0.6) and the property_price_to_income_ratio (0.46).
Output 55: Boxplot of the pollution_index
Output 56: The statistical measurements of the pollution_index

Based on Output 55 and Output 56,

  1. The boxplot shows no outlier values in the pollution_index.
  2. The average and median (50th percentile) pollution_index are 52.68 and 56.8 respectively, with no significant difference.
  3. The middle 50% of pollution_index values (between the 25th and 75th percentiles, shown by the purple box of the boxplot) range from 35.07 to 68.45.
  4. The highest and lowest pollution_index are 89.4 and 11.8 respectively (matching the values shown earlier for Lebanon and Finland).

Climate comfort: The climate index chronicles how weather influences the quality of living

In this part, we focus on exploring the climate_index (higher is better) feature in the dataset as the base variable to check the distribution and its impact on quality_of_life_index and the correlation with purchasing_power_index, safety_index, health_care_index, cost_of_living_index, property_price_to_income_ratio, traffic_commute_time_index, and pollution_index.

Output 57: Show the top 5 and bottom 5 countries based on `climate_index`

Based on Output 57, we can see that the country list has changed. The top 5 countries with the highest climate_index (higher is better) are Venezuela (99.9), Kenya (99.8), Argentina (98.3), Uruguay (98.0), and Portugal (97.8). The bottom 5 countries with the lowest climate_index are Kuwait (20.2), Qatar (36.0), Kazakhstan (39.8), Saudi Arabia (41.4), and UAE (45.8).

  1. The top 5 countries by climate_index are dominated by South America.
  2. Four of the bottom 5 countries by climate_index are in the Middle East; the fifth, Kazakhstan, is in Central Asia.
Output 58: Visualize the scatterplot between quality_of_life_index and climate_index
Output 59: Visualize the scatterplot between purchasing_power_index and climate_index
Output 60: Visualize the scatterplot between safety_index and climate_index
Output 61: Visualize the scatterplot between health_care_index and climate_index
Output 62: Visualize the scatterplot between cost_of_living_index and climate_index
Output 63: Visualize the scatterplot between property_price_to_income_ratio and climate_index
Output 64: Visualize the scatterplot between traffic_commute_time_index and climate_index
Output 65: Visualize the scatterplot between pollution_index and climate_index

Based on Output 58, 59, 60, 61, 62, 63, 64, and 65,

  1. There is a weak negative linear correlation between the climate_index and both the safety_index (-0.38) and the purchasing_power_index (-0.23).
  2. No linear correlation was detected with the other factors.
Output 66: Boxplot of the climate_index labeled with outlier countries
Output 67: The statistical measurements of the climate_index

Based on Output 66 and Output 67,

  1. The boxplot labels Kuwait and Qatar as low outliers in the climate_index.
  2. The average and median (50th percentile) climate_index are 77.77 and 80.7 respectively.
  3. The middle 50% of climate_index values (between the 25th and 75th percentiles, shown by the pink box of the boxplot) range from 68.62 to 90.22.
  4. The highest and lowest climate_index are 99.9 and 20.2 respectively (matching the values shown earlier for Venezuela and Kuwait).

Summarize the correlation between each factor

Output 68: Visualize correlation between all factors (orange-colored for correlation with quality_of_life_index)
Output 69: Visualize correlation between factors and quality_of_life_index

Based on Output 68 and 69,

  1. The climate_index is the only building factor that does not correlate with the quality_of_life_index.
  2. The building factors that correlate strongly with each other are purchasing_power_index vs. cost_of_living_index and purchasing_power_index vs. pollution_index.
  3. The building factors positively correlated with the quality_of_life_index are purchasing_power_index: strong (0.87), cost_of_living_index: strong (0.75), health_care_index: moderate (0.62), and safety_index: moderate (0.57).
  4. The building factors negatively correlated with the quality_of_life_index are pollution_index: strong (-0.89), traffic_commute_time_index: strong (-0.72), property_price_to_income_ratio: moderate (-0.62), and climate_index: none (-0.02).
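
The labels "strong", "moderate", "weak", and "none" follow the size of the coefficient. The exact cutoffs are not stated in the article; the thresholds below (0.7, 0.4, 0.2) are an assumption chosen because they are consistent with every labeled value in this section. `correlation_strength` is a hypothetical helper name.

```python
def correlation_strength(r):
    """Label a correlation coefficient by its absolute size (assumed cutoffs)."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "none"


# Correlations with quality_of_life_index, as listed above.
with_quality_of_life = {
    "purchasing_power_index": 0.87,
    "cost_of_living_index": 0.75,
    "health_care_index": 0.62,
    "safety_index": 0.57,
    "pollution_index": -0.89,
    "traffic_commute_time_index": -0.72,
    "property_price_to_income_ratio": -0.62,
    "climate_index": -0.02,
}

for factor, r in with_quality_of_life.items():
    direction = "positive" if r > 0 else "negative"
    print(f"{factor}: {correlation_strength(r)} {direction} ({r})")
```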

Clustering the countries

We know that the ranking based on the quality_of_life_index provided by Numbeo uses the formula below:

index.main = Math.max(0, 100 + purchasingPowerInclRentIndex / 2.5 - (housePriceToIncomeRatio * 1.0) - costOfLivingIndex / 10 + safetyIndex / 2.0 + healthIndex / 2.5 - trafficTimeIndex / 2.0 - pollutionIndex * 2.0 / 3.0 + climateIndex / 3.0);

The formula used by Numbeo to calculate `quality_of_life_index`
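
To make the weights easier to read, here is the same formula transcribed into Python. The function and parameter names are my own snake_case renderings of the names in the formula, and the input values in the example are made up for illustration, not taken from any country in the dataset.

```python
def quality_of_life_index(
    purchasing_power_incl_rent,
    house_price_to_income_ratio,
    cost_of_living,
    safety,
    health,
    traffic_time,
    pollution,
    climate,
):
    """Transcription of the formula quoted above; the result is floored at 0."""
    return max(
        0,
        100
        + purchasing_power_incl_rent / 2.5
        - house_price_to_income_ratio * 1.0
        - cost_of_living / 10
        + safety / 2.0
        + health / 2.5
        - traffic_time / 2.0
        - pollution * 2.0 / 3.0
        + climate / 3.0,
    )


# Hypothetical inputs, chosen only to make the arithmetic easy to follow:
# 100 + 40 - 10 - 5 + 30 + 28 - 15 - 20 + 30 = 178
example = quality_of_life_index(
    purchasing_power_incl_rent=100, house_price_to_income_ratio=10,
    cost_of_living=50, safety=60, health=70, traffic_time=30,
    pollution=30, climate=90,
)
print(example)
```

Reading the signs shows which factors raise the index (purchasing power, safety, healthcare, climate) and which lower it (property price to income ratio, cost of living, traffic commute time, pollution).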

However, climate_index does not even correlate with the quality_of_life_index, so in my opinion we can create a better grouping. I will use K-Means clustering to group the countries rather than only rank them, so that we get alternatives among similar countries based on the factors we have.

Silhouette method to decide the best K to be used

Output 70: Silhouette method output
  1. For a given value of K, every cluster is expected to reach a Silhouette score above the average score, which is marked by the red dotted line on the x-axis. K = 8 and K = 9 are excluded because they do not meet this criterion.
  2. Cluster sizes should be reasonably consistent, without wide variations. The width of each cluster in the plot indicates the number of data points it contains, and it varies considerably for K = 2, 3, and 4 compared to the other values of K. Hence, I prefer to choose among K = 5, 6, and 7.
  3. Based on Output 70 and further exploration of the data, I concluded that K = 7 is the most suitable number of clusters.
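
The Silhouette score behind this plot can be computed by hand. The sketch below is a plain standard-library version on four made-up 2D points, not the article's code; in practice one would more likely use scikit-learn's `silhouette_samples` and `silhouette_score`. The helper names and toy data are assumptions.

```python
import math


def silhouette_samples(points, labels):
    """Silhouette value (b - a) / max(a, b) for every point."""
    scores = []
    for i, p in enumerate(points):
        distances = {}  # cluster label -> distances from p to its members
        for j, q in enumerate(points):
            if i != j:
                distances.setdefault(labels[j], []).append(math.dist(p, q))
        own = distances.pop(labels[i], [])
        if not own or not distances:
            scores.append(0.0)  # singleton cluster or single cluster overall
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(sum(d) / len(d) for d in distances.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return scores


def clusters_above_average(scores, labels):
    """Criterion 1 above: does each cluster have a point above the mean score?"""
    average = sum(scores) / len(scores)
    best = {}
    for score, label in zip(scores, labels):
        best[label] = max(best.get(label, -1.0), score)
    return {label: value > average for label, value in best.items()}


# Made-up toy data: two tight, well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_samples(points, [0, 0, 1, 1])
bad = silhouette_samples(points, [0, 1, 0, 1])
print(sum(good) / len(good), sum(bad) / len(bad))
print(clusters_above_average(good, [0, 0, 1, 1]))
```

A sensible grouping gives an average close to 1, while the mixed-up labeling gives a negative average.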

Implementing K-Means clustering with K = 7

Below is the snapshot of the pair plot after we implement the K-Means clustering with K = 7.

Output 71: Full display of the pair plot after implementing K-Means clustering with K = 7
Output 72: The statistical measurements for each factor and each cluster. (Part 1)
Output 73: The statistical measurements for each factor and each cluster. (Part 2)
Output 74: The statistical measurements for each factor and each cluster. (Part 3)
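
The clustering code is not reproduced in this text. As a rough illustration of what K-Means does, here is a minimal standard-library sketch of Lloyd's algorithm on made-up 2D points with K = 2. It is not the article's implementation: a real analysis would more likely use scikit-learn's `KMeans` on all eight factors with K = 7, and the naive "first K points" initialization here is an assumption made to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(p) for p in points[:k]]  # naive deterministic start
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Made-up toy data: two tight, well-separated pairs of points.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
print(toy_labels)     # [0, 0, 1, 1]
print(toy_centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

Because K-Means relies on distances, factors measured on very different scales are often standardized first; whether that was done here is not stated in the article.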

Based on Output 71, 72, 73, and 74, we can give a proper name for each cluster that describes the characteristic.

Cluster 0: Tier 1

Cluster 0 will be called “tier_1”. This cluster consists of countries that offer the best all-round qualities across all factors. For each factor, I also list the rank of this cluster’s median value among the 7 clusters.

The downside of this cluster is that it has the worst cost_of_living_index and a climate_index that is not good either.

  1. purchasing_power_index: Rank 2 of 7
  2. pollution_index: Rank 1 of 7
  3. safety_index: Rank 2 of 7
  4. health_care_index: Rank 2 of 7
  5. cost_of_living_index: Rank 7 of 7
  6. property_price_to_income_ratio: Rank 2 of 7
  7. traffic_commute_time_index: Rank 2 of 7
  8. climate_index: Rank 5 of 7
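
The "Rank x of 7" figures compare the median of a factor across clusters. A minimal sketch of that ranking, using made-up values for three clusters (the numbers and the helper name `rank_clusters_by_median` are hypothetical), could look like this:

```python
from statistics import median


def rank_clusters_by_median(values_by_cluster, higher_is_better=True):
    """Rank clusters (1 = best) by the median of one factor."""
    medians = {name: median(values) for name, values in values_by_cluster.items()}
    ordered = sorted(medians, key=medians.get, reverse=higher_is_better)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Made-up factor values per cluster, for illustration only.
toy_factor = {
    "tier_1": [70, 80, 90],
    "tier_4": [40, 50, 60],
    "tier_5": [10, 20, 30],
}

print(rank_clusters_by_median(toy_factor))
# {'tier_1': 1, 'tier_4': 2, 'tier_5': 3}
print(rank_clusters_by_median(toy_factor, higher_is_better=False))
# {'tier_5': 1, 'tier_4': 2, 'tier_1': 3}
```

The `higher_is_better` flag matters because factors such as pollution_index and cost_of_living_index are better when lower.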
Output 75: Tier 1 countries

Cluster 1: Tier 5

Cluster 1 will be called “tier_5”. This cluster consists of countries that offer the worst all-round qualities across all factors. For each factor, I also list the rank of this cluster’s median value among the 7 clusters.

The upside of this cluster is that it has the best cost_of_living_index and climate_index, the opposite of Tier 1.

  1. purchasing_power_index: Rank 7 of 7
  2. pollution_index: Rank 7 of 7
  3. safety_index: Rank 7 of 7
  4. health_care_index: Rank 6 of 7
  5. cost_of_living_index: Rank 1 of 7
  6. property_price_to_income_ratio: Rank 7 of 7
  7. traffic_commute_time_index: Rank 7 of 7
  8. climate_index: Rank 1 of 7
Output 76: Tier 5 countries

Cluster 2: Tier 4

Cluster 2 will be called “tier_4”. This cluster consists of countries that offer poor all-round qualities across all factors. For each factor, I also list the rank of this cluster’s median value among the 7 clusters.

This cluster is not as bad as Tier 5, but its characteristics are quite similar.

  1. purchasing_power_index: Rank 6 of 7
  2. pollution_index: Rank 6 of 7
  3. safety_index: Rank 5 of 7
  4. health_care_index: Rank 5 of 7
  5. cost_of_living_index: Rank 2 of 7
  6. property_price_to_income_ratio: Rank 6 of 7
  7. traffic_commute_time_index: Rank 6 of 7
  8. climate_index: Rank 6 of 7
Output 77: Tier 4 countries

Cluster 3: Tier 2 with Better Cost of Living, Purchasing Power, Safety, Property Price to Income Ratio

Cluster 3 will be called “tier_2_betterCostPurchaseSafetyProperty”. This cluster consists of countries with great all-around qualities across all factors, but compared to the other tier_2 clusters, it has a better Cost of Living Index, Purchasing Power Index, Safety Index, and Property Price to Income Ratio. Notice that all countries in this cluster are located in the Middle East. I also list the rank of this cluster’s median value for each factor.

The good part of this cluster is that it has the best purchasing_power_index, safety_index, and property_price_to_income_ratio of all clusters, but it has the worst climate_index.

  1. purchasing_power_index: Rank 1 of 7
  2. pollution_index: Rank 4 of 7
  3. safety_index: Rank 1 of 7
  4. health_care_index: Rank 4 of 7
  5. cost_of_living_index: Rank 4 of 7
  6. property_price_to_income_ratio: Rank 1 of 7
  7. traffic_commute_time_index: Rank 3 of 7
  8. climate_index: Rank 7 of 7
Output 78: Tier 2 with better Cost of Living Index, Purchasing Power Index, Safety Index, and Property Price to Income Ratio countries

Cluster 4: Tier 2 with Better Pollution, Traffic Commute Time

Cluster 4 will be called “tier_2_betterPollutionTraffic”. This cluster consists of countries with great all-around qualities across all factors, but compared to the other tier_2 clusters, it has a better Pollution Index and Traffic Commute Time Index. Notice that all countries in this cluster are located in Europe. I also list the rank of this cluster’s median value for each factor.

The good part of this cluster is that it has the best traffic_commute_time_index. The bad part is that its cost_of_living_index and property_price_to_income_ratio are not great.

  1. purchasing_power_index: Rank 4 of 7
  2. pollution_index: Rank 2 of 7
  3. safety_index: Rank 3 of 7
  4. health_care_index: Rank 3 of 7
  5. cost_of_living_index: Rank 5 of 7
  6. property_price_to_income_ratio: Rank 5 of 7
  7. traffic_commute_time_index: Rank 1 of 7
  8. climate_index: Rank 4 of 7
Output 79: Tier 2 with better Pollution Index and Traffic Commute Time Index countries

Cluster 5: Tier 2 with Better Healthcare, Climate

Cluster 5 will be called “tier_2_betterHealthClimate”. This cluster consists of countries with great all-around qualities across all factors, but compared to the other tier_2 clusters, it has a better Healthcare Index and Climate Index. I also list the rank of this cluster’s median value for each factor.

The good part of this cluster is that it has the best health_care_index and a great climate_index. The bad part is that its safety_index and cost_of_living_index are poor.

  1. purchasing_power_index: Rank 3 of 7
  2. pollution_index: Rank 3 of 7
  3. safety_index: Rank 6 of 7
  4. health_care_index: Rank 1 of 7
  5. cost_of_living_index: Rank 6 of 7
  6. property_price_to_income_ratio: Rank 3 of 7
  7. traffic_commute_time_index: Rank 5 of 7
  8. climate_index: Rank 2 of 7
Output 80: Tier 2 with better Healthcare Index and Climate Index countries

Cluster 6: Tier 3

Cluster 6 will be called “tier_3”. This cluster consists of countries with middling all-around qualities across all factors: not as good as Tier 1 and Tier 2, but also not as bad as Tier 4 and Tier 5. I also list the rank of this cluster’s median value for each factor.

The bad part of this cluster is that it has the worst health_care_index, but its cost_of_living_index and climate_index are good enough.

  1. purchasing_power_index: Rank 5 of 7
  2. pollution_index: Rank 5 of 7
  3. safety_index: Rank 4 of 7
  4. health_care_index: Rank 7 of 7
  5. cost_of_living_index: Rank 3 of 7
  6. property_price_to_income_ratio: Rank 4 of 7
  7. traffic_commute_time_index: Rank 4 of 7
  8. climate_index: Rank 3 of 7
Output 81: Tier 3 countries
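The cluster names chosen above can be attached to the data with a simple lookup. A minimal sketch, assuming the K-Means labels live in a column called `cluster` (a hypothetical name) and using placeholder country labels:

```python
import pandas as pd

# Names assigned to the seven clusters in the sections above.
cluster_names = {
    0: "tier_1",
    1: "tier_5",
    2: "tier_4",
    3: "tier_2_betterCostPurchaseSafetyProperty",
    4: "tier_2_betterPollutionTraffic",
    5: "tier_2_betterHealthClimate",
    6: "tier_3",
}

# Placeholder rows: "A", "B", "C" are not real countries,
# and the `cluster` column name is an assumption.
df = pd.DataFrame({"country": ["A", "B", "C"], "cluster": [0, 6, 3]})
df["cluster_name"] = df["cluster"].map(cluster_names)
print(df)
```

Keeping the mapping in one dictionary makes it easy to rename a tier later without touching the clustering step.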

Additional data visualization: factors comparison for each cluster

Output 82: Boxplot of Purchasing Power Index for Each Cluster
Output 83: Boxplot of Cost of Living Index for Each Cluster
Output 84: Boxplot of Health Care Index for Each Cluster
Output 85: Boxplot of Safety Index for Each Cluster
Output 86: Boxplot of Pollution Index for Each Cluster
Output 87: Boxplot of Traffic Commute Time Index for Each Cluster
Output 88: Boxplot of Property Price to Income Ratio for Each Cluster
Output 89: Boxplot of Climate Index for Each Cluster
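A boxplot like the ones above could be drawn roughly as follows. This is a sketch with plain matplotlib and made-up values; the `cluster_name` column is an assumption, and the original plots may have been produced with a different library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: invented values for two of the seven clusters.
df = pd.DataFrame({
    "cluster_name": ["tier_1", "tier_1", "tier_1", "tier_3", "tier_3", "tier_3"],
    "purchasing_power_index": [95.0, 105.0, 115.0, 45.0, 55.0, 65.0],
})

# One array of values per cluster, in a fixed (sorted) order.
names = sorted(df["cluster_name"].unique())
data = [
    df.loc[df["cluster_name"] == name, "purchasing_power_index"].to_numpy()
    for name in names
]

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(data)
ax.set_xticks(list(range(1, len(names) + 1)))
ax.set_xticklabels(names)
ax.set_ylabel("purchasing_power_index")
ax.set_title("Boxplot of Purchasing Power Index for Each Cluster")
```

Call `fig.savefig(...)` or `plt.show()` afterwards, and repeat the same pattern for each of the other factors.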

What have we done?

Phew, finally it’s done. That was a long data analysis with a lot of data visualization, wasn’t it? 😆

  1. We used data from Numbeo.com on the quality_of_life_index and the various factors behind it.
  2. We used Python to do exploratory data analysis for each factor, including checking the correlation between factors, to understand the data better.
  3. We created country clusters using the K-Means algorithm, with the Silhouette method to decide the optimal number of clusters (K).
  4. We explored the characteristics of each cluster, including its pros and cons.
  5. We summarized the methods and the resulting insights for readers.
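For readers who want to revisit step 3, here is a compact sketch of choosing K with the Silhouette method in scikit-learn. It runs on synthetic data from `make_blobs` rather than the Numbeo table, and both the feature scaling and the range of K values tried are assumptions of this sketch, not necessarily what the analysis above did.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the factor table: 90 rows in 3 well-separated groups.
X, _ = make_blobs(
    n_samples=90,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=42,
)

# Scaling is an assumption of this sketch; it keeps factors measured on
# different scales from dominating the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for each candidate K and record the mean silhouette score.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# The K with the highest silhouette score is taken as the optimal one.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On the real data, the same loop would run over the eight factor columns, and the chosen K would then be passed to a final `KMeans` fit.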

Here is the final result: the ranking of the cluster tiers.

Output 90: The country cluster ranking
  1. Tier 1
  2. Tier 2 with better Cost of Living Index, Purchasing Power Index, Safety Index, and Property Price to Income Ratio (5 countries)
  3. Tier 2 with better Pollution Index and Traffic Commute Time Index (8 countries)
  4. Tier 2 with better Health Care Index and Climate Index (14 countries)
  5. Tier 3 (17 countries)
  6. Tier 4 (14 countries)
  7. Tier 5 (13 countries)
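One simple way to sanity-check this ordering is to average the eight per-factor ranks listed for each cluster. The sketch below uses equal weights for all factors, which is an assumption of mine and not necessarily how the ranking above was derived; the rank values themselves are copied from the cluster sections.

```python
# Median ranks copied from the per-cluster lists in this article, in the order:
# purchasing power, pollution, safety, health care, cost of living,
# property price to income ratio, traffic commute time, climate.
ranks = {
    "tier_1": [2, 1, 2, 2, 7, 2, 2, 5],
    "tier_5": [7, 7, 7, 6, 1, 7, 7, 1],
    "tier_4": [6, 6, 5, 5, 2, 6, 6, 6],
    "tier_2_betterCostPurchaseSafetyProperty": [1, 4, 1, 4, 4, 1, 3, 7],
    "tier_2_betterPollutionTraffic": [4, 2, 3, 3, 5, 5, 1, 4],
    "tier_2_betterHealthClimate": [3, 3, 6, 1, 6, 3, 5, 2],
    "tier_3": [5, 5, 4, 7, 3, 4, 4, 3],
}

# Equal-weight mean rank per cluster (lower is better).
mean_rank = {name: sum(r) / len(r) for name, r in ranks.items()}

# Sort clusters from best (lowest mean rank) to worst.
ordering = sorted(mean_rank, key=mean_rank.get)
for position, name in enumerate(ordering, start=1):
    print(position, name, mean_rank[name])
```

With these ranks, the equal-weight ordering agrees with the cluster ranking above, from tier_1 (mean rank 2.875) down to tier_5 (mean rank 5.375).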

If you want to reproduce the analysis above yourself, the file and the Python code are stored in my GitHub.

I hope you enjoy this long post (and 90 outputs? lol) and find it useful, not only the data analysis part but also the generated insights.

References

  1. Pandas documentation https://pandas.pydata.org/docs/
  2. Stop Using Elbow Method in K-Means Clustering https://builtin.com/data-science/elbow-method
  3. Numbeo Quality of Life page https://www.numbeo.com/quality-of-life/
  4. Scikit Learn User Guide https://scikit-learn.org/stable/user_guide.html
  5. Numpy documentation https://numpy.org/doc/
  6. Google Search “best country to live” https://www.google.com/search?q=best+country+to+live

Mochamad Kautzar Ichramsyah
Writer for CodeX

Data analytics professional with 10 years of experience at tech companies in Indonesia.