Beyond numbers: Numbeo’s insight into the world’s most livable countries (Python, K-Means with Silhouette method)
Introduction to Numbeo’s quality of life index
As explained in this page https://www.numbeo.com/quality-of-life/indices_explained.jsp :
The quality of life index estimates a city or country’s overall quality of life.
In this case, Numbeo is using various factors to measure the quality of life index, such as the purchasing power index, safety index, healthcare index, cost of living index, property price to income ratio, traffic commute time index, pollution index, and climate index.
Note that the quality of life index is based on data and surveys collected by Numbeo. Here we use the mid-2023 index at the country level. The index and the factors behind it are explained below:
- Quality of life index (higher is better). This is a composite value that Numbeo computes from the factors below using a formula. That is why, when one of these factors correlates with the quality of life index, we can treat the relationship as causal by construction. https://www.numbeo.com/quality-of-life/indices_explained.jsp
- Purchasing power index (higher is better) and cost of living index (lower is better) https://www.numbeo.com/cost-of-living/cpi_explained.jsp
- Pollution index (lower is better) https://www.numbeo.com/pollution/indices_explained.jsp
- Property price to income ratio (lower is better) https://www.numbeo.com/property-investment/indicators_explained.jsp
- Safety index (higher is better) https://www.numbeo.com/crime/indices_explained.jsp
- Healthcare index (higher is better) https://www.numbeo.com/health-care/indices_explained.jsp
- Traffic commute time index (lower is better) https://www.numbeo.com/traffic/indices_explained.jsp
- Climate index (higher is better) https://www.numbeo.com/climate/indices_explained.jsp
For the data source, I am using the data on this page https://www.numbeo.com/quality-of-life/rankings_by_country.jsp?title=2023-mid and to do the data analysis I am using Python. Without further ado, let’s go!
Quality of life index and rankings based on Numbeo’s calculation
Based on Output 1, we have 84 countries listed, together with the factors mentioned in the previous section.
Based on Output 2, the top 5 countries with the highest quality_of_life_index are Luxembourg and the Netherlands (both 200.1), Iceland (191.1), Denmark (190.6), and Finland (188.1). The bottom 5 countries with the lowest quality_of_life_index are Nigeria (49.5), Bangladesh (69.5), Venezuela (74.4), Sri Lanka (76.5), and Iran (77.6).
In case you want to know a specific country, you can use this code:
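A minimal sketch of such a lookup, written in plain Python so it runs without the dataset. The dictionary holds only a handful of values quoted in this article, so the rank it prints is relative to this sample, not to the full 84-country list, and the helper name `lookup` is hypothetical. With the full table loaded in a pandas DataFrame, a boolean filter on the country column does the same job.

```python
# Sample rows only: values quoted in this article, not the full 84-country table.
quality_of_life = {
    "Luxembourg": 200.1,
    "Netherlands": 200.1,
    "Iceland": 191.1,
    "Denmark": 190.6,
    "Finland": 188.1,
    "Uruguay": 136.2,
    "Hungary": 131.6,
    "Indonesia": 92.0,
    "Iran": 77.6,
    "Sri Lanka": 76.5,
}


def lookup(country, table):
    """Return (rank, value) for one country; rank 1 is the highest index."""
    ordered = sorted(table, key=table.get, reverse=True)
    return ordered.index(country) + 1, table[country]


rank, value = lookup("Indonesia", quality_of_life)
print(f"Indonesia: rank {rank} of {len(quality_of_life)} sample rows, index {value}")
```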
Based on Output 3, we can see that Indonesia is ranked 74th of the 84 countries listed, with a quality of life index of 92.0.
Based on Output 5 and Output 6,
- There are no outlier values in the quality_of_life_index.
- The average and median (50th percentile) quality_of_life_index are 134.27 and 131.5 respectively, which are not too different.
- The middle half of the quality_of_life_index values (between the 25th and 75th percentiles, shown by the blue box of the boxplot) ranges from 106.47 to 164.57.
- The highest and lowest quality_of_life_index are 200.1 and 35.34 respectively (confirming the values seen earlier for Luxembourg, the Netherlands, and Nigeria).
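The "no outliers" reading can be checked by hand from the quartiles. The sketch below assumes the boxplot uses the common 1.5 × IQR whisker rule (the default in matplotlib and seaborn); the function name `iqr_fences` is made up for this example.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) whisker limits for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes quoted above for quality_of_life_index.
q1, q3 = 106.47, 164.57
lowest, highest = 35.34, 200.1

lower, upper = iqr_fences(q1, q3)
# A value is flagged as an outlier only if it falls outside the fences.
has_outliers = lowest < lower or highest > upper
print(f"fences: {lower:.2f} to {upper:.2f}, outliers: {has_outliers}")
```

Both extremes sit inside the fences, which matches the boxplot. The same arithmetic applied to the quartiles quoted in the later sections reproduces the outliers named there, for example Venezuela and South Africa for the safety_index.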
Based on Output 7, we can see that Uruguay (ranked 41, quality_of_life_index 136.2) and Hungary (ranked 42, quality_of_life_index 131.6) are the two countries on either side of the average quality_of_life_index.
After extracting all the information above, we can’t say that the countries with a lower quality of life index than the average are bad. That’s why we need to explore the various factors that build the quality of life index.
Understanding the data behind Numbeo’s quality of life rankings
The power of the purse: Exploring quality living through the purchasing power index
In this part, we focus on the purchasing_power_index (higher is better) feature as the base variable, checking its distribution and its impact on quality_of_life_index.
Based on Output 8, we can see that the country list has changed. The top 5 countries with the highest purchasing_power_index are Luxembourg (133.2), Qatar (120.2), UAE (118.0), the United States (117.7), and Switzerland (110.8). The bottom 5 countries with the lowest purchasing_power_index are Nigeria (9.4), Venezuela (11.3), Sri Lanka (74.4), Egypt (17.4), and Lebanon (19.4).
- Luxembourg consistently appears in the top 5 countries for both quality_of_life_index and purchasing_power_index.
- Nigeria, Venezuela, and Sri Lanka consistently appear in the bottom 5 countries for both quality_of_life_index and purchasing_power_index.
Based on those facts, I think there is some correlation between the quality_of_life_index and the purchasing_power_index.
Based on Output 9, there is a strong positive linear correlation (0.87) between the quality_of_life_index and the purchasing_power_index: a higher purchasing_power_index goes with a higher quality_of_life_index.
We can treat this as causation because we know that Numbeo calculates the quality_of_life_index with the purchasing_power_index as one of the factors.
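For readers who want to see what a figure like 0.87 measures, here is a small sketch of the Pearson correlation coefficient in plain Python. The numbers are made-up toy values that only show the two extremes (+1 and -1); they are not taken from the Numbeo data, and the function name `pearson_r` is chosen for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up toy numbers, chosen only to show the two extremes.
factor = [10.0, 20.0, 30.0, 40.0]
rising = [1.0, 2.0, 3.0, 4.0]    # moves exactly with the factor
falling = [4.0, 3.0, 2.0, 1.0]   # moves exactly against the factor

print(pearson_r(factor, rising))
print(pearson_r(factor, falling))
```

Real data falls between the two extremes. With the full table in a pandas DataFrame, the `corr` method returns the same coefficient for every pair of columns at once.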
Based on Output 10 and Output 11,
- There are no outlier values in the purchasing_power_index.
- The average and median (50th percentile) purchasing_power_index are 59.14 and 52.15 respectively, which are not too different.
- The middle half of the purchasing_power_index values (between the 25th and 75th percentiles, shown by the sky-blue box of the boxplot) ranges from 32.95 to 86.25.
- The highest and lowest purchasing_power_index are 133.2 and 9.4 respectively (confirming the values seen earlier for Luxembourg and Nigeria).
Safe haven: A deep dive into the safety index and its impact on quality of life
In this part, we focus on the safety_index (higher is better) feature as the base variable, checking its distribution, its impact on quality_of_life_index, and its correlation with purchasing_power_index.
Based on Output 12, we can see that the country list has changed. The top 5 countries with the highest safety_index are Qatar (85.7), UAE (85.4), Taiwan (83.9), Oman (80.4), and Hong Kong (78.3). The bottom 5 countries with the lowest safety_index are Venezuela (17.9), South Africa (24.5), Peru (32.5), Brazil (33.9), and Nigeria (34.2).
- Qatar and UAE consistently appear in the top 5 countries for both purchasing_power_index and safety_index.
- Venezuela and Nigeria consistently appear in the bottom 5 countries for quality_of_life_index, purchasing_power_index, and safety_index.
- The top 5 countries are all in Asia, whereas the bottom 5 countries are in Africa and South America.
Based on those facts, I think there is some correlation between the quality_of_life_index, purchasing_power_index, and safety_index.
Based on Output 13 and Output 14, the safety_index has moderate positive linear correlations with the quality_of_life_index (0.57) and the purchasing_power_index (0.51). A higher safety_index leads to a higher quality_of_life_index (causation, since it is one of the factors in the formula). The safety_index is also positively correlated with the purchasing_power_index. We can’t say there is causation here, but it seems plausible that a country, city, or area with a high purchasing_power_index has less crime than areas with a low purchasing_power_index, which would indirectly raise the safety_index.
Based on Output 15 and Output 16,
- There are outlier values in the safety_index: the two countries with the lowest safety_index, Venezuela and South Africa.
- The average and median (50th percentile) safety_index are 59.6 and 59.55 respectively, which are not too different.
- The middle half of the safety_index values (between the 25th and 75th percentiles, shown by the red box of the boxplot) ranges from 52.85 to 71.2.
- The highest and lowest safety_index are 85.7 and 17.9 respectively (confirming the values seen earlier for Qatar and Venezuela).
Preserving wellness: Navigating the healthcare index landscape
In this part, we focus on the health_care_index (higher is better) feature as the base variable, checking its distribution, its impact on quality_of_life_index, and its correlation with purchasing_power_index and safety_index.
Based on Output 17, we can see that the country list has changed. The top 5 countries with the highest health_care_index are Taiwan (85.9), South Korea (83.0), Japan (79.6), France (78.8), and the Netherlands (78.6). The bottom 5 countries with the lowest health_care_index are Venezuela (39.2), Bangladesh (42.0), Morocco (45.4), Azerbaijan (47.5), and Belarus (47.7).
- The top 3 countries for health_care_index are in Asia.
- Venezuela is the only country that consistently appears in the bottom 5 for quality_of_life_index, purchasing_power_index, safety_index, and health_care_index.
Based on Output 18, 19, and 20, the health_care_index has moderate positive linear correlations with the quality_of_life_index (0.62), the purchasing_power_index (0.58), and the safety_index (0.41). A higher health_care_index leads to a higher quality_of_life_index (causation, since it is one of the factors in the formula). The health_care_index is also positively correlated with the purchasing_power_index and the safety_index, but for those pairs we can’t say there must be causation.
Based on Output 21 and Output 22,
- There are no outlier values in the health_care_index.
- The average and median (50th percentile) health_care_index are 64.75 and 65.75 respectively, which are not too different.
- The middle half of the health_care_index values (between the 25th and 75th percentiles, shown by the green box of the boxplot) ranges from 57.87 to 72.37.
- The highest and lowest health_care_index are 85.9 and 39.2 respectively (confirming the values seen earlier for Taiwan and Venezuela).
Balancing the budget: Demystifying the cost of living index
In this part, we focus on the cost_of_living_index (lower is better) feature as the base variable, checking its distribution, its impact on quality_of_life_index, and its correlation with purchasing_power_index, safety_index, and health_care_index.
Based on Output 23, we can see that the country list has changed. The top 5 countries with the lowest cost_of_living_index (lower is better) are Pakistan (17.6), Egypt (21.7), India (22.9), Nigeria (23.2), and Bangladesh (26.2). The bottom 5 countries with the highest cost_of_living_index are Switzerland (117.3), Iceland (87.7), Singapore (85.9), Norway (82.2), and Denmark (79.2).
- The top 5 countries for cost_of_living_index are in Asia and Africa. In contrast, the bottom 5 are European countries plus Singapore, the most expensive place to live in Asia.
- A low cost of living is not always a good thing. We have to check the correlation with the other factors to know whether it is.
Based on Output 24, 25, 26, and 27,
- Strong positive linear correlations between the cost_of_living_index and the quality_of_life_index (0.75) and the purchasing_power_index (0.76). Countries with a higher cost_of_living_index tend to have a higher quality_of_life_index. This is not direct causation: in Numbeo’s formula the cost_of_living_index is subtracted, so the positive relationship most likely comes through its link with the purchasing_power_index. By basic economic logic, where demand (purchasing power) is strong, prices (cost of living) tend to be high too.
- Moderate positive linear correlations between the cost_of_living_index and the safety_index (0.44) and the health_care_index (0.53).
Based on Output 28 and Output 29,
- There is one outlier value in the cost_of_living_index: Switzerland.
- The average and median (50th percentile) cost_of_living_index are 49.69 and 47.85 respectively, which are not too different.
- The middle half of the cost_of_living_index values (between the 25th and 75th percentiles, shown by the yellow box of the boxplot) ranges from 34.45 to 61.65.
- The highest and lowest cost_of_living_index are 117.3 and 17.6 respectively (confirming the values seen earlier for Switzerland and Pakistan).
Home sweet home: Understanding property price to income ratio
In this part, we focus on the property_price_to_income_ratio (lower is better) feature as the base variable, checking its distribution, its impact on quality_of_life_index, and its correlation with purchasing_power_index, safety_index, health_care_index, and cost_of_living_index.
Based on Output 30, we can see that the country list has changed. The top 5 countries with the lowest property_price_to_income_ratio (lower is better) are Saudi Arabia (2.9), Oman (3.2), UAE (3.3), South Africa (3.3), and the United States (4.2). The bottom 5 countries with the highest property_price_to_income_ratio are Hong Kong (42.1), Sri Lanka (35.3), China (33.0), Philippines (31.0), and Thailand (25.8).
- The top 3 countries for property_price_to_income_ratio are in the Middle East.
- The bottom 5 countries are also in Asia. We can assume that the property_price_to_income_ratio is favorable in the Middle East, but the opposite holds in East Asia and South-East Asia.
- A low property price to income ratio is not always a good thing. We have to check the correlation with the other factors to know whether it is.
Based on Output 31, 32, 33, 34, and 35,
- Moderate negative linear correlations between the property_price_to_income_ratio and the quality_of_life_index (-0.62) and the purchasing_power_index (-0.52). A lower property_price_to_income_ratio leads to a higher quality_of_life_index (causation). The property_price_to_income_ratio is also negatively correlated with the purchasing_power_index, which makes sense: with higher purchasing power, our ability to buy property is higher too (a lower ratio).
- Weak negative linear correlation (-0.29) between the property_price_to_income_ratio and the cost_of_living_index.
- No linear correlation between the property_price_to_income_ratio and the safety_index (-0.08) or the health_care_index (-0.07).
Based on Output 36 and Output 37,
- There are outlier values in the property_price_to_income_ratio. The highest one is Hong Kong, followed by Sri Lanka, China, the Philippines, Thailand, Lebanon, and Vietnam.
- The average and median (50th percentile) property_price_to_income_ratio are 13.25 and 11.7 respectively. The gap is quite big because the outliers pull the average up.
- The middle half of the property_price_to_income_ratio values (between the 25th and 75th percentiles, shown by the orange box of the boxplot) ranges from 9.05 to 14.95.
- The highest and lowest property_price_to_income_ratio are 42.1 and 2.9 respectively (confirming the values seen earlier for Hong Kong and Saudi Arabia).
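The gap between average and median here is worth a quick illustration. In the sketch below the five "typical" ratios are hypothetical values picked for the example; only 42.1 (Hong Kong) is a figure quoted above. It shows how a single high outlier moves the mean much more than the median.

```python
from statistics import mean, median

# Hypothetical "typical" ratios, for illustration only.
typical = [9.0, 10.5, 11.7, 12.4, 14.9]
# The same values plus Hong Kong's 42.1, the highest ratio quoted above.
with_outlier = typical + [42.1]

mean_shift = mean(with_outlier) - mean(typical)
median_shift = median(with_outlier) - median(typical)
print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift:.2f}")
```

This is why the median is usually the safer summary for a factor with several high outliers, such as this one.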
Navigating the daily grind: Analyzing the traffic commute time index
In this part, we focus on the traffic_commute_time_index (lower is better) feature as the base variable, checking its distribution, its impact on quality_of_life_index, and its correlation with purchasing_power_index, safety_index, health_care_index, cost_of_living_index, and property_price_to_income_ratio.
Based on Output 38, we can see that the country list has changed. The top 5 countries with the lowest traffic_commute_time_index (lower is better) are Estonia (22.2), Iceland and Oman (both 22.3), Cyprus (22.8), and the Netherlands (23.8). The bottom 5 countries with the highest traffic_commute_time_index are Nigeria (62.8), Bangladesh (57.4), Sri Lanka (56.4), Kenya (51.6), and Peru (49.1).
- Two of the top 3 countries for traffic_commute_time_index (Estonia and Iceland) are in Northern Europe; Oman ties with Iceland.
- The bottom 4 countries for traffic_commute_time_index are all in Africa and South Asia.
Based on Output 39, 40, 41, 42, 43, and 44,
- Strong negative linear correlation (-0.72) between the traffic_commute_time_index and the quality_of_life_index. A lower traffic_commute_time_index leads to a higher quality_of_life_index (causation).
- Moderate negative linear correlations between the traffic_commute_time_index and the purchasing_power_index (-0.50), the safety_index (-0.51), and the cost_of_living_index (-0.48).
- Weak negative linear correlation (-0.26) between the traffic_commute_time_index and the health_care_index.
- Moderate positive linear correlation (0.46) between the traffic_commute_time_index and the property_price_to_income_ratio.
Based on Output 45 and Output 46,
- There are outlier values in the traffic_commute_time_index. The highest one is Nigeria, followed by Bangladesh and Sri Lanka.
- The average and median (50th percentile) traffic_commute_time_index are 35.32 and 35.1 respectively, with no significant difference.
- The middle half of the traffic_commute_time_index values (between the 25th and 75th percentiles, shown by the grey box of the boxplot) ranges from 29.2 to 39.42.
- The highest and lowest traffic_commute_time_index are 62.8 and 22.2 respectively (confirming the values seen earlier for Nigeria and Estonia).
Breathing easy: Insights into the pollution index
In this part, we focus on the pollution_index (lower is better) feature as the base variable, checking its distribution, its impact on quality_of_life_index, and its correlation with purchasing_power_index, safety_index, health_care_index, cost_of_living_index, property_price_to_income_ratio, and traffic_commute_time_index.
Based on Output 47, we can see that the country list has changed. The top 5 countries with the lowest pollution_index (lower is better) are Finland (11.8), Iceland (15.7), Estonia (17.1), Sweden (17.9), and Norway (18.0). The bottom 5 countries with the highest pollution_index are Lebanon (89.4), Nigeria (88.2), Bangladesh (85.2), Vietnam (84.2), and Peru (83.0).
- The top 5 countries for pollution_index are all in Northern Europe. It must be nice to live in the region with the lowest pollution :)
- 3 of the bottom 5 countries for pollution_index are in Asia.
Based on Output 48, 49, 50, 51, 52, 53, and 54,
- Strong negative linear correlation (-0.89) between the pollution_index and the quality_of_life_index and cost_of_living_index. A lower pollution_index leads to a higher quality_of_life_index (causation). At the same time, a high pollution_index usually occurs in countries with a low cost_of_living_index.
- Moderate negative linear correlations between the pollution_index and the purchasing_power_index (-0.67) and the health_care_index (-0.54).
- Weak negative linear correlation (-0.38) between the pollution_index and the safety_index.
- Moderate positive linear correlations between the pollution_index and the traffic_commute_time_index (0.6) and the property_price_to_income_ratio (0.46).
Based on Output 55 and Output 56,
- There are no outlier values in the pollution_index.
- The average and median (50th percentile) pollution_index are 52.68 and 56.8 respectively, with no significant difference.
- The middle half of the pollution_index values (between the 25th and 75th percentiles, shown by the purple box of the boxplot) ranges from 35.07 to 68.45.
- The highest and lowest pollution_index are 89.4 and 11.8 respectively (confirming the values seen earlier for Lebanon and Finland).
Climate comfort: The climate index chronicles how weather influences the quality of living
In this part, we focus on the climate_index (higher is better) feature as the base variable, checking its distribution, its impact on quality_of_life_index, and its correlation with purchasing_power_index, safety_index, health_care_index, cost_of_living_index, property_price_to_income_ratio, traffic_commute_time_index, and pollution_index.
Based on Output 57, we can see that the country list has changed. The top 5 countries with the highest climate_index (higher is better) are Venezuela (99.9), Kenya (99.8), Argentina (98.3), Uruguay (98.0), and Portugal (97.8). The bottom 5 countries with the lowest climate_index are Kuwait (20.2), Qatar (36.0), Kazakhstan (39.8), Saudi Arabia (41.4), and UAE (45.8).
- The top 5 countries for climate_index are dominated by South America.
- The bottom 5 countries for climate_index are mostly in the Middle East (Kazakhstan, in Central Asia, is the exception).
Based on Output 58, 59, 60, 61, 62, 63, 64, and 65,
- Weak negative linear correlations between the climate_index and the safety_index (-0.38) and the purchasing_power_index (-0.23).
- No linear correlation was detected with the other factors.
Based on Output 66 and Output 67,
- In the climate_index, Kuwait and Qatar are flagged as low-end outliers.
- The average and median (50th percentile) climate_index are 77.77 and 80.7 respectively.
- The middle half of the climate_index values (between the 25th and 75th percentiles, shown by the pink box of the boxplot) ranges from 68.62 to 90.22.
- The highest and lowest climate_index are 99.9 and 20.2 respectively (confirming the values seen earlier for Venezuela and Kuwait).
Summarize the correlation between each factor
Based on Output 68 and 69,
- Among the building factors, only climate_index does not correlate with quality_of_life_index.
- The building factors that correlate strongly with each other are purchasing_power_index vs. cost_of_living_index and purchasing_power_index vs. pollution_index.
- The positive correlations between quality_of_life_index and the building factors are purchasing_power_index: strong (0.87), cost_of_living_index: strong (0.75), health_care_index: moderate (0.62), and safety_index: moderate (0.57).
- The negative correlations between quality_of_life_index and the building factors are pollution_index: strong (-0.89), traffic_commute_time_index: strong (-0.72), property_price_to_income_ratio: moderate (-0.62), and climate_index: none (-0.02).
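The labels "strong", "moderate", "weak", and "none" used throughout can be written down as a small helper. The cut-offs of 0.7, 0.4, and 0.2 are an assumption inferred from how the labels are applied in this article (they reproduce every label quoted above); they are not a standard, and the function name is hypothetical.

```python
def correlation_strength(r):
    """Label |r| using cut-offs inferred from this article (an assumption)."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "none"


# Correlations with quality_of_life_index quoted in the summary above.
quoted = {
    "purchasing_power_index": 0.87,
    "cost_of_living_index": 0.75,
    "health_care_index": 0.62,
    "safety_index": 0.57,
    "pollution_index": -0.89,
    "traffic_commute_time_index": -0.72,
    "property_price_to_income_ratio": -0.62,
    "climate_index": -0.02,
}

for name, r in quoted.items():
    print(f"{name}: {r:+.2f} ({correlation_strength(r)})")
```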
Clustering the countries
We know that the ranking based on the quality_of_life_index provided by Numbeo uses the formula below:
index.main = Math.max(0, 100 + purchasingPowerInclRentIndex / 2.5 - (housePriceToIncomeRatio * 1.0) - costOfLivingIndex / 10 + safetyIndex / 2.0 + healthIndex / 2.5 - trafficTimeIndex / 2.0 - pollutionIndex * 2.0 / 3.0 + climateIndex / 3.0);
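The same formula translated into Python, as a sketch for experimenting with the weights. The parameter names are my own snake_case versions of Numbeo's variables, and the input values in the example are hypothetical, not a real country. Note that the formula takes the purchasing power index including rent, which may differ from the purchasing_power_index column analysed above.

```python
def quality_of_life_index(purchasing_power_incl_rent, house_price_to_income,
                          cost_of_living, safety, health, traffic_time,
                          pollution, climate):
    """Numbeo's quality of life formula as quoted above, clamped at zero."""
    raw = (
        100
        + purchasing_power_incl_rent / 2.5
        - house_price_to_income * 1.0
        - cost_of_living / 10
        + safety / 2.0
        + health / 2.5
        - traffic_time / 2.0
        - pollution * 2.0 / 3.0
        + climate / 3.0
    )
    return max(0, raw)


# Hypothetical inputs, for illustration only.
example = quality_of_life_index(
    purchasing_power_incl_rent=100.0,
    house_price_to_income=10.0,
    cost_of_living=50.0,
    safety=60.0,
    health=65.0,
    traffic_time=35.0,
    pollution=45.0,
    climate=90.0,
)
print(round(example, 1))
```

Reading the divisors gives a feel for the weights: one point of pollution costs about 0.67 index points, while one point of cost of living costs only 0.1.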
However, we saw that climate_index does not even correlate with the quality_of_life_index, which is why, in my opinion, we can create a better grouping. In this case, I will use K-means clustering to group the countries rather than only rank them, so that we have alternatives among similar countries based on the factors we have.
Silhouette method to decide the best K to be used
- For a given value of K, it is expected that all clusters possess a Silhouette score surpassing the average score denoted by the red-dotted line, as depicted on the x-axis. Clusters corresponding to K = 8 and K = 9 are excluded from consideration as they do not adhere to this criterion.
- Consistency in the cluster sizes is preferred, and wide variations are discouraged. The width of clusters, indicative of the number of data points they contain, exhibits considerable disparity for K values of 2, 3, and 4 compared to other clusters. Hence, my preference leans toward selecting from the options of K = 5, 6, and 7.
- Based on Output 70, upon further exploration of the data, I concluded that K = 7 is the most suitable choice for the optimal number of clusters.
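To make the first criterion concrete, here is a plain-Python sketch of the silhouette coefficient, s = (b - a) / max(a, b), where a is a point's mean distance to its own cluster and b is its mean distance to the nearest other cluster. The four points and their labels are toy values, the function names are hypothetical, and the check "each cluster has at least one point above the average" is my reading of the red-line criterion above. In practice a library routine such as scikit-learn's silhouette_score would be used instead.

```python
from math import dist


def silhouette_values(points, labels):
    """Silhouette coefficient s = (b - a) / max(a, b) for every point."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # usual convention for single-point clusters
            continue
        # a: mean distance to the other points in the same cluster
        # (the point's distance to itself is 0, so divide by len - 1).
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        values.append((b - a) / max(a, b))
    return values


def every_cluster_beats_average(values, labels):
    """True if each cluster has at least one point above the mean score."""
    average = sum(values) / len(values)
    best = {}
    for value, label in zip(values, labels):
        best[label] = max(best.get(label, value), value)
    return all(score > average for score in best.values())


# Toy data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = [0, 0, 1, 1]

scores = silhouette_values(points, labels)
print(sum(scores) / len(scores))
print(every_cluster_beats_average(scores, labels))
```

Scores close to 1 mean the clusters are compact and far apart; scores near 0 or below mean points sit between clusters or in the wrong one.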
Implementing K-Means clustering with K = 7
Below is the snapshot of the pair plot after we implement the K-Means clustering with K = 7.
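The clustering step itself can be sketched as follows. This is a bare-bones version of Lloyd's algorithm in plain Python on six toy 2-D points with K = 2; the article's run uses K = 7 on the eight factors. The starting centroids are passed in explicitly so the result is repeatable, whereas library implementations such as scikit-learn's KMeans choose them for you. The function name and the data are assumptions made for this example.

```python
from math import dist


def k_means(points, centroids, iterations=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Toy data: two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Because K-means works on distances, factors with a larger numeric range weigh more in the result unless the columns are put on a comparable scale first.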
Based on Output 71, 72, 73, and 74, we can give a proper name for each cluster that describes the characteristic.
Cluster 0: Tier 1
Cluster 0 will be called “tier_1”. This cluster consists of the countries with the best all-round qualities across all factors. I also add the rank of the median value of each factor in this cluster.
The downside of this cluster is that it has the worst cost_of_living_index and a climate_index that is not good either.
- purchasing_power_index: Rank 2 of 7
- pollution_index: Rank 1 of 7
- safety_index: Rank 2 of 7
- health_care_index: Rank 2 of 7
- cost_of_living_index: Rank 7 of 7
- property_price_to_income_ratio: Rank 2 of 7
- traffic_commute_time_index: Rank 2 of 7
- climate_index: Rank 5 of 7
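The "Rank x of 7" figures come from ranking the clusters by the median of each factor, with rank 1 being the best direction for that factor (highest for "higher is better" factors, lowest for "lower is better" ones). A minimal sketch of that calculation, with hypothetical values for three made-up clusters and a hypothetical function name:

```python
from statistics import median


def rank_clusters(values_by_cluster, higher_is_better):
    """Rank clusters by the median of one factor; rank 1 is the best."""
    medians = {c: median(v) for c, v in values_by_cluster.items()}
    ordered = sorted(medians, key=medians.get, reverse=higher_is_better)
    return {cluster: position for position, cluster in enumerate(ordered, start=1)}


# Hypothetical values for three clusters, for illustration only.
safety = {"a": [70.0, 75.0, 80.0], "b": [40.0, 45.0, 50.0], "c": [55.0, 60.0, 65.0]}
pollution = {"a": [20.0, 25.0, 30.0], "b": [80.0, 85.0, 90.0], "c": [50.0, 55.0, 60.0]}

print(rank_clusters(safety, higher_is_better=True))
print(rank_clusters(pollution, higher_is_better=False))
```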
Cluster 1: Tier 5
Cluster 1 will be called “tier_5”. This cluster consists of the countries with the worst all-round qualities across all factors. I also add the rank of the median value of each factor in this cluster.
The upside of this cluster is that it has the best cost_of_living_index and climate_index, the opposite of Tier 1.
- purchasing_power_index: Rank 7 of 7
- pollution_index: Rank 7 of 7
- safety_index: Rank 7 of 7
- health_care_index: Rank 6 of 7
- cost_of_living_index: Rank 1 of 7
- property_price_to_income_ratio: Rank 7 of 7
- traffic_commute_time_index: Rank 7 of 7
- climate_index: Rank 1 of 7
Cluster 2: Tier 4
Cluster 2 will be called “tier_4”. This cluster consists of countries with poor all-round qualities across all factors. I also add the rank of the median value of each factor in this cluster.
The upside is that this cluster is not as bad as Tier 5, although its characteristics are quite similar.
- purchasing_power_index: Rank 6 of 7
- pollution_index: Rank 6 of 7
- safety_index: Rank 5 of 7
- health_care_index: Rank 5 of 7
- cost_of_living_index: Rank 2 of 7
- property_price_to_income_ratio: Rank 6 of 7
- traffic_commute_time_index: Rank 6 of 7
- climate_index: Rank 6 of 7
Cluster 3: Tier 2 with Better Cost of Living, Purchase Power, Safety, Property Price to Income Ratio
Cluster 3 will be called “tier_2_betterCostPurchaseSafetyProperty”. This cluster consists of countries with great all-round qualities across all factors, but compared to the other tier_2 clusters, it has a better Cost of Living Index, Purchasing Power Index, Safety Index, and Property Price to Income Ratio. Notice that all countries in this cluster are located in the Middle East. I also add the rank of the median value of each factor in this cluster.
The upside of this cluster is that it has the best purchasing_power_index, safety_index, and property_price_to_income_ratio of all clusters, BUT it has the worst climate_index.
- purchasing_power_index: Rank 1 of 7
- pollution_index: Rank 4 of 7
- safety_index: Rank 1 of 7
- health_care_index: Rank 4 of 7
- cost_of_living_index: Rank 4 of 7
- property_price_to_income_ratio: Rank 1 of 7
- traffic_commute_time_index: Rank 3 of 7
- climate_index: Rank 7 of 7
Cluster 4: Tier 2 with Better Pollution, Traffic Commute Time
Cluster 4 will be called “tier_2_betterPollutionTraffic”. This cluster consists of countries with great all-round qualities across all factors, but compared to the other tier_2 clusters, it has a better Pollution Index and Traffic Commute Time Index. Notice that all countries in this cluster are located in Europe. I also add the rank of the median value of each factor in this cluster.
The upside of this cluster is that it has the best traffic_commute_time_index. The downside is that the cost_of_living_index and property_price_to_income_ratio are not great.
- purchasing_power_index: Rank 4 of 7
- pollution_index: Rank 2 of 7
- safety_index: Rank 3 of 7
- health_care_index: Rank 3 of 7
- cost_of_living_index: Rank 5 of 7
- property_price_to_income_ratio: Rank 5 of 7
- traffic_commute_time_index: Rank 1 of 7
- climate_index: Rank 4 of 7
Cluster 5: Tier 2 with Better Healthcare, Climate
Cluster 5 will be called “tier_2_betterHealthClimate”. This cluster consists of countries with great all-round qualities across all factors, but compared to the other tier_2 clusters, it has a better Healthcare Index and Climate Index. I also add the rank of the median value of each factor in this cluster.
The upside of this cluster is that it has the best health_care_index and a great climate_index. The downside is that the safety_index and cost_of_living_index are bad.
- purchasing_power_index: Rank 3 of 7
- pollution_index: Rank 3 of 7
- safety_index: Rank 6 of 7
- health_care_index: Rank 1 of 7
- cost_of_living_index: Rank 6 of 7
- property_price_to_income_ratio: Rank 3 of 7
- traffic_commute_time_index: Rank 5 of 7
- climate_index: Rank 2 of 7
Cluster 6: Tier 3
Cluster 6 will be called “tier_3”. This cluster consists of countries with decent all-round qualities across all factors: not as good as Tier 1 and Tier 2, but also not as bad as Tier 4 and Tier 5. I also add the rank of the median value of each factor in this cluster.
The downside of this cluster is that it has the worst health_care_index, BUT the cost_of_living_index and climate_index are good enough.
- purchasing_power_index: Rank 5 of 7
- pollution_index: Rank 5 of 7
- safety_index: Rank 4 of 7
- health_care_index: Rank 7 of 7
- cost_of_living_index: Rank 3 of 7
- property_price_to_income_ratio: Rank 4 of 7
- traffic_commute_time_index: Rank 4 of 7
- climate_index: Rank 3 of 7
Additional data visualization: factors comparison for each cluster
What have we done?
Phew, finally it’s done, that’s a long data analysis with a lot of data visualization, isn’t it? 😆
- We used the data from Numbeo.com that tells us about the quality_of_life_index, including the various factors behind it.
- We used Python to do exploratory data analysis for each factor, including checking the correlation between each factor, to understand the data better.
- We created country clusters using the K-means algorithm with the Silhouette method to decide the optimal clusters (K).
- We explored the characteristics of each cluster, and the pros and cons.
- We generated proper methods and useful insights for the readers.
Finally, this is the final result, the rank of the cluster tiers.
- Tier 1
- Tier 2 with better Cost of Living Index, Purchasing Power Index, Safety Index, and Property Price to Income Ratio (5 countries)
- Tier 2 with better Pollution Index and Traffic Commute Time Index (8 countries)
- Tier 2 with better Health Care Index and Climate Index (14 countries)
- Tier 3 (17 countries)
- Tier 4 (14 countries)
- Tier 5 (13 countries)
In case you want to do the analysis above by yourself, the file and the Python code are stored in my GitHub.
I hope you enjoy this long post (and 90 outputs? lol) and find it useful, not only the data analysis part but also the generated insights.
References
- Pandas documentation https://pandas.pydata.org/docs/
- Stop Using Elbow Method in K-Means Clustering https://builtin.com/data-science/elbow-method
- Numbeo Quality of Life page https://www.numbeo.com/quality-of-life/
- Scikit Learn User Guide https://scikit-learn.org/stable/user_guide.html
- Numpy documentation https://numpy.org/doc/
- Google Search “best country to live” https://www.google.com/search?q=best+country+to+live