Harnessing Data Analysis for Eco-friendly Vehicle Recommendations in Canada

Nana Kwame Owusu-Afriyie
All Things Tech Hub
Mar 21, 2023

As a volunteer for a public policy advocacy organization in Canada, I was tasked with helping my colleague draft recommendations for guidelines on CO2 emissions rules. Using Python and SQL, I conducted exploratory data analysis on CO2 emissions data for Canadian vehicles and market information for a chain of bicycle stores.

Python Analysis: Analyzing CO2 Emissions Data

Dataset

I had access to seven years of CO2 emissions data for Canadian vehicles, sourced from the Government of Canada’s open data portal. The dataset contains information on vehicle make, model, class, engine size, cylinders, transmission, fuel type, fuel consumption, and CO2 emissions.

Key Findings

  1. The median engine size in liters for the vehicles in the dataset is 231.0.
  2. The average fuel consumption for each fuel type is as follows:
  • Regular gasoline (X): 10.10 L/100 km
  • Premium gasoline (Z): 11.42 L/100 km
  • Ethanol (E): 16.86 L/100 km
  • Diesel (D): 8.84 L/100 km
  3. The correlation between fuel consumption and CO2 emissions is 0.92.
  4. The average CO2 emissions for the ‘SUV - SMALL’ and ‘MID-SIZE’ vehicle classes are 236.29 g/km and 222.46 g/km, respectively, so MID-SIZE has the lower average of the two classes.
  5. The average CO2 emissions across all vehicles in the dataset are 250.58 g/km, while the average for vehicles with an engine size of 2.0 liters or smaller is 198.27 g/km.
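
For readers who want to reproduce this kind of summary, here is a minimal sketch of the median, per-fuel-type average, and filtered average using only Python's standard library. The rows and the field names (`fuel_type`, `engine_size`, `fuel_consumption`, `co2`) are made up for illustration and are not the real dataset or its column names, so the printed numbers will not match the findings above.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up rows for illustration only; field names are assumptions.
vehicles = [
    {"fuel_type": "X", "engine_size": 2.0, "fuel_consumption": 8.5, "co2": 196},
    {"fuel_type": "X", "engine_size": 3.5, "fuel_consumption": 11.5, "co2": 264},
    {"fuel_type": "Z", "engine_size": 3.0, "fuel_consumption": 12.0, "co2": 280},
    {"fuel_type": "D", "engine_size": 2.0, "fuel_consumption": 8.0, "co2": 216},
    {"fuel_type": "E", "engine_size": 5.3, "fuel_consumption": 17.0, "co2": 290},
]

# Median engine size across all vehicles.
median_engine = median(v["engine_size"] for v in vehicles)

# Average fuel consumption grouped by fuel type.
by_fuel = defaultdict(list)
for v in vehicles:
    by_fuel[v["fuel_type"]].append(v["fuel_consumption"])
avg_by_fuel = {fuel: mean(values) for fuel, values in by_fuel.items()}

# Average CO2 emissions overall and for engines of 2.0 liters or smaller.
overall_co2 = mean(v["co2"] for v in vehicles)
small_engine_co2 = mean(v["co2"] for v in vehicles if v["engine_size"] <= 2.0)

print(median_engine, avg_by_fuel, overall_co2, small_engine_co2)
```

With a data frame library the same steps would typically be a median call, a group-by followed by a mean, and a boolean filter; the logic is the same.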

During the exploration, I used a scatter plot to visualize how the number of cylinders relates to fuel consumption and CO2 emissions.

The visualization shows a strong positive correlation between the number of cylinders and fuel consumption (0.78), and between the number of cylinders and CO2 emissions (0.83).
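
The correlation figures quoted here are presumably Pearson coefficients, although the post does not say so. As a sketch under that assumption, the coefficient can be computed by hand as the covariance divided by the product of the standard deviations. The `pearson` helper and the sample values below are hypothetical and only illustrate the calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sample: cylinder counts and CO2 emissions in g/km.
cylinders = [4, 4, 6, 6, 8]
co2 = [190, 210, 250, 260, 320]

r = pearson(cylinders, co2)
print(round(r, 2))
```

A value close to 1 means the two quantities rise together almost linearly, which is how figures such as 0.78 and 0.83 should be read.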

Finally, I used the seaborn library to create a heat map of vehicle class versus fuel type.
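
The post does not say what the heat map cells contained, so the sketch below assumes they were vehicle counts per class and fuel type. It builds the count matrix with the standard library and passes it to `seaborn.heatmap`. The helper names `class_fuel_counts` and `plot_heatmap`, the field names, and the sample rows are all hypothetical.

```python
from collections import Counter


def class_fuel_counts(rows):
    """Return (classes, fuels, matrix); matrix[i][j] counts vehicles
    of class classes[i] that use fuel fuels[j]."""
    counts = Counter((r["vehicle_class"], r["fuel_type"]) for r in rows)
    classes = sorted({c for c, _ in counts})
    fuels = sorted({f for _, f in counts})
    matrix = [[counts[(c, f)] for f in fuels] for c in classes]
    return classes, fuels, matrix


def plot_heatmap(classes, fuels, matrix):
    """Draw the count matrix as an annotated heat map."""
    # Imported here so the counting step works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(matrix, annot=True, fmt="d",
                     xticklabels=fuels, yticklabels=classes)
    ax.set_xlabel("Fuel type")
    ax.set_ylabel("Vehicle class")
    plt.tight_layout()
    plt.show()


# Made-up rows for illustration only.
rows = [
    {"vehicle_class": "MID-SIZE", "fuel_type": "X"},
    {"vehicle_class": "MID-SIZE", "fuel_type": "X"},
    {"vehicle_class": "MID-SIZE", "fuel_type": "Z"},
    {"vehicle_class": "SUV - SMALL", "fuel_type": "X"},
    {"vehicle_class": "SUV - SMALL", "fuel_type": "D"},
]

classes, fuels, matrix = class_fuel_counts(rows)
print(classes, fuels, matrix)
```

Calling `plot_heatmap(classes, fuels, matrix)` renders the figure when seaborn and matplotlib are installed.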

The Python code that generated these visualizations is available in my workspace.

SQL Analysis: Understanding the Bicycle Market

Dataset

I was given access to three tables containing information on bicycle products, brands, and categories for a chain of bicycle stores. The goal was to help my new team leader better understand the bicycle market.

Key Findings

  1. The most expensive item sold by the company is Trek Domane SLR 9 Disc - 2018, while the least expensive item is Strider Classic 12 Balance Bike - 2018.
  2. The number of different products in each category is as follows:

  3. The top three brands with the highest average list price are:

  • Trek: 2500.064074
  • Heller: 2172.996666
  • Surly: 1331.753600

  4. The top three categories with the highest average list price are:

  • Cruisers Bicycles: 78
  • Mountain Bikes: 60
  • Road Bikes: 60
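
The queries behind findings like these can be sketched with Python's built-in `sqlite3` module. The post does not show the schema or the SQL dialect used, so the table and column names below (`products`, `brands`, `categories`, `list_price`, and so on) and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database with a hypothetical three-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brands (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    brand_id INTEGER REFERENCES brands(brand_id),
    category_id INTEGER REFERENCES categories(category_id),
    list_price REAL
);
INSERT INTO brands VALUES (1, 'Brand A'), (2, 'Brand B');
INSERT INTO categories VALUES (1, 'Road'), (2, 'Kids');
INSERT INTO products VALUES
    (1, 'Road Pro', 1, 1, 3000.0),
    (2, 'Road Basic', 1, 1, 1000.0),
    (3, 'Kids Balance', 2, 2, 100.0);
""")

# Most and least expensive products by list price.
most_expensive = conn.execute(
    "SELECT product_name FROM products ORDER BY list_price DESC LIMIT 1"
).fetchone()[0]
least_expensive = conn.execute(
    "SELECT product_name FROM products ORDER BY list_price ASC LIMIT 1"
).fetchone()[0]

# Number of products in each category, largest first.
per_category = conn.execute("""
    SELECT c.category_name, COUNT(*) AS n_products
    FROM products p
    JOIN categories c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY n_products DESC
""").fetchall()

# Top three brands by average list price.
top_brands = conn.execute("""
    SELECT b.brand_name, AVG(p.list_price) AS avg_price
    FROM products p
    JOIN brands b ON b.brand_id = p.brand_id
    GROUP BY b.brand_name
    ORDER BY avg_price DESC
    LIMIT 3
""").fetchall()

print(most_expensive, least_expensive, per_category, top_brands)
```

Swapping `brands` for `categories` in the last query gives the average list price per category.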

Conclusion

Using Python and SQL, I was able to extract valuable insights on CO2 emissions data for Canadian vehicles and understand the bicycle market. This information can be used to guide public policy on emission regulations and help my team leader navigate the bicycle market effectively.

I’d appreciate it if you could check out my DataCamp workspace and give it an upvote if you find my work insightful. Your support helps me gain recognition in the competition and contributes to my ongoing learning journey.

If you found my analysis insightful and are in need of data analysis services, I invite you to check out my Fiverr store. As a professional data analyst, I offer a range of data analysis solutions tailored to meet the needs of businesses, organizations, and individuals. Feel free to reach out to me on Fiverr to discuss how I can help you uncover valuable insights from your data.

Thank you for your support, and I look forward to sharing more data-driven insights and experiences with you in the future.
