Exploratory Analysis of Car Evaluation Dataset with SQL

This project aims to conduct a comprehensive analysis of the car evaluation dataset in MySQL. This project focuses on identifying patterns and trends across various attributes to gain data-driven insights.

Photo by Alex Suprun on Unsplash

Dataset Overview

The data used in this project has been taken from the UCI Machine Learning Repository. The Car Evaluation Database was derived from a simple hierarchical decision model originally developed for the demonstration of DEX, M. Bohanec, and V. Rajkovic.

There are six attributes and one target column containing all the listed cars' information. There are no missing values in any of the dataset’s 1728 total instances.

The columns are:

buying → Buying price

maint → Price of the maintenance

doors → Number of doors

persons → Capacity in terms of persons to carry

lug_boot → the size of the luggage boot

safety → estimated safety of the car

class → Evaluation level (unacceptable, acceptable, good, very good)

Questions and Key Insights

I prepared some questions for my analysis.

How are the cars distributed in terms of buying price?

SELECT buying as Buying_Price, count(*) as Number_of_Cars
from car_evaluation
group by buying;

Result

Result based on buying price

How is the safety level distributed among the cars in the dataset?

SELECT Safety as Safety_Level, count(*) as Number_of_Cars
from car_evaluation
group by Safety;

Result

Result based on Safety Levels

How are the cars distributed in terms of maintenance costs?

SELECT maint as Maintenance_Cost, count(*) as Number_of_Cars
from car_evaluation
group by maint;

Result

Result based on maintenance cost

What is the distribution of the capacity of cars in terms of the number of people they can carry?

SELECT Persons as Passenger_Capacity, count(*) as Number_of_Cars
from car_evaluation
group by Persons;

Result

Result based on 2 and 4 seater cars

How does the size of the luggage boot vary across different cars?

SELECT Lug_boot as Luggage_Boot_Size, count(*) as Number_of_Cars
from car_evaluation
group by Lug_boot;

Result

Result based on boot size

What is the distribution of cars with high safety levels and a passenger capacity of 4, categorized by buying price and maintenance cost?

SELECT Buying AS Buying_Price, Maint AS Maintenance_Cost , count(*) as Number_of_Cars
FROM car_evaluation
WHERE safety = 'high' AND persons = 4
GROUP BY Buying, Maint;

Result

Buying price + Maintenance cost Vs No. of cars

Are there specific buying price and maintenance cost ranges associated with more favorable evaluation levels?

select buying as Buying_Price, maint as Maintenance_Cost,
avg(case when class = 'good' or class = 'vgood' then 1 else 0 end) as Favorable_Evaluation_Percentage
from car_evaluation
group by buying, maint
order by Favorable_Evaluation_Percentage desc;

Result

Key Insights

  1. Low and mid-range buying prices have more demand.
  2. Despite high maintenance costs, cars are in demand on the market for their inexpensive price.

Which combination of buying price and maintenance cost seems to be most important in determining whether a car is evaluated as “unacceptable”?

select buying as Buying_Price, maint as Maintenance_Cost,
avg(CASE WHEN class = 'unacc' THEN 1 ELSE 0 END) as Unacceptable_Rate
from car_evaluation
group by buying,maint;

Result

Key Insights

Whenever there is a high or very high buying price, maintenance cost, or both, it has the highest unaccepted rate.

How many cars in the dataset meet the dream car specifications, which include a buying price of ‘med’ or ‘low’, maintenance cost of ‘med’ or ‘low’, high safety level, a capacity for four persons, four doors, a luggage boot size of ‘med’ or ‘big’, and an evaluation class of ‘vgood’ or ‘good’?

select count(*) as Number_of_Cars
from car_evaluation
where
buying in ('med','low')
and maint in ('med','low')
and safety = 'high'
and persons = 4
and Doors = 4
and lug_boot in ('med', 'big')
and class in ('vgood', 'good')

Result

No of ‘Dream Car’

In Closing

In this analysis, I gained some key insights into the distribution of cars based on various attributes. Additionally, I also explored the associations between buying price, maintenance cost, and evaluation levels, emphasizing safety, passenger capacity, and the customer’s choice.

You can find the codes → Github

--

--