Predictions Using sklearn: Regression for Car Mileage and Diamond Price

Dec 9, 2023


Predictive modeling is a core technique in data science and machine learning for extracting insights from datasets. This lab focuses on regression, a type of predictive modeling that estimates a numeric target (here, fuel economy or price) from one or more input features. Specifically, we’ll use Python with Pandas and Scikit-learn to predict car mileage and diamond prices.


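Before delving into the exercises, make sure the necessary libraries are installed. The install commands below use the ! prefix, which runs them from a Jupyter notebook cell; in a terminal, drop the leading !. The following libraries are required for this lab: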

  • Pandas for data management.
  • Scikit-learn for machine learning functionalities.
  • NumPy for numerical operations.
!pip install pandas==1.3.4
!pip install scikit-learn==1.0.2
!pip install numpy==1.21.6
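Pinning versions only helps if the running kernel actually picks them up; if pandas or scikit-learn was already imported, the notebook kernel typically has to be restarted after installing. A quick sketch for confirming which versions are loaded in the current session:

```python
import numpy
import pandas
import sklearn

# Print the versions that this Python session has actually imported
for name, module in [("pandas", pandas), ("scikit-learn", sklearn), ("numpy", numpy)]:
    print(f"{name}: {module.__version__}")
```

If the printed versions differ from the pinned ones, restart the kernel and re-run the imports.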

Loading Car Mileage Dataset

Let’s start with the car mileage dataset. The data is loaded from a CSV file, and we’ll perform some initial exploratory analysis.

import pandas as pd

# URL for the car mileage dataset
URL = ""

# Load the data into a Pandas DataFrame
df = pd.read_csv(URL)

# Display sample rows from the dataset
df.sample(5)

The dataset comprises columns such as MPG, Cylinders, Engine Disp, Horsepower, Weight, Accelerate, Year, and Origin. Let’s visualize the relationship between Horsepower and mileage using a scatter plot:

df.plot.scatter(x="Horsepower", y="MPG")
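A scatter plot shows the trend visually; a correlation coefficient puts a number on it. Because the dataset URL is not reproduced here, this sketch uses a few made-up rows (hypothetical values, not from the lab’s dataset) with the same column names:

```python
import pandas as pd

# Hypothetical rows standing in for the car mileage dataset (values are made up)
cars = pd.DataFrame({
    "Horsepower": [70, 90, 110, 150, 200],
    "Weight": [1900, 2300, 2800, 3500, 4300],
    "MPG": [35.0, 30.0, 25.0, 18.0, 13.0],
})

# Pearson correlation between every pair of numeric columns
corr = cars.corr()

# A value near -1 means MPG falls almost linearly as Horsepower rises
print(corr.loc["Horsepower", "MPG"])
```

On the real data, running df[["Horsepower", "Weight", "MPG"]].corr() gives the same kind of summary and can help decide which columns are worth using as features.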

Identifying Target and Features

Now, let’s identify the target and feature columns for building a Linear Regression model. In this case, we’ll predict car mileage (MPG) based on Horsepower and Weight.

# Identify the target column
target = df["MPG"]

# Identify the features
features = df[["Horsepower", "Weight"]]
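The single versus double brackets matter: scikit-learn expects the features as a 2-D table (rows × columns) and the target as a 1-D array. A small sketch with hypothetical values shows the shapes each form produces:

```python
import pandas as pd

# Hypothetical stand-in for the car mileage DataFrame (made-up values)
df = pd.DataFrame({
    "Horsepower": [70, 90, 110, 150],
    "Weight": [1900, 2300, 2800, 3500],
    "MPG": [35.0, 30.0, 25.0, 18.0],
})

target = df["MPG"]                       # single brackets -> Series, shape (n_samples,)
features = df[["Horsepower", "Weight"]]  # double brackets -> DataFrame, shape (n_samples, n_features)
single = df[["Horsepower"]]              # one feature still needs double brackets to stay 2-D

print(type(target).__name__, target.shape)      # Series (4,)
print(type(features).__name__, features.shape)  # DataFrame (4, 2)
print(type(single).__name__, single.shape)      # DataFrame (4, 1)
```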

Building and Training a Linear Regression Model

With the target and features identified, we can proceed to build and train the Linear Regression model.

from sklearn.linear_model import LinearRegression

# Create a Linear Regression model
lr = LinearRegression()

# Train the model on the features and target
lr.fit(features, target)
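Training a LinearRegression model means solving ordinary least squares: it chooses an intercept and one coefficient per feature that minimize the sum of squared differences between predicted and actual targets. The sketch below generates synthetic data from a known linear rule (the rule and its numbers are assumptions for illustration) and checks that NumPy’s least-squares solver and scikit-learn recover the same parameters:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from a made-up rule: MPG = 50 - 0.05*Horsepower - 0.004*Weight
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(60, 220, 50), rng.uniform(1800, 4500, 50)])
y = 50 - 0.05 * X[:, 0] - 0.004 * X[:, 1]

# Ordinary least squares by hand: prepend a column of ones for the intercept
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# The same fit with scikit-learn
lr = LinearRegression().fit(X, y)

print("lstsq:  ", beta)                    # [intercept, coef_hp, coef_weight]
print("sklearn:", lr.intercept_, lr.coef_)
```

After fitting on the real data, lr.intercept_ and lr.coef_ hold the learned parameters in the same order as the feature columns.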

Evaluating the Model and Making Predictions

Now that the model is trained, let’s evaluate its performance and make predictions.

# Evaluate the model
model_score = lr.score(features, target)
print("Model Score:", model_score)

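# Make predictions for a car with Horsepower = 100 and Weight = 2000
# (a DataFrame with the training column names avoids a feature-name warning)
new_car = pd.DataFrame([[100, 2000]], columns=["Horsepower", "Weight"])
predictions = lr.predict(new_car)
print("Predicted MPG:", predictions[0])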

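For a regression model, score returns the coefficient of determination (R²): 1.0 means a perfect fit, and 0.0 means the model does no better than always predicting the mean MPG. Because the model is scored on the same rows it was trained on, this number describes how well it fits the training data rather than how well it will do on unseen cars. The prediction is the model’s estimated MPG for a car with the given Horsepower and Weight.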
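The R² reported by a scikit-learn regressor’s score method is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the target from its mean. This sketch reproduces it by hand on synthetic noisy data (the rule and noise level are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic noisy data: a made-up linear rule plus Gaussian noise
rng = np.random.default_rng(1)
X = rng.uniform(60, 220, size=(100, 1))
y = 45 - 0.12 * X[:, 0] + rng.normal(0, 2.0, size=100)

lr = LinearRegression().fit(X, y)
y_pred = lr.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, lr.score(X, y))  # the two values agree
```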

Exercises: Diamond Price Prediction

Now, let’s apply similar concepts to a different dataset — the diamonds dataset.

Exercise 1: Loading the Diamond Dataset

# URL for the diamonds dataset
URL2 = ""

# Load the diamond dataset into a Pandas DataFrame
df2 = pd.read_csv(URL2)
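Before fitting, it is worth confirming that the columns the exercises use exist, are numeric, and contain no missing values, since LinearRegression raises an error on NaN inputs. Because URL2 is not reproduced here, this sketch runs the checks on a few hypothetical rows; the column names carat, depth, and price come from the lab:

```python
import pandas as pd

# Hypothetical rows standing in for the diamonds dataset (values are made up)
df2 = pd.DataFrame({
    "carat": [0.23, 0.31, 0.70, 1.01, 1.50],
    "depth": [61.5, 62.4, 60.9, 63.0, 61.8],
    "price": [326, 480, 2750, 5100, 9800],
})

needed = ["carat", "depth", "price"]

# Columns the exercises rely on that are absent from the DataFrame
missing_cols = [c for c in needed if c not in df2.columns]
print("Missing columns:", missing_cols)

# Count of missing values per column, and whether each column is numeric
print(df2[needed].isna().sum().to_dict())
print({c: pd.api.types.is_numeric_dtype(df2[c]) for c in needed})
```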

Exercise 2: Identifying Target and Features

Identify the target column as “price” and the features as “carat” and “depth.”

# Identify the target column
target = df2["price"]

# Identify the features
features = df2[["carat", "depth"]]

Exercise 3: Build and Train a New Linear Regression Model

Build and train a new Linear Regression model for diamond price prediction.

# Create a new Linear Regression model
lr2 = LinearRegression()

# Train the model on the features and target
lr2.fit(features, target)

Exercise 4: Evaluate the Model

Print the score of the model to assess its performance.

# Print the score of the model
model_score_diamonds = lr2.score(features, target)
print("Model Score for Diamond Price Prediction:", model_score_diamonds)
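Scoring on the same rows used for training tends to overstate how well a model will do on new diamonds. A common remedy is to hold out part of the data with scikit-learn’s train_test_split and score on the held-out rows. This sketch uses synthetic diamond-like data (the price rule and noise level are assumptions, not taken from the real dataset):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic diamond-like data: price driven mostly by carat (made-up rule plus noise)
rng = np.random.default_rng(42)
n = 500
df2 = pd.DataFrame({
    "carat": rng.uniform(0.2, 2.5, n),
    "depth": rng.normal(61.7, 1.4, n),
})
df2["price"] = 7500 * df2["carat"] - 20 * df2["depth"] + rng.normal(0, 800, n)

features = df2[["carat", "depth"]]
target = df2["price"]

# Hold out 20% of rows; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)

lr2 = LinearRegression().fit(X_train, y_train)
print("Train R^2:", lr2.score(X_train, y_train))
print("Test R^2:", lr2.score(X_test, y_test))
```

A test R² close to the training R² suggests the model is not simply memorizing the training rows.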

Exercise 5: Predict the Price of a Diamond

Predict the price of a diamond with carat = 0.3 and depth = 60.

# Make predictions for a diamond with carat = 0.3 and depth = 60
new_diamond = pd.DataFrame([[0.3, 60]], columns=["carat", "depth"])
diamond_predictions = lr2.predict(new_diamond)
print("Predicted Diamond Price:", diamond_predictions[0])
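A linear model’s prediction is just its intercept plus a weighted sum of the inputs, so the predicted price can be reproduced from intercept_ and coef_. The sketch fits on noise-free synthetic data (a made-up rule: price = 7500·carat − 20·depth + 100) so the expected answer is known. It passes the new row as a DataFrame with matching column names, which avoids the warning scikit-learn (1.0 and later) emits when a model fitted on a DataFrame receives data without feature names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic diamond-like training data from a made-up, noise-free rule
rng = np.random.default_rng(7)
train = pd.DataFrame({
    "carat": rng.uniform(0.2, 2.5, 200),
    "depth": rng.normal(61.7, 1.4, 200),
})
train["price"] = 7500 * train["carat"] - 20 * train["depth"] + 100

lr2 = LinearRegression().fit(train[["carat", "depth"]], train["price"])

# Same column names as the training features
new_diamond = pd.DataFrame([[0.3, 60]], columns=["carat", "depth"])
via_predict = lr2.predict(new_diamond)[0]

# The same number assembled by hand: intercept + sum(coef_i * x_i)
by_hand = lr2.intercept_ + np.dot(lr2.coef_, [0.3, 60])
print(via_predict, by_hand)  # both about 1150 = 7500*0.3 - 20*60 + 100
```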


Congratulations! You’ve completed this lab and practiced the full regression workflow: loading data, choosing a target and features, training a Linear Regression model, scoring it, and making predictions for car mileage and diamond prices. The same workflow applies to many other prediction tasks, from automotive analysis to pricing. As a next step, try models with more feature columns and compare their scores to deepen your understanding of predictive modeling.
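One way to experiment with more feature columns is a small helper that fits on a training split and reports held-out R² for each candidate feature set. In this sketch the helper name score_feature_set is hypothetical, and the car data is synthetic (a made-up rule in which Year also affects MPG); Horsepower, Weight, and Year are column names from the car mileage dataset:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def score_feature_set(df, feature_cols, target_col, seed=0):
    """Hypothetical helper: fit on 80% of rows, return R^2 on the other 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_cols], df[target_col], test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return model.score(X_test, y_test)


# Synthetic car data where Year also affects MPG (made-up rule plus noise)
rng = np.random.default_rng(3)
n = 400
cars = pd.DataFrame({
    "Horsepower": rng.uniform(60, 220, n),
    "Weight": rng.uniform(1800, 4500, n),
    "Year": rng.integers(70, 83, n),
})
cars["MPG"] = (
    -20 - 0.03 * cars["Horsepower"] - 0.004 * cars["Weight"]
    + 0.8 * cars["Year"] + rng.normal(0, 1.5, n)
)

# Compare a two-feature model with one that also uses Year
results = {}
for cols in (["Horsepower", "Weight"], ["Horsepower", "Weight", "Year"]):
    results[tuple(cols)] = score_feature_set(cars, cols, "MPG")
    print(cols, round(results[tuple(cols)], 3))
```

Using the same random_state for every feature set keeps the split identical, so differences in score come from the features rather than from which rows landed in the test set.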

