# Linear Regression with Multiple Variables in Python: Explained with Code

Linear regression is a popular statistical model used to understand the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. In this article, we will walk through the steps to implement linear regression with multiple variables in Python.
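For reference, with $p$ independent variables $x_1, \ldots, x_p$ the fitted equation has the form below. Ordinary least squares, the method behind scikit-learn's `LinearRegression` used in Step 4, chooses the intercept $\beta_0$ and coefficients $\beta_1, \ldots, \beta_p$ that minimize the sum of squared differences between the $n$ observed values $y_i$ and the model's predictions. The notation here is ours, chosen for illustration:

$$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p
$$

$$
\min_{\beta_0, \ldots, \beta_p} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
$$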

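Before we begin, make sure you have the following packages installed:

• NumPy: a library for numerical computing with Python

• pandas: a library for loading and manipulating tabular data

• Matplotlib: a library for plotting

• scikit-learn: a machine learning library that provides the train/test split, feature scaling, linear regression model, and evaluation metrics used below

You can install these packages using `pip install numpy pandas matplotlib scikit-learn`.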

# Step 1: Import the necessary libraries and load the data

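First, we will import the required libraries and load the data into a pandas DataFrame. The code assumes a CSV file named `data.csv` in the working directory, with the feature columns (here `x1` and `x2`) first and the target column `y` last.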

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data into a Pandas DataFrame
data = pd.read_csv('data.csv')
```
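If you do not have a `data.csv` at hand, the sketch below writes a hypothetical one so the remaining steps can run. The column names `x1`, `x2` and `y` match the ones plotted in Step 2, and the target is placed last because Step 3 takes it from the final column. The underlying relationship y = 3 + 2·x1 − 1.5·x2 + noise is an arbitrary assumption for illustration, not real data.

```python
import numpy as np

# Hypothetical example data: two features (x1, x2) and a target y.
# The true relationship y = 3 + 2*x1 - 1.5*x2 + noise is an assumption.
rng = np.random.default_rng(seed=0)
n_samples = 200
x1 = rng.uniform(0, 10, size=n_samples)
x2 = rng.uniform(0, 5, size=n_samples)
noise = rng.normal(0, 1.0, size=n_samples)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + noise

# Stack the columns with the target last, matching data.iloc[:, :-1] / data.iloc[:, -1]
table = np.column_stack([x1, x2, y])

# comments="" stops NumPy from prefixing the header line with "# ",
# so pd.read_csv sees plain column names.
np.savetxt("data.csv", table, delimiter=",", header="x1,x2,y", comments="", fmt="%.4f")
```

Running it once before the Step 1 code creates the file that `pd.read_csv('data.csv')` expects.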

# Step 2: Explore and visualize the data

Next, we will explore and visualize the data to get a better understanding of the relationships between the variables.

```python
# Print the first few rows of the data
print(data.head())

# Get the number of rows and columns in the data
print(data.shape)

# Get the statistical summary of the data
print(data.describe())

# Plot a scatter plot of the data
plt.scatter(data['x1'], data['y'], color='b')
plt.xlabel('x1')
plt.ylabel('y')
plt.show()

plt.scatter(data['x2'], data['y'], color='b')
plt.xlabel('x2')
plt.ylabel('y')
plt.show()

# Plot a histogram of the data
data.hist()
plt.show()
```
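Scatter plots show one feature at a time. A correlation matrix summarizes the linear association between every pair of columns in a single table; with the DataFrame from Step 1 this is simply `data.corr()`. The NumPy sketch below shows the same Pearson correlation computation on hypothetical arrays standing in for `x1`, `x2` and `y`:

```python
import numpy as np

# Hypothetical data standing in for the x1, x2 and y columns of data.csv
rng = np.random.default_rng(seed=1)
x1 = rng.uniform(0, 10, size=100)
x2 = rng.uniform(0, 5, size=100)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 1.0, size=100)

# np.corrcoef treats each row as one variable, so stack the columns as rows
corr = np.corrcoef(np.vstack([x1, x2, y]))

names = ["x1", "x2", "y"]
for name, row in zip(names, corr):
    print(name, np.round(row, 2))
```

Values near +1 or -1 in the `y` column indicate a strong linear relationship with the target. Two features that correlate strongly with each other (multicollinearity) can make their individual coefficients unstable.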

# Step 3: Preprocess the data

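Before we can fit a linear regression model to the data, we need to preprocess it. This includes splitting the data into training and test sets and standardizing the features (the independent variables). The scaler is fitted on the training set only and then applied to the test set, so that no information from the test data influences training.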

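```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Separate features and target (assumes the target is the last column)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Split the data into training and test sets (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features: fit on the training set, reuse the same scaling for the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```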
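To make the preprocessing concrete, here is a NumPy-only sketch of roughly what `train_test_split` and `StandardScaler` do, on a hypothetical feature matrix. The mean and standard deviation are computed on the training rows only and then reused for the test rows:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
X = rng.uniform(0, 10, size=(50, 2))  # hypothetical feature matrix: 50 rows, 2 features

# Shuffle the row indices and hold out 20% of the rows as the test set
indices = rng.permutation(len(X))
n_test = int(np.ceil(0.2 * len(X)))
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, X_test = X[train_idx], X[test_idx]

# "Fit" step: statistics from the training rows only (StandardScaler uses ddof=0)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "Transform" step: the same mean and std are applied to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

scikit-learn's implementation also handles edge cases such as zero-variance features, so the library version is the one to use in practice.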

# Step 4: Fit a linear regression model

Now, we can fit a linear regression model to the training data using the `LinearRegression` class from scikit-learn.

```python
from sklearn.linear_model import LinearRegression

# Create a linear regression model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Print the coefficients (one per feature, on the standardized scale)
print(model.coef_)

# Print the intercept
print(model.intercept_)
```
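`LinearRegression` fits the model by ordinary least squares. As an illustration of the underlying computation, the sketch below solves the same least-squares problem directly with `np.linalg.lstsq` on hypothetical data generated from known coefficients, adding a column of ones so the intercept is estimated alongside the coefficients:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 200
X = rng.normal(size=(n, 2))  # hypothetical features, roughly standardized already
true_coef = np.array([2.0, -1.5])
y = 3.0 + X @ true_coef + rng.normal(0, 0.5, size=n)

# Prepend a column of ones so the first solved value is the intercept
X_design = np.column_stack([np.ones(n), X])

# Least-squares solution of X_design @ beta ≈ y
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coef = beta[0], beta[1:]
print("intercept:", round(intercept, 3))
print("coefficients:", np.round(coef, 3))
```

Because the features in Step 3 are standardized, each value in `model.coef_` is the expected change in `y` for a one-standard-deviation increase in that feature, holding the others fixed, which makes the coefficients comparable in magnitude.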

# Step 5: Evaluate the model

After fitting the model, we can evaluate its performance on the test set using the mean squared error (MSE), the root mean squared error (RMSE), and the coefficient of determination (R²).

```python
from sklearn.metrics import mean_squared_error, r2_score

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f'Mean squared error: {mse:.2f}')

# Calculate the root mean squared error
rmse = np.sqrt(mse)
print(f'Root mean squared error: {rmse:.2f}')

# Calculate the coefficient of determination (R^2 score)
r2 = r2_score(y_test, y_pred)
print(f'R^2 score: {r2:.2f}')
```
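Each metric is a short formula. MSE is the average squared residual, RMSE is its square root (in the same units as `y`), and R² = 1 - SS_res / SS_tot compares the model's squared error with that of always predicting the mean of `y`. A small NumPy sketch with hypothetical values:

```python
import numpy as np

# Hypothetical actual and predicted values
y_test = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_test - y_pred
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)

# R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread of y_test around its mean
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}")    # MSE: 0.375
print(f"RMSE: {rmse:.3f}")  # RMSE: 0.612
print(f"R^2: {r2:.3f}")     # R^2: 0.949
```

For the same inputs, `mean_squared_error` and `r2_score` return the same values.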

# Step 6: Visualize the results

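Finally, we can visualize the results by plotting the predicted values against the actual values. For an accurate model, the points cluster around the diagonal where the predicted value equals the actual value.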

```python
# Plot the predicted values against the actual values
plt.scatter(y_test, y_pred, color='b')
plt.xlabel('Actual values')
plt.ylabel('Predicted values')
plt.show()
```
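A dashed y = x reference line makes the predicted-vs-actual plot easier to read. The sketch below assumes hypothetical `y_test`/`y_pred` arrays and saves the figure to a file (the name `actual_vs_predicted.png` is arbitrary) using Matplotlib's non-interactive `Agg` backend, so it also runs on machines without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=4)
y_test = rng.uniform(0, 20, size=50)           # hypothetical actual values
y_pred = y_test + rng.normal(0, 1.5, size=50)  # hypothetical predictions

fig, ax = plt.subplots()
ax.scatter(y_test, y_pred, color="b", label="Predictions")

# Dashed y = x line spanning the data: points on it are perfect predictions
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, "k--", label="Perfect prediction")

ax.set_xlabel("Actual values")
ax.set_ylabel("Predicted values")
ax.legend()
fig.savefig("actual_vs_predicted.png")
plt.close(fig)
```

In an interactive session, `plt.show()` can replace the `savefig` call, as in the original code.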

That’s it! You have successfully implemented linear regression with multiple variables in Python. You can use this model to make predictions on new data (after transforming the new features with the same fitted `scaler`) or refine it by adjusting the preprocessing or trying different algorithms.
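When predicting on new data, the raw features must pass through the scaler fitted in Step 3 before `model.predict`; skipping that step silently produces wrong predictions. One way to make this automatic is scikit-learn's `make_pipeline`, which chains the scaler and the model into a single estimator. A minimal sketch, assuming hypothetical training data and new observations:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=5)
X_train = rng.uniform(0, 10, size=(100, 2))  # hypothetical training features
y_train = 3.0 + X_train @ np.array([2.0, -1.5]) + rng.normal(0, 0.5, size=100)

# fit() fits the scaler and then the model; predict() reapplies
# the training-set scaling before the model sees the data.
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

# New, unscaled observations (hypothetical values for x1 and x2)
X_new = np.array([[4.0, 1.0], [7.5, 3.2]])
predictions = pipeline.predict(X_new)
print(np.round(predictions, 2))
```

Keeping both steps in one object also means the scaler cannot accidentally be refitted on test or new data.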

## More from Khanmazhar

Freelance Technical Writer and Data Analyst on Upwork. Reach out for consultations at khanmazhar9101@gmail.com
