Sklearn Expects Data to Be in Shape
Sklearn expects a 2D array
Problem statement: the sklearn error below says it all.
ValueError: Expected 2D array, got 1D array instead: array=[2 7 8 4 1 6]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Let's see why getting the data into shape matters by building a basic sklearn linear regression model.
Import necessary libraries,
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Create a sample data frame,
data = {'X':[1,2,3,4,5,6,7,8,9,10],
'y':[5,7,9,11,13,15,17,19,21,23]}
df = pd.DataFrame(data)
Get X, y data from the dataframe df,
X = df["X"]
y = df["y"]
Perform train test split,
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, y, test_size = 1/3, random_state = 0)
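To see what the split hands to the model, it can help to print the shapes of the pieces. This is a minimal sketch that assumes the same data as above; the lowercase variable names are illustrative and not part of the original code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same sample data as above: y = 2 * X + 3
df = pd.DataFrame({"X": list(range(1, 11)),
                   "y": list(range(5, 24, 2))})

X = df["X"]  # a pandas Series, which is 1D
y = df["y"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

# The split keeps the dimensionality of its inputs: 1D in, 1D out.
print(X_train.shape, X_test.shape)
print(X_train.ndim)
```

The split itself does not complain about 1D input, so the problem only surfaces later, when the model is fitted.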
Build a basic linear regression model,
# instantiate the model with desired parameter values
lr = LinearRegression()
# fit the model to the training data
lr.fit(X_Train, Y_Train)
# apply the model to test data
y_pred = lr.predict(X_Test)
# get coef, intercept values
print(lr.coef_)
print(lr.intercept_)
Here comes an Error!
ValueError: Expected 2D array, got 1D array instead: array=[2 7 8 4 1 6]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Before running into this error, I did not know that sklearn needs a 2D array for the features. Let's debug it. The error says it expected a 2D array but got a 1D array instead. (The array in the message has six elements because it is X_Train, the training part of X.) Checking the shape of X gives (10,), which is clearly a 1D array.
X.shape
Output: (10,)
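The error message suggests two different reshapes, and they are not interchangeable. As a small sketch, using the six values from the error message as a stand-in array:

```python
import numpy as np

a = np.array([2, 7, 8, 4, 1, 6])  # 1D, shape (6,)

# One feature, many samples: each value becomes its own row.
col = a.reshape(-1, 1)

# One sample, many features: all values go into a single row.
row = a.reshape(1, -1)

print(a.shape, col.shape, row.shape)  # (6,) (6, 1) (1, 6)
```

For our data, each value of X is a separate observation of one feature, so reshape(-1, 1) is the one we want.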
We used X = df["X"] to get the X variable. This produces a pandas Series with a shape of (10,), which has rows but no column dimension. There are several ways to get a 2D array assigned to X. Let's see a few of them:
# Option 1 - Reshaping to (-1, 1)
X = np.array(X).reshape(-1, 1)
# Option 2 - Getting as a NumPy array (all columns except the last)
X = df.iloc[:, :-1].values
# Option 3 - Getting as a dataframe
X = df[["X"]]
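The three options can be compared side by side. This is a minimal sketch assuming the same sample data; the names X1, X2 and X3 are only for illustration. Options 1 and 2 yield NumPy arrays, while Option 3 yields a one-column DataFrame, and all three have shape (10, 1).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": list(range(1, 11)),
                   "y": list(range(5, 24, 2))})

X1 = np.array(df["X"]).reshape(-1, 1)  # Option 1: reshape the 1D values
X2 = df.iloc[:, :-1].values            # Option 2: every column but the last
X3 = df[["X"]]                         # Option 3: list of column names

for name, X in [("Option 1", X1), ("Option 2", X2), ("Option 3", X3)]:
    print(name, type(X).__name__, X.shape)
```

Note that Option 2 relies on the target being the last column of the dataframe, whereas Option 3 names the feature column explicitly.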
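Once we modify X and rerun the train test split, the remaining code runs without error and prints the coefficient and intercept. Since the sample data follows y = 2X + 3 exactly, these come out as approximately 2 and 3.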
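Putting the fix together, an end-to-end version might look like the sketch below. It uses Option 3 to build X, and the lowercase variable names are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = {"X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "y": [5, 7, 9, 11, 13, 15, 17, 19, 21, 23]}
df = pd.DataFrame(data)

X = df[["X"]]  # double brackets give a 2D, one-column DataFrame
y = df["y"]    # the target can stay 1D

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

lr = LinearRegression()
lr.fit(X_train, y_train)     # no ValueError now that X is 2D
y_pred = lr.predict(X_test)

print(lr.coef_)
print(lr.intercept_)
```

Because the data is perfectly linear, the printed values should match the slope 2 and intercept 3 up to floating-point rounding.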
Food for thought: whenever using sklearn, make sure the predictor variable X is a 2D array of shape (n_samples, n_features) before passing it to the model. The target y can stay 1D.
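One way to make this a habit is a small guard that runs before fitting. The helper below, as_feature_matrix, is hypothetical and not part of sklearn. It is a sketch that assumes a 1D input always means a single feature.

```python
import numpy as np

def as_feature_matrix(X):
    """Hypothetical helper: return X as a 2D array (n_samples, n_features)."""
    arr = np.asarray(X)
    if arr.ndim == 1:
        # Assumption: a 1D input is one feature with one row per sample.
        arr = arr.reshape(-1, 1)
    if arr.ndim != 2:
        raise ValueError(f"expected 1D or 2D input, got {arr.ndim}D")
    return arr

print(as_feature_matrix([1, 2, 3]).shape)         # (3, 1)
print(as_feature_matrix([[1, 2], [3, 4]]).shape)  # (2, 2)
```

If a 1D input could instead be a single sample with many features, this assumption would be wrong, which is why sklearn raises an error rather than guessing.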