Sklearn Expects Data to Be in Shape

Sklearn expects a 2D array

Mahindra Venkat Lukka
Geek Culture
2 min read · Nov 10, 2021

Problem statement: the sklearn error below says it all.

ValueError: Expected 2D array, got 1D array instead: array=[2 7 8 4 1 6]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Let's see why getting the data into the right shape matters by building a basic sklearn linear regression model.

Import the necessary libraries:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

Create a sample data frame:

data = {'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'y': [5, 7, 9, 11, 13, 15, 17, 19, 21, 23]}
df = pd.DataFrame(data)

Get the X and y data from the dataframe df:

X = df["X"]
y = df["y"]

Perform the train test split:

X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, y, test_size=1/3, random_state=0)

Build a basic linear regression model:

# instantiate the model with desired parameter values
lr = LinearRegression()
# fit the model to the training data
lr.fit(X_Train, Y_Train)
# apply the model to test data
y_pred = lr.predict(X_Test)
# get the coefficient and intercept values
print(lr.coef_)
print(lr.intercept_)

Here comes an error, raised by lr.fit:

ValueError: Expected 2D array, got 1D array instead: array=[2 7 8 4 1 6]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Before running into this error, I did not know that sklearn needs a 2D array to work. Let's debug it. The error says a 2D array was expected but a 1D array was received. The array in the message is X_Train, the six training rows produced by the split. Checking the shape of X gives (10,), which is clearly a 1D array.

X.shape
Output: (10,)
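The two suggestions in the error message are easy to mix up, so here is a small sketch of what each reshape does. It reuses the six values from the error message; the names `values`, `as_column` and `as_row` are only for illustration.

```python
import numpy as np

# The 1D array reported in the error message
values = np.array([2, 7, 8, 4, 1, 6])

# reshape(-1, 1): one column, as many rows as needed (6 samples, 1 feature)
as_column = values.reshape(-1, 1)

# reshape(1, -1): one row, as many columns as needed (1 sample, 6 features)
as_row = values.reshape(1, -1)

print(values.shape)     # (6,)
print(as_column.shape)  # (6, 1)
print(as_row.shape)     # (1, 6)
```

For this article's data, each row is one observation of a single feature, so the column form is the one we want.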

We used X = df["X"] to get the X variable. This produces a pandas Series with a shape of (10,), which has rows but no column dimension. There are several ways to assign a 2D array to X. Let's look at a few of them:

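# Option 1 - Reshape to (-1, 1) to get a NumPy array with one column
X = np.array(X).reshape(-1, 1)
# Option 2 - Select every column except the last one, as a NumPy array
X = df.iloc[:, :-1].values
# Option 3 - Select with double brackets to get a one-column DataFrame
X = df[["X"]]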

Once we modify X and rerun the code from the train test split onward, it runs without errors and prints the coefficient and intercept.
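For reference, here is the whole example in one piece with the fix applied. This is a sketch that assumes Option 3 (double brackets); any of the three options should work the same way. Since the sample data follows y = 2x + 3 exactly, the fitted coefficient and intercept should come out very close to 2 and 3.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   "y": [5, 7, 9, 11, 13, 15, 17, 19, 21, 23]})

X = df[["X"]]  # 2D: shape (10, 1)
y = df["y"]    # 1D target is fine: shape (10,)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, y, test_size=1/3, random_state=0)

lr = LinearRegression()
lr.fit(X_Train, Y_Train)      # no ValueError now
y_pred = lr.predict(X_Test)

print(lr.coef_)       # approximately [2.]
print(lr.intercept_)  # approximately 3.0
```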

Food for thought: whenever you use sklearn, make sure the predictor variable is a 2D array of shape (n_samples, n_features) before passing it to the model. The target y can stay 1D.
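The same rule applies at prediction time, which is where the other suggestion from the error message, reshape(1, -1), comes in. Below is a minimal sketch, assuming we fit on all ten rows and then predict for a single new value of 11; the names `new_value`, `raised` and `prediction` are only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({"X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   "y": [5, 7, 9, 11, 13, 15, 17, 19, 21, 23]})

# Fit on NumPy arrays: X is 2D (10, 1), y is 1D (10,)
lr = LinearRegression().fit(df[["X"]].to_numpy(), df["y"].to_numpy())

new_value = np.array([11])  # 1D: shape (1,)

# Passing the 1D array raises the same ValueError as before
try:
    lr.predict(new_value)
    raised = False
except ValueError:
    raised = True
print(raised)

# A single sample with a single feature needs shape (1, 1)
prediction = lr.predict(new_value.reshape(1, -1))
print(prediction)  # approximately [25.]
```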

Mahindra Venkat Lukka

Search Capacity Planning at Amazon || MS in Business Analytics from W. P. Carey School of Business, Arizona State University || My opinions are my own