Forecasting stock market trends is one of the best-known applications of Machine Learning in finance. In this article, I’ll walk you through a straightforward Data Science project: Stock Price Prediction using Machine Learning with Python.

By the end of this article, you’ll understand the workflow behind predicting stock prices. We’ll implement a Linear Regression model in Python, followed by a polynomial variant, so that you can adapt the steps to your own predictive analyses.


Import Necessary Libraries
Read Data
Preprocessing
Data Transformation
Handling Missing Values
Standardizing Data
Data Splitting
Build Linear Regression Model
Build Polynomial Regression Model

1- Import Necessary Libraries

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

2- Read Data

!pip install wget
!rm markets.csv
!wget -O stock_data.csv
df = pd.read_csv("stock_data.csv", sep=",", header=0)

3- Preprocessing

Data preprocessing in machine learning refers to the steps applied to prepare raw data for model training. It’s a crucial phase in the pipeline, because the quality of the input data strongly affects the performance and reliability of the model. Here we keep only the columns of interest, select the target column, and fill missing values in the categorical Stock column with its most frequent value (the mode):

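# Keep only the columns of interest (selected by position)
cols = [1, 2, 3, 4, 5, 7]
df = df[df.columns[cols]]
# The target is the column at position 3 of the reduced frame
cols = [3]
Y = df[df.columns[cols]]
# Assumption: the original never defines X; here the features are taken to be all remaining columns
X = df.drop(columns=Y.columns)
# Notebook output: Index(['Stock'], dtype='object')
# Fill missing values in 'Stock' with its most frequent value (the mode)
X['Stock'] = X['Stock'].fillna(X['Stock'].mode()[0])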
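
To see what the mode-based fill does in isolation, here is a minimal sketch on a toy frame. The ticker values are invented for illustration and are not taken from the dataset used in this article.

```python
import numpy as np
import pandas as pd

# Toy data (invented for illustration): a categorical column with two gaps
toy = pd.DataFrame({"Stock": ["AAPL", "MSFT", np.nan, "AAPL", np.nan]})

# mode() ignores NaN and returns a Series of the most frequent values;
# [0] picks the first of them
most_frequent = toy["Stock"].mode()[0]

# Replace every missing entry with that value
toy["Stock"] = toy["Stock"].fillna(most_frequent)

print(toy["Stock"].tolist())
```

Filling with the mode keeps the column free of gaps, at the cost of making the most common category even more common.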

4- Data Transformation

Convert categorical features to integer codes with label encoding:

# Encode each categorical (object-typed) column as integer codes
from sklearn.preprocessing import LabelEncoder
lencoders = {}  # keep one fitted encoder per column so the codes can be decoded later
for col in X.select_dtypes(include=['object']).columns:
    lencoders[col] = LabelEncoder()
    X[col] = lencoders[col].fit_transform(X[col])
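
The mapping that label encoding produces is easier to see on a tiny example. The sketch below uses made-up ticker symbols; only `LabelEncoder` itself comes from the code above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column (invented for illustration)
toy = pd.DataFrame({"Stock": ["MSFT", "AAPL", "GOOG", "AAPL"]})

encoder = LabelEncoder()
toy["Stock_code"] = encoder.fit_transform(toy["Stock"])

# classes_ holds the distinct labels in sorted order;
# each label's code is its position in that array
print(list(encoder.classes_))
print(toy["Stock_code"].tolist())

# inverse_transform maps the integer codes back to the original labels
decoded = encoder.inverse_transform(toy["Stock_code"])
print(list(decoded))
```

Note that the codes follow alphabetical order and carry no numeric meaning. A linear model will still treat them as ordered numbers, which is worth keeping in mind when interpreting the coefficients.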

5- Handling Missing Values

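This code fills in missing values with scikit-learn’s IterativeImputer, an approach modeled on multiple imputation by chained equations (MICE): each feature with missing values is predicted from the other features.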

import warnings
# Suppress any warnings raised while the code runs
warnings.filterwarnings("ignore")
# IterativeImputer is still experimental in scikit-learn,
# so it has to be enabled explicitly before it can be imported
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
# Impute each feature with missing values as a function of the other features
mice_imputer = IterativeImputer()
MiceImputed = pd.DataFrame(mice_imputer.fit_transform(X), index=X.index, columns=X.columns)
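
As a rough illustration of how iterative imputation behaves, the sketch below removes one value from a toy frame in which the two columns are linearly related. The column names and numbers are assumptions made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy data (invented): Close is always Open + 0.5, with one value missing
toy = pd.DataFrame({
    "Open": [10.0, 11.0, 12.0, 13.0, 14.0],
    "Close": [10.5, 11.5, np.nan, 13.5, 14.5],
})

# random_state makes the imputation reproducible
imputer = IterativeImputer(random_state=0)
filled = pd.DataFrame(imputer.fit_transform(toy), index=toy.index, columns=toy.columns)

# The missing Close is estimated from its relationship with Open
print(filled)
```

Observed values are left untouched; only the gap is filled, using a regression of Close on Open.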

5- Standardizing Data

Feature scaling puts numerical features on a comparable range. Standardization transforms each feature to have a mean of 0 and a standard deviation of 1; the MinMaxScaler used here instead rescales each feature to the [0, 1] range.

from sklearn import preprocessing
r_scaler = preprocessing.MinMaxScaler()
# A scaler must be fitted before it can transform; fit_transform does both in one step
modified_data = pd.DataFrame(r_scaler.fit_transform(MiceImputed), index=MiceImputed.index, columns=MiceImputed.columns)
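
The difference between min-max scaling and standardization is easy to check on a handful of numbers. This is a small sketch with invented values, comparing scikit-learn's `MinMaxScaler` and `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature with four invented values
values = np.array([[10.0], [20.0], [30.0], [40.0]])

# Min-max scaling: (x - min) / (max - min), so the result lies in [0, 1]
minmax = MinMaxScaler().fit_transform(values)

# Standardization: (x - mean) / std, so the result has mean 0 and std 1
standard = StandardScaler().fit_transform(values)

print(minmax.ravel())
print(standard.ravel())
```

Either choice works for linear regression; what matters is applying the same fitted scaler to any data the model sees later.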

6- Data Splitting

Data splitting in machine learning refers to dividing a dataset into subsets with different purposes, typically training, validation, and testing. Here we hold out 30% of the rows as a test set and train on the remaining 70%.

X_train, X_test, y_train, y_test = train_test_split(modified_data, Y, test_size=0.30, random_state=100)
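
The effect of `test_size` and `random_state` can be verified on a toy array. The arrays below are placeholders invented for this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten toy samples with two features each, plus a toy target
features = np.arange(20).reshape(10, 2)
target = np.arange(10)

# 30% of the rows go to the test set; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(
    features, target, test_size=0.30, random_state=100
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state reproduces the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    features, target, test_size=0.30, random_state=100
)
print(np.array_equal(X_te, X_te2))
```

By default the rows are shuffled before splitting. For time-ordered price data, passing `shuffle=False` keeps the chronological order, so the test set contains only the most recent rows.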

7- Build Linear Regression Model

Building a model in machine learning involves selecting an appropriate algorithm, training it on labeled data, fine-tuning its parameters, and evaluating its performance.

learner = LinearRegression()  # initialize the linear regression model
learner.fit(X_train, y_train)  # train the linear regression model
score = learner.score(X_test, y_test)  # R^2 of the model on the test set
# reported result: score = 0.9999142675478361
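
For a regressor, `score` returns the coefficient of determination R^2. The sketch below, on synthetic data generated only for this illustration, computes R^2 by hand and compares it with what `score` returns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (invented): a noisy straight line y = 3x + 2
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(50, 1))
y_toy = 3.0 * X_toy[:, 0] + 2.0 + rng.normal(0, 0.5, size=50)

model = LinearRegression().fit(X_toy, y_toy)
pred = model.predict(X_toy)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_toy - pred) ** 2)
ss_tot = np.sum((y_toy - y_toy.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(manual_r2, model.score(X_toy, y_toy))
```

A value close to 1 means the model explains almost all of the variance in the target. With price data this often happens when the features include other prices from the same day, so a very high score is worth a second look.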

8- Build Polynomial Regression Model

For the polynomial model, we import PolynomialFeatures from sklearn, expand the training features into degree-2 polynomial terms, and then fit a LinearRegression model on the expanded features, just as we did for the linear model.

# Import necessary libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
# Create polynomial features
poly_features = PolynomialFeatures(degree=2) # We choose a polynomial of degree 2
X_train_poly = poly_features.fit_transform(X_train)
# Fit the polynomial regression model
poly_reg = LinearRegression()
poly_reg.fit(X_train_poly, y_train)
# Evaluate the model on the testing set
X_test_poly = poly_features.transform(X_test)
y_pred = poly_reg.predict(X_test_poly)
# Calculate the R^2 score
score = r2_score(y_test, y_pred)
print("R^2 Score:", score)
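
To show why polynomial features can help, here is a minimal sketch on a synthetic quadratic curve. The data is invented, and the use of `make_pipeline` to chain the two steps is a convenience chosen for this example, not something the code above relies on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noiseless quadratic: y = 0.5 x^2 - x + 2
x = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 0.5 * x[:, 0] ** 2 - x[:, 0] + 2.0

# A plain linear model can only capture the straight-line part
linear_fit = LinearRegression().fit(x, y)

# Degree-2 features add x^2 as a column, so the same linear model can fit the curve
poly_fit = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

print(linear_fit.score(x, y), poly_fit.score(x, y))
```

The pipeline applies the same feature expansion at fit and predict time, which removes the need to call `transform` on the test set by hand.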

