Apply Linear Regression Model using Python

kamal chanchal
Apr 29, 2024


In finance, analyzing data is essential for making informed decisions. Machine learning tools help us understand data better and even estimate what might happen next. This article shows how to apply Linear Regression to stock market data in Python using scikit-learn (sklearn), a popular machine learning library. The code is organized into a class, which makes it easier to work with and understand.

Linear Regression Model

Linear regression is one of the most fundamental machine learning models. It models a target variable as a straight-line (linear) function of one or more input variables. In finance, and in the stock market in particular, this makes it useful for analyzing relationships between variables and for making predictions. For example, it can fit a trend to historical prices to estimate where prices are heading, or measure how strongly various factors affect stock prices and other financial metrics.
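To make the idea concrete: simple linear regression fits a line y ≈ b0 + b1·x by minimizing the sum of squared errors (ordinary least squares). The sketch below computes the slope and intercept with the closed-form formulas and checks them against scikit-learn's LinearRegression. The toy prices are made up for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up): day index and a price rising roughly 2 per day
x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([100.0, 102.5, 103.5, 106.0, 108.0])

# Closed-form ordinary least squares for a single feature:
#   b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   b0 = mean_y - b1 * mean_x
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

# scikit-learn expects a 2-D feature matrix, hence reshape(-1, 1)
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(f"closed form: intercept={b0:.3f}, slope={b1:.3f}")                      # intercept=100.100, slope=1.950
print(f"sklearn:     intercept={model.intercept_:.3f}, slope={model.coef_[0]:.3f}")  # intercept=100.100, slope=1.950
```

Both approaches give the same line, which is what `LinearRegression` computes under the hood for this one-feature case.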

Linear regression models can also be integrated into trading strategies, for example to generate buy and sell signals from statistical patterns and trends in stock prices. By analyzing historical price data and building predictive models, traders can develop systematic trading strategies.
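As a rough illustration of that idea, and not the author's strategy or trading advice, one common pattern is to fit a line over a rolling window of recent closes and read the sign of its slope as a trend signal. The helper name `rolling_slope_signals`, the 5-day window, the BUY/SELL/HOLD labels and the closing prices are all assumptions made up for this sketch.

```python
import numpy as np

def rolling_slope_signals(closes, window=5):
    """Hypothetical helper: label each window of `window` closes by the
    sign of the least-squares slope fitted to it."""
    closes = np.asarray(closes, dtype=float)
    x = np.arange(window)
    signals = []
    for end in range(window, len(closes) + 1):
        # np.polyfit with deg=1 returns [slope, intercept]
        slope, _ = np.polyfit(x, closes[end - window:end], deg=1)
        if slope > 0:
            signals.append("BUY")    # upward trend in this window
        elif slope < 0:
            signals.append("SELL")   # downward trend in this window
        else:
            signals.append("HOLD")
    return signals

closes = [100, 101, 103, 104, 106, 105, 103, 101, 100, 98]  # made-up prices
signals = rolling_slope_signals(closes)
print(signals)  # ['BUY', 'BUY', 'BUY', 'SELL', 'SELL', 'SELL']
```

A real strategy would at least add thresholds, transaction costs and a backtest; the sketch only shows how a regression slope can be turned into a signal.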

Output [Sample]

Setting Up the Environment:

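First, let's set up the tools. We'll need three main libraries: numpy, pandas, and scikit-learn (imported as sklearn), plus matplotlib for plotting the results later on. Each of them handles a different part of working with our stock data. If they are not installed yet, install them first, for example with pip install numpy pandas scikit-learn matplotlib (note that the pip package is named scikit-learn, not sklearn).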

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

Creating the Class:

Why put our code into a class? A class works like a toolbox: it keeps the data and all the functions that work on it in one place, which makes the code easier to organize, reuse, understand, and change later on. This is the idea behind object-oriented design. Our class is called QuantML, and it serves as the toolbox for handling our stock data.

# QuantML bundles data preparation, model fitting and plotting
class QuantML:

    def __init__(self):
        print("****** My First ML model(On Stocks Data) *******")
        self.__prepare_data()
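A side note on the double-underscore names such as `__prepare_data` and `__df_hlc_data`: inside a class, Python "mangles" attribute names that start with two underscores, so `self.__prepare_data` is actually stored as `_QuantML__prepare_data`. This discourages calling the helpers from outside the class, leaving `Engine()` (defined in the full listing) as the public entry point. The tiny `Toolbox` class below is a hypothetical stand-in that shows the effect.

```python
class Toolbox:
    """Hypothetical class illustrating Python's name mangling."""

    def __init__(self):
        self.__secret = 42           # stored as _Toolbox__secret

    def __helper(self):              # stored as _Toolbox__helper
        return self.__secret * 2

    def run(self):
        # Inside the class, the short name works as usual
        return self.__helper()


box = Toolbox()
print(box.run())                     # 84
print(hasattr(box, "__helper"))      # False: the short name is not visible outside
print(box._Toolbox__helper())        # 84: the mangled name still works
```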

Getting the Data Ready:

Let's start by getting our data ready. Instead of downloading real prices, we generate random sample data with OPEN, HIGH, LOW, and CLOSE (OHLC) prices for each day. OHLC data is central to understanding how stocks perform over time. The __prepare_data method stores this data in a pandas DataFrame, a table format that makes it easy to work with and analyze later on.

Describe Historical Data
    def __prepare_data(self):
        try:
            # One row per calendar day from 1 Jan to 10 Apr 2024
            dates = pd.date_range(start='1/1/2024', end='4/10/2024')
            open_prices = np.random.randint(100, 200, size=len(dates))
            high_prices = open_prices + np.random.randint(5, 20, size=len(dates))
            low_prices = open_prices - np.random.randint(5, 20, size=len(dates))
            # Draw each close between that day's low and high so every candle is valid
            close_prices = np.random.randint(low_prices, high_prices + 1)

            # Creating DataFrame
            self.__df_hlc_data = pd.DataFrame({
                'Date': dates,
                'Open': open_prices,
                'High': high_prices,
                'Low': low_prices,
                'Close': close_prices,
            })
        except Exception as e:
            print(f"Error occurred in preparation of data: {e}")
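Because the prices are random, it is worth checking that every row is a valid candle, meaning Low ≤ Open, Close ≤ High. Drawing Close independently of High and Low (for example with a plain `randint(100, 200)`) can produce a close above the high or below the low. The standalone sketch below rebuilds the same kind of frame with a seeded generator (`np.random.default_rng`, added here as an assumption so runs are reproducible) and checks those invariants. The helper name `make_sample_ohlc` is hypothetical.

```python
import numpy as np
import pandas as pd

def make_sample_ohlc(seed=0):
    """Hypothetical helper mirroring __prepare_data, with a fixed seed."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range(start="2024-01-01", end="2024-04-10")  # calendar days
    n = len(dates)
    open_prices = rng.integers(100, 200, size=n)
    high_prices = open_prices + rng.integers(5, 20, size=n)
    low_prices = open_prices - rng.integers(5, 20, size=n)
    # Each close is drawn between that day's low and high (inclusive)
    close_prices = rng.integers(low_prices, high_prices + 1)
    return pd.DataFrame({"Date": dates, "Open": open_prices, "High": high_prices,
                         "Low": low_prices, "Close": close_prices})

df = make_sample_ohlc()
# A candle is valid when Low <= min(Open, Close) and max(Open, Close) <= High
valid = (
    (df["Low"] <= df[["Open", "Close"]].min(axis=1))
    & (df[["Open", "Close"]].max(axis=1) <= df["High"])
)
print(len(df), "rows; all candles valid:", bool(valid.all()))  # 101 rows; all candles valid: True
```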

Using Linear Regression:

Now let's use Linear Regression to understand our stock data better, with scikit-learn's sklearn.linear_model.LinearRegression class.

This class fits a model that finds a linear pattern in our data and uses it to make predictions. We'll show how to train the model (fit it to our data) and then use it to make predictions. Choosing which parts of the data the model should use (feature selection) and preparing the data before training (data preprocessing) both matter. In this example, the preprocessing step computes the average of the Open, High, Low, and Close prices for each day (the OHLC mean), which becomes the target. The only feature is the day number.

    def __calculateMean(self):
        try:
            # Average of Open, High, Low and Close for each row (day)
            self.__df_hlc_data['OHLC_mean'] = self.__df_hlc_data[['Open', 'High', 'Low', 'Close']].mean(axis=1)
        except Exception as e:
            print(f"Error(s) occurred while calculating the mean of OHLC prices: {e}")

    def __start(self):
        try:
            # Feature: day index 0..n-1 as a 2-D column; target: daily OHLC mean
            X = np.arange(len(self.__df_hlc_data)).reshape(-1, 1)
            y = self.__df_hlc_data["OHLC_mean"]
            print("Applying Linear Regression Model -- Stocks Data")
            model = LinearRegression()

            model.fit(X, y)                # estimate intercept and slope
            y_predict = model.predict(X)   # fitted values for the same days

            self.__ViewOutput(X, y, y_predict)

        except Exception as e:
            print(f"Exception occurred: {e}")
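In `__calculateMean`, `mean(axis=1)` averages across the columns of each row, which gives one OHLC mean per day. `axis=0` would instead average each column over all days. Here is a worked two-row example with made-up prices:

```python
import pandas as pd

df = pd.DataFrame({"Open": [100, 120], "High": [110, 130],
                   "Low": [95, 115], "Close": [105, 125]})

# Row-wise average of the four prices: one value per day
df["OHLC_mean"] = df[["Open", "High", "Low", "Close"]].mean(axis=1)
print(df["OHLC_mean"].tolist())  # [102.5, 122.5]
```

For the first row, (100 + 110 + 95 + 105) / 4 = 410 / 4 = 102.5.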

Training and Prediction

An instance of the LinearRegression model from Scikit-Learn is created. The fit() method is called on the model with X and y as arguments, which trains the linear regression model on the provided data.
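Training with `fit()` amounts to estimating two numbers: `model.intercept_`, the line's value at day 0, and `model.coef_[0]`, the change per day. `predict()` then evaluates that line. Feeding it day indices beyond the last row simply extends the line, which is a naive trend forecast. The OHLC means and the 3-day horizon in this sketch are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

y = np.array([150.0, 152.0, 151.0, 154.0, 155.0, 157.0])  # made-up OHLC means
X = np.arange(len(y)).reshape(-1, 1)                      # day index 0..5

model = LinearRegression().fit(X, y)
print("intercept:", round(model.intercept_, 2), "slope per day:", round(model.coef_[0], 2))

# predict() evaluates the fitted line: intercept + slope * day
manual = model.intercept_ + model.coef_[0] * X.ravel()
print(np.allclose(model.predict(X), manual))  # True

# Extending the line past the last day (indices 6, 7, 8)
future_X = np.arange(len(y), len(y) + 3).reshape(-1, 1)
print(model.predict(future_X).round(2))
```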

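After the model is trained, the predict() method generates predictions (y_predict) for the same input data X. Note that this example does not split the data into training and test sets. The model is fitted and evaluated on the same days, so the result shows how well a straight line describes past prices, not how well it forecasts new ones.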
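One way to get an out-of-sample check, sketched here as an assumption rather than part of the original project, is to hold out the most recent days. Passing `shuffle=False` to `train_test_split` keeps chronological order, so the model never trains on days that come after the ones it is tested on. The 20% test size, the seed and the synthetic trend-plus-noise series are arbitrary choices for the sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_days = 101
X = np.arange(n_days).reshape(-1, 1)                    # day index feature
y = 150 + 0.3 * X.ravel() + rng.normal(0, 5, n_days)    # synthetic trend + noise

# shuffle=False: earlier days for training, the most recent days for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("train days:", len(X_train), "test days:", len(X_test))
print("test MAE:", round(mean_absolute_error(y_test, y_pred), 2))
```

Shuffling time-series rows before splitting would leak future information into training, which is why the order is preserved here.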

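Breakdown of Variables

X = np.arange(len(self.__df_hlc_data)).reshape(-1, 1)  # feature: day index 0..n-1 as an (n, 1) column
y = self.__df_hlc_data["OHLC_mean"]                      # target: the OHLC mean for each day
y_predict = model.predict(X)                             # the model's fitted value for each day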
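The `reshape(-1, 1)` in `X` is needed because scikit-learn estimators expect a 2-D feature matrix of shape (n_samples, n_features). With one feature (the day number), that is a single column. The targets below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

days = np.arange(5)                  # 1-D array, shape (5,)
X = days.reshape(-1, 1)              # 2-D column, shape (5, 1): 5 samples, 1 feature
y = np.array([150.0, 151.0, 149.5, 152.0, 153.0])  # made-up OHLC means

print(days.shape, X.shape)           # (5,) (5, 1)

# scikit-learn rejects a 1-D feature array with a ValueError
try:
    LinearRegression().fit(days, y)
    rejected = False
except ValueError:
    rejected = True
print("1-D X rejected:", rejected)   # True

model = LinearRegression().fit(X, y)  # the 2-D column works
```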

Understanding the Results

Now, let's look at the results. The __ViewOutput method uses the matplotlib library to draw a scatter plot of the daily OHLC mean values together with the straight line predicted by the Linear Regression model. Looking at this chart can help us spot trends in the data.

import matplotlib.pyplot as plt

    def __ViewOutput(self, X, y, y_predict):
        try:
            plt.figure(figsize=(10, 6))
            # Actual daily OHLC means as points, fitted regression line in red
            plt.scatter(X, y, color='blue', label='Actual OHLC Mean')
            plt.plot(X, y_predict, color='red', linewidth=2, label='Linear Regression')
            plt.title('Linear Regression on OHLC Data')
            plt.xlabel('Days')
            plt.ylabel('OHLC Mean')
            plt.legend()
            plt.grid(True)
            plt.show()

        except Exception as e:
            print(f"Error occurred while displaying data points: {e}")
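`plt.show()` needs a display. On a server or in a scheduled job, a common alternative is to render with the non-interactive "Agg" backend and save the chart to a file. The sketch below draws the same kind of chart on random stand-in data. The file name `ohlc_regression.png` is a hypothetical choice.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")                # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.arange(30).reshape(-1, 1)
y = 150 + rng.normal(0, 5, 30)       # random stand-in for daily OHLC means
y_predict = LinearRegression().fit(X, y).predict(X)

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(X.ravel(), y, color="blue", label="Actual OHLC Mean")
ax.plot(X.ravel(), y_predict, color="red", linewidth=2, label="Linear Regression")
ax.set_title("Linear Regression on OHLC Data")
ax.set_xlabel("Days")
ax.set_ylabel("OHLC Mean")
ax.legend()
ax.grid(True)

out = Path("ohlc_regression.png")    # hypothetical output file
fig.savefig(out, dpi=100)
plt.close(fig)                       # free the figure's memory
```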

Save as LinearRegression.py

Here are all the pieces put together. Save them in a file named LinearRegression.py. You can then import the QuantML class from another script and run it, as shown in the next section.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


# QuantML bundles data preparation, model fitting and plotting
class QuantML:

    def __init__(self):
        print("****** My First ML model(On Stocks Data) *******")
        self.__prepare_data()

    def __prepare_data(self):
        try:
            # Sample OHLC data: one row per calendar day from 1 Jan to 10 Apr 2024
            dates = pd.date_range(start='1/1/2024', end='4/10/2024')
            open_prices = np.random.randint(100, 200, size=len(dates))
            high_prices = open_prices + np.random.randint(5, 20, size=len(dates))
            low_prices = open_prices - np.random.randint(5, 20, size=len(dates))
            # Draw each close between that day's low and high so every candle is valid
            close_prices = np.random.randint(low_prices, high_prices + 1)

            # Creating DataFrame
            self.__df_hlc_data = pd.DataFrame({
                'Date': dates,
                'Open': open_prices,
                'High': high_prices,
                'Low': low_prices,
                'Close': close_prices,
            })
        except Exception as e:
            print(f"Error occurred in preparation of data: {e}")

    def Engine(self):
        # Public entry point: add the OHLC mean column, then fit and plot
        self.__calculateMean()
        self.__start()

    def __calculateMean(self):
        try:
            # Average of Open, High, Low and Close for each row (day)
            self.__df_hlc_data['OHLC_mean'] = self.__df_hlc_data[['Open', 'High', 'Low', 'Close']].mean(axis=1)
        except Exception as e:
            print(f"Error(s) occurred while calculating the mean of OHLC prices: {e}")

    def __start(self):
        try:
            # Feature: day index 0..n-1 as a 2-D column; target: daily OHLC mean
            X = np.arange(len(self.__df_hlc_data)).reshape(-1, 1)
            y = self.__df_hlc_data["OHLC_mean"]
            print("Applying Linear Regression Model into my Stocks Data")
            model = LinearRegression()

            model.fit(X, y)                # estimate intercept and slope
            y_predict = model.predict(X)   # fitted values for the same days

            self.__ViewOutput(X, y, y_predict)

        except Exception as e:
            print(f"Exception occurred: {e}")

    def __getresult(self):
        # Placeholder for result reporting; not implemented yet
        pass

    def __ViewOutput(self, X, y, y_predict):
        try:
            plt.figure(figsize=(10, 6))
            # Actual daily OHLC means as points, fitted regression line in red
            plt.scatter(X, y, color='blue', label='Actual OHLC Mean')
            plt.plot(X, y_predict, color='red', linewidth=2, label='Linear Regression')
            plt.title('Linear Regression on OHLC Data')
            plt.xlabel('Days')
            plt.ylabel('OHLC Mean')
            plt.legend()
            plt.grid(True)
            plt.show()

        except Exception as e:
            print(f"Error occurred while displaying data points: {e}")

Running the Code

# LinearRegression.py is the file saved above (it must be in the same folder)
import LinearRegression as P


def myproject():
    model = P.QuantML()
    model.Engine()


if __name__ == '__main__':
    myproject()


Output:

Linear Regression [Output of above code]

Conclusion: By following the steps in this article, readers can see how to apply Linear Regression to stock market data in Python. Wrapping the steps in a single class keeps the code readable and maintainable, which makes it suitable both for learning and as a starting point for practical applications in financial analytics.

This project showed how a linear regression model can fit a trend to time-series data. Keep in mind that the sample prices here are randomly generated, so any slope the model finds is just noise. On real price data, the same code fits the actual trend. Stock prices are also influenced by many factors that a straight line cannot capture, such as news, company fundamentals, earnings, and current events. Still, a simple linear trend is a useful baseline for understanding and analyzing stock market trends.

Overall, linear regression models are widely used in finance, particularly in stock markets, for analyzing price trends, risk management, portfolio optimization, factor analysis, trading strategies, and studies of market efficiency. However, it's essential to combine machine learning techniques with fundamental analysis and market expertise to make well-informed investment decisions.

Ref: What Is Regression? Definition, Calculation, and Example (investopedia.com)

View: Running Python Script QuantML

📱LinkedIn: https://www.linkedin.com/in/kamalchanchal

📱Gmail: Kchanchal78@gmail.com

📌You can also read my other posts, like: BackTesting Strategy Setup: Building a Python Trading Strategy Analyzer

📌View Indicators Value in Trading System with C# and WinForms

📌Black-Scholes in C# Options Pricing Model

📌Scan Stocks before Market Open using Pre- Opening with python

📌Algorithmic Trading : USE Tick Data to OHLC Candlesticks with Python

📌Algorithmic Finance : View Historical Index (Nifty 50) Wick to Wick

📊Explore the full potential of this project by visiting our GitHub repository.

Subscribe for more updates on Algorithmic Trading, financial analysis, and coding adventures using C# and Python. Thanks for reading!

Let’s stay connected and continue the conversation.


kamal chanchal

C# | Python | Capital Market | Artificial Intelligence | Data Science Engineering