Building a Stock Analysis Tool on Pi: Part 1 — Data Scraping and Storage

Jithin Sankar
6 min read · Feb 23, 2024

Over the past couple of years, I’ve been dipping my toes into the world of stock investing, buying on dips and focusing on large-cap stocks. I had a bit of a learning experience with HDFC Bank when I bought after a quick 10% dip from an all-time high. Despite expert advice warning against “catching a falling knife”, I followed my gut and bought more, only to learn the hard way that this wasn’t my best decision. It still hasn’t hit rock bottom, and I’m still losing money. That’s when I realized just how important technical analysis is before making any decisions.

I’ve noted down all the popular technical indicators from Zerodha Varsity. I figured all the trading data I need is out there somewhere on the internet, so why not backtest these indicators and develop strategies based on, say, a decade’s worth of market data? I need data, tons of it, so I did a quick search and found that TradingView has it all, but it’s a bit too manual and pricey for my liking. I need something that’s flexible, free, and explainable.

I then discovered that while NSE does provide historical intraday data, it comes with a cost. So, the search continues!

NSE intraday chart

Then I noticed that the NSE site itself serves each day’s intraday data for free. And hey, if I can scrape the website daily and store the data in an SQL database, that’s a good starting point, right?

I’m not about to shell out cash for EC2 or app services. I wanted everything barebones and budget-friendly. So I grabbed my Raspberry Pi from 2017, dusted it off, and flashed a fresh version of Raspbian with SSH enabled.

Alright, let’s kick things off by making a folder for this project.

mkdir stock_analysis

Next, we’re setting up a Python virtual environment.

cd stock_analysis

python -m venv stock-env

To get it going,

source stock-env/bin/activate
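
While we’re in here, let’s install the packages the scripts below depend on: requests for the API calls, plus matplotlib and numpy for the chart we’ll draw at the end (everything else we use is in the standard library).

pip install requests matplotlib numpy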

Now, inside our ‘stock_analysis’ folder, let’s make a new ‘stock’ folder. This is where we’ll write the main file and store all the data we scrape.

mkdir stock

cd stock

Time to make a new file, ‘nse_scrap.py’.

First off, we’re gonna need to import some libraries:

import requests
import json
import time
import datetime
import sqlite3

Here’s the deal. We’re gonna pull some data for stock IDs and derivatives IDs. They’re just a bunch of stocks and market indices from the NSE.

I totally found these IDs by doing some detective work on the NSE webpage.

ids = ['TATAMOTORSEQN', 'COALINDIAEQN', ...]
derivatives_ids = ['NIFTY%20BANK', 'NIFTY%2050', 'NIFTY%20AUTO', 'NIFTY%20ENERGY', ...]
ids += derivatives_ids
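
Those %20s are just URL-encoded spaces (‘NIFTY BANK’ and friends). If you’d rather not hand-encode them, the standard library can translate both ways; just double-check which form the API expects, since requests percent-encodes query parameters itself:

# The %20s are URL-encoded spaces; urllib can encode and decode them.
from urllib.parse import quote, unquote

print(quote('NIFTY BANK'))       # NIFTY%20BANK
print(unquote('NIFTY%20BANK'))   # NIFTY BANK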

I also snagged some headers from the browser’s network request.

headers = {
    'authority': 'www.nseindia.com',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    ...
}
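
One thing to note: the request loop below also passes a cookies object we haven’t defined yet. NSE tends to reject cookie-less API calls, so a common trick (a sketch, assuming the site still behaves this way) is to prime a session against the homepage and reuse its cookie jar:

# Assumption: NSE rejects API calls without the cookies its homepage sets,
# so visit the homepage once and reuse the resulting cookie jar.
session = requests.Session()
session.headers.update(headers)
session.get('https://www.nseindia.com', timeout=10)
cookies = session.cookies.get_dict()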

Next, we’re gonna take a little stroll through our list of IDs and ask the NSE API nicely for some data.

for id in ids:
    params = {
        'index': id,
        'indices': 'true'
    }
    response = requests.get('https://www.nseindia.com/api/chart-databyindex', params=params, headers=headers, cookies=cookies)
    if response.status_code == 200:
        parsed_data = validateData(id, response.text)
        if parsed_data:
            storeData(parsed_data)
    else:
        print("Request was not successful. Status code:", response.status_code)

And finally, let’s whip up some functions to check out the JSON data from the API and tuck it away in an SQLite3 database.

The validateData function takes two parameters, id and json_data, and is designed to validate the format of the JSON data received from the NSE API.

def validateData(id, json_data):
    try:
        # Parse JSON data
        parsed_data = json.loads(json_data)

        # Validate the data format ('grapthData', sic, is how the API spells the key)
        if 'identifier' in parsed_data and 'name' in parsed_data and 'grapthData' in parsed_data:
            if isinstance(parsed_data['identifier'], str) and isinstance(parsed_data['name'], str) and isinstance(parsed_data['grapthData'], list):
                if len(parsed_data['grapthData']) >= 1:
                    print("JSON data format is valid.")
                    return parsed_data
                else:
                    print(f"JSON data format is invalid: 'grapthData' should have at least one entry for {id}.")
            else:
                print(f"JSON data format is invalid: Incorrect data types for {id}.")
        else:
            print(f"JSON data format is invalid: Missing required keys for {id}.")
    except Exception as e:
        print("Error occurred while parsing or validating JSON data:", e)
    return None
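
For reference, grapthData comes back as a list of [timestamp, price] pairs, with timestamps in epoch milliseconds; that’s the layout the plotting script at the end assumes. A quick, illustrative way to eyeball one validated payload:

# Illustrative: peek at the first tick of a validated payload.
# Each entry is assumed to be [epoch_ms, last_traded_price].
first_tick = parsed_data['grapthData'][0]
print(datetime.datetime.fromtimestamp(first_tick[0] / 1000), first_tick[1])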

The storeData function is responsible for storing the validated data received from the NSE API into an SQLite3 database.

def storeData(parsed_data):
    if parsed_data:
        try:
            # Dump data into the database (absolute path, so the cron job
            # finds the DB regardless of its working directory)
            conn = sqlite3.connect('/home/username/stock_analysis/stock/nse.db')
            cursor = conn.cursor()

            # Create table if it does not exist
            cursor.execute('''CREATE TABLE IF NOT EXISTS nseData
                              (id INTEGER PRIMARY KEY,
                               timestamp INTEGER,
                               date TEXT,
                               identifier TEXT,
                               name TEXT,
                               grapthData TEXT)''')
            # Get the current timestamp
            timestamp = int(time.time())
            today_date = datetime.date.today().strftime('%Y-%m-%d')
            # Insert data into the table
            cursor.execute("INSERT INTO nseData (timestamp, date, identifier, name, grapthData) VALUES (?, ?, ?, ?, ?)",
                           (timestamp, today_date, parsed_data['identifier'], parsed_data['name'], json.dumps(parsed_data['grapthData'])))
            # Commit changes and close the connection
            conn.commit()
            conn.close()
            print("Data has been dumped into the database.")
        except Exception as e:
            print("Error occurred while storing data in the database:", e)

So, what we gotta do next is set up a cron job. This will make our script (the one that scrapes NSE) run on autopilot.

To tinker with cron jobs, we’ll need to peek into the cron table. Pop open a terminal window, and throw in this command:

crontab -e

Let’s open it with nano and set the script to run every day at 9 PM. (Just a heads up: the market closes at 3:30 PM, so the day’s data is long settled by then.)

0 21 * * * /home/username/stock_analysis/stock-env/bin/python3.9 /home/username/stock_analysis/stock/nse_scrap.py

Here’s a quick introduction to cron syntax:

Cron syntax consists of five fields:

  1. Minute (0–59)
  2. Hour (0–23)
  3. Day of the month (1–31)
  4. Month (1–12)
  5. Day of the week (0–6, where 0 represents Sunday)

Each field is separated by a space, and you can use an asterisk (*) to represent any value within that field.

The path to the python environment we have created is /home/username/stock_analysis/stock-env/bin/python3.9, and /home/username/stock_analysis/stock/nse_scrap.py is the path to the Python script we want to execute.
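
If the job ever misbehaves, it helps to have its output on disk. Here’s a variant of the same cron line that appends stdout and stderr to a log file:

0 21 * * * /home/username/stock_analysis/stock-env/bin/python3.9 /home/username/stock_analysis/stock/nse_scrap.py >> /home/username/stock_analysis/stock/cron.log 2>&1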

After adding your cron job, save and exit the crontab file. In nano, you can do this by pressing Ctrl + O to write and Ctrl + X to exit.

Let’s wait until 9 PM and see if it works.
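
Once the job has fired, a quick way to confirm that rows actually landed (assuming the sqlite3 command-line tool is installed):

sqlite3 /home/username/stock_analysis/stock/nse.db "SELECT COUNT(*), MAX(date) FROM nseData;"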

In the stock folder, create another Python file, display.py, to plot the chart of one stock.

import sqlite3
import json
import matplotlib.pyplot as plt
import numpy as np

# Connect to the SQLite database (same absolute path the scraper writes to)
conn = sqlite3.connect('/home/username/stock_analysis/stock/nse.db')
cursor = conn.cursor()
# Execute a SELECT query to retrieve data from the table
cursor.execute("SELECT * FROM nseData")
# Fetch all rows from the result set
rows = cursor.fetchall()
# Display the retrieved data
for row in rows:
    first_one_data = row[5]  # the grapthData column (a JSON string)
    first_one_id = row[4]    # the name column
    break
# Close the database connection
conn.close()


data_list = json.loads(first_one_data)  # stored with json.dumps, so parse it back with json.loads (safer than eval)
# Extracting time and price data
time = [datum[0] for datum in data_list]
prices = [datum[1] for datum in data_list]
# Convert time from milliseconds to seconds for plotting
time_seconds = np.array(time) / 1000
# Plotting the time series data
plt.figure(figsize=(10, 6))
plt.plot(time_seconds, prices, marker='o', color='blue', linestyle='-')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.title(f'{first_one_id} Stock Price Time Series data')
plt.grid(True)
# Save the plot as an image without showing it
plt.savefig('stock_price_time_series.png')
# Close the plot to avoid showing it
plt.close()
Scraped chart (saved as stock_price_time_series.png)

In conclusion, we have successfully set up a system to scrape daily stock data from NSE and store it in a database for analysis. This is the first step towards building our stock analysis tool. Stay tuned for Part 2 of the series, where we’ll delve deeper into technical analysis and backtest some of the popular strategies.
