DIY Stock Tracker using Python and Google BigQuery.

Praneeth Kandula

In this post we will write a Python script that fetches stock market data with the yfinance package, processes it, and uploads it into a Google BigQuery table that can be used for further analysis or visualization.

Fetch Stock Market Data

First we need to get the tickers for the stocks in the S&P 500 and load them into a pandas data frame. I have used the data made available by our friends at the Open Knowledge Foundation here. Downloading data for all 500 companies can take a while, so try running the script on a limited number of stocks to begin with.

#Import Packages
import pandas as pd
import yfinance as yf
#Fetch Tickers Data
get_tickers = pd.read_csv("https://raw.githubusercontent.com/datasets/s-and-p-500-companies-financials/master/data/constituents.csv")
get_tickers.head()
#Fetch Stock Data
stock_dat = yf.download(get_tickers.Symbol.head(5).tolist(), period="2d", actions=False, group_by="ticker")
stock_dat.head()
Wide Format Data

If you look at the structure of stock_dat, the data is in wide format, and the data for each ticker can be accessed by passing the ticker as a key, e.g. stock_dat["ACN"]. This is not ideal if we want to store the data in a database and use it for further analysis. So let's use melt and pivot_table to reshape the data into long format such that each ticker is a row.
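Before reshaping, here is a quick sanity check of that per-ticker access. This snippet is my addition rather than part of the original script; it just peeks at the slice for the first symbol we downloaded, whichever one that happens to be.

#Peek at one ticker's slice of the wide-format frame
first_ticker = get_tickers.Symbol.iloc[0]
print(stock_dat[first_ticker].head())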

#Process Stock Data
#Melt into long format: one row per (Date, Ticker, variable) combination
long_form = stock_dat.reset_index().melt('Date', var_name=['Ticker', 'var'])
#Pivot the variables (Open, Close, ...) back out into columns, one row per Date and Ticker
long_form = long_form.pivot_table(index=['Date', 'Ticker'], columns='var', values='value').reset_index()
Long Format Data

Next let’s calculate the change in stock price from the previous day’s close. We create two new columns, Change and Pct_Change, drop the previous day’s rows, and finally convert the Date column to a string.

long_form["Change"] = long_form.groupby("Ticker").Close.diff()
long_form["Pct_Change"] = long_form.groupby("Ticker").Close.pct_change() * 100
long_form = long_form[long_form ["Date"] == long_form["Date"].iloc[long_form.shape[0]-1]]
long_form["Date"] = long_form['Date'].dt.strftime('%Y-%m-%d')
Clean structured data.

Upload Data to BigQuery Table

Note: You might be billed for using any services offered by Google. Although the cost of working with BigQuery on small datasets is minimal, please see the pricing info before continuing.

In order to use BigQuery services we need to authenticate to the BigQuery API. To authenticate without user interaction, we create a service account on GCP and use its service token (key file) to authenticate. If you don’t already have a service token, see this post, which walks you through how to obtain one.

#Import BigQuery and Google Authentication Packages
from google.cloud import bigquery
from google.oauth2 import service_account
#Get Credentials and authenticate
key_path = "your_service_token_file.json"
credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = bigquery.Client(credentials=credentials,
                         project=credentials.project_id)

Before we upload the data, note that BigQuery does not allow spaces in column names, so let’s rename the Adj Close column to Adj_Close and then upload the data. You need to specify an existing destination dataset for the data, so create a new dataset in BigQuery if you haven’t already. The table with the specified name will be created in that dataset and the data will be written into it.
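If you prefer to create the dataset from Python instead of the BigQuery console, a minimal sketch looks like this. Here your_dataset_name is a placeholder, the US location is my assumption, and exists_ok=True leaves an already existing dataset untouched.

#Create the destination dataset if it doesn't exist yet
dataset = bigquery.Dataset(f"{credentials.project_id}.your_dataset_name")
dataset.location = "US"  #Pick the location where you want the data stored
client.create_dataset(dataset, exists_ok=True)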

#Rename Adj Close since BigQuery column names can't contain spaces
long_form = long_form.rename(columns={"Adj Close":"Adj_Close"})
table = 'your_dataset_name.your_table_name'
job = client.load_table_from_dataframe(long_form, table)
job.result()  #Wait for the load job to finish before querying
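Since the idea is to eventually run this script daily, you may want to be explicit about whether each run appends to or overwrites the table. A small sketch using a job config is shown below; the WRITE_APPEND choice here is my assumption, not something from the original script.

#Explicitly append each day's rows to the existing table
job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
job = client.load_table_from_dataframe(long_form, table, job_config=job_config)
job.result()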

Now let’s query the same table that we just created and make sure the data was uploaded.

query_data = 'SELECT * FROM your_dataset_name.your_table_name'
client.query(query_data).result().to_dataframe()

You can even connect to this BigQuery table from Google Data Studio and visualize the data. You can grab the entire Python script from my GitHub.

Let me know below if you have any questions. If, instead of running this script manually every day, you want to automate the process and set up a recurring job using Google Cloud Functions and Pub/Sub, watch out for my next post.

References:
[1] https://pypi.org/project/yfinance/
