Everything starts with data acquisition

Training neural networks requires a lot of data, so the first task is to get massive amounts of training data and structure it in a usable way.

The training data only has to be downloaded once and can then be reused. We will save it in separate .json files, one per stock, covering a period of roughly five years. Frequent refreshing of the training data should not be very important; only the evaluation data (with which we will actually use the neural network later on) needs to be up to date to make proper predictions for the future.
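To give an idea of the storage format, here is a minimal sketch of how one stock's data could be written to its own .json file. The save_stock helper, the data/ folder and the filename scheme are just assumptions for illustration; the actual download happens further below.

import os

def save_stock(symbol, dataframe, folder='data'):
    # hypothetical helper: write the downloaded pandas DataFrame for one
    # stock into its own .json file, e.g. data/AAPL.json
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, symbol + '.json')
    # an ISO date format keeps the index readable when we reload the file later
    dataframe.to_json(path, date_format='iso')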

First we need to specify which stocks we want to download. In the future we do not want to evaluate the tendency of single stocks but analyse the whole stock market. A first attempt is to simply read the NASDAQ index and extract the tags:

import csv

index = 'nasdaq.csv'

def read_tags():
    # read the NASDAQ index file and keep the ticker symbol
    # plus two descriptive columns from every row
    with open(index, mode='r') as infile:
        reader = csv.reader(infile)
        return [(row[0], row[1], row[4]) for row in reader]

tags = read_tags()

This leaves us with a list of 3204 different stocks, which should be enough for now.
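A quick sanity check; the concrete values shown in the comments are only hypothetical and depend on the columns of nasdaq.csv:

print(len(tags))   # number of parsed rows, e.g. 3204 (the first is usually the CSV header)
print(tags[1])     # hypothetical example: ('AAPL', 'Apple Inc.', 'Technology')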

The next step is to retrieve the data itself. This can be pretty complex in terms of data quality (update time, latency, server stability, frequent changes in the API), so we may have to change something here in the future; for now we are fine with pandas-datareader downloading from Google. Since we are still prototyping, we only evaluate a handful of stocks from our tag library. The start and end dates are hard-coded at the moment, but later we will make it possible to crawl through the data over time.

from datetime import date

start = date(2016, 2, 1)  # start date
end = date(2016, 10, 3)   # end date

and there we go:

import pandas_datareader.data as web

stocks = web.DataReader([tag[0] for tag in tags[1:6]], 'google', start, end)
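What comes back for multiple tickers depends on the pandas-datareader version (typically a Panel or a DataFrame with a MultiIndex on the columns), so a quick inspection helps before we wire the result into the per-stock .json files described above:

# take a first look at the downloaded structure before deciding how to
# split it into one .json file per stock
print(type(stocks))
print(stocks)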