Accelerated NumPy guide for analyzing stock trends

I have been working on building some fundamentals in Python programming lately, and from an analytics perspective, NumPy is a particularly useful package. NumPy is a package for enhanced scientific computing in Python, heavily used in trend visualization and forecasting analytics. The IDE I am using here is PyCharm; having tried my hand at PyDev and Spyder, I finally settled on PyCharm since it felt the most comfortable to me.

To begin with, I will go through the code fragment by fragment in order to make the whole thing more lucid. I myself had to follow several tutorials in order to get a concrete understanding.

Fragment 1: Since we are to analyze live trends, the first task is to import data from an external source: the internet. You have to pull live data, so I went with finance.yahoo.com to get data for a sample company.
```python
import urllib.request
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
import datetime as dt
```

Initially, we import some command-style modules such as matplotlib.pyplot to begin with. This module is a collection of functions that enables you to perform plot-based analysis; importing it gives you access to all the plot-related functions integrated within. matplotlib.dates, imported as mdates, is used for converting Unix-based date stamps into date stamps that matplotlib can decode.

Secondary modules include urllib, which provides a high-level interface for fetching data from across the internet. As stated earlier, NumPy is pitched as the fundamental package for scientific computing in Python; `import numpy as np` makes it available under the alias np. NumPy is highly useful for high-level mathematical computation. If you are inclined to introduce yourself to Python via some form of mathematical route, the SciPy stack is the way to go: it is a collection of core packages such as NumPy, the SciPy library, Matplotlib, pandas, etc.

Fragment 2: We begin by declaring a function with company as the argument. This argument represents the name of the company for which we are running the trend analysis, Tesla in our case.
```python
def graph(company):
    fig = plt.figure()
    ax1 = plt.subplot2grid((1, 1), (0, 0))
    stock_price_url = 'https://chartapi.finance.yahoo.com/instrument/1.0/tsla/chartdata;type=quote;range=10d/csv'
    source_code = urllib.request.urlopen(stock_price_url).read().decode()
    stock_data = []
    split_source = source_code.split('\n')
```

fig = plt.figure() creates a figure via the matplotlib.pyplot module we imported, and ax1 = plt.subplot2grid((1, 1), (0, 0)) places a single set of axes at position (0, 0) of a 1x1 grid inside that figure.
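As a minimal standalone sketch of those two lines (using the non-interactive Agg backend so it runs without a display; the plotted numbers are just placeholders):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs anywhere
import matplotlib.pyplot as plt

fig = plt.figure()                        # a new, empty figure
ax1 = plt.subplot2grid((1, 1), (0, 0))    # one axes on a 1x1 grid, i.e. a single plot

ax1.plot([1, 2, 3], [2, 4, 8])            # plotting calls now target this axes
print(type(fig).__name__, len(fig.axes))
```

The (grid, position) pair looks redundant for a single plot, but the same call scales to multi-panel layouts, e.g. `plt.subplot2grid((2, 1), (1, 0))` for the lower half of a two-row figure.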

stock_price_url links to the actual data for Tesla. The range parameter in the URL controls how much history is returned (10 days here), and each returned row carries the attributes date, closing price, highest price, lowest price, opening price, and volume.

I had to dig through some tutorials to get the line above working, but the bottom line is that it is not very complicated. It is plain urllib code that bridges matplotlib with the web data, enabling us to fetch what sits behind that URL. Remember this: once you access the data, you HAVE to read() it and then decode() it from bytes into a string. That is how it works.
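The chartapi endpoint has since been retired, so as a sketch of the read/decode/split pattern we can simulate the bytes that `urllib.request.urlopen(url).read()` would have returned (the sample rows below are hypothetical):

```python
# Stand-in for the bytes the (now defunct) Yahoo chartapi returned.
raw_bytes = b"uri:/instrument/1.0/tsla/chartdata\n20170510,325.22,326.17,318.80,321.45,5000000\n"

# urlopen().read() yields bytes; .decode() turns them into a str we can work with.
source_code = raw_bytes.decode()
# Split on newlines: one list entry per line of the CSV feed.
split_source = source_code.split('\n')
print(split_source)
```

The same two-step pattern (decode, then split) applies to any text feed fetched over HTTP.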

stock_data = [] is an empty list which we will be populating, so I defined it early. We also split the fetched text by newline, hence the split_source line.
Fragment 3: Having a look at the dataset for Tesla, one can see that the first few lines are metadata and must be filtered out. The data of use to us begins on the line after the volume statement. We need 6 values per line, one each for the date, closing, opening, highest, lowest, and volume fields, separated by a comma delimiter. Any line with fewer or more than 6 fields is to be filtered out. That is the running logic.

The code below checks the rule framed above against each split line. Lines with exactly 6 comma-separated fields are appended to the stock_data list created earlier. Since the values within a line are comma-separated, each line is split with `split_line = line.split(',')`:
```python
for line in split_source:
    split_line = line.split(',')
    # if length is exactly 6 we take it; anything less or more is filtered out
    if len(split_line) == 6:
        if 'values' not in line and 'labels' not in line:
            stock_data.append(line)
```
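To see why both checks are needed, here is the loop run over a hypothetical sample of the feed (note the 'values' header also happens to split into 6 fields, so the length test alone would not catch it):

```python
# Hypothetical sample of the raw feed: metadata lines mixed with data rows.
split_source = [
    'values:Date,close,high,low,open,volume',   # 6 fields, but a header
    'labels:20170501,20170512',                 # wrong field count
    '20170510,325.22,326.17,318.80,321.45,5000000',
    '20170511,323.10,327.50,321.00,326.00,4800000',
    'Exchange:NMS',                             # wrong field count
]

stock_data = []
for line in split_source:
    split_line = line.split(',')
    # keep only rows with exactly 6 fields that are not header rows
    if len(split_line) == 6:
        if 'values' not in line and 'labels' not in line:
            stock_data.append(line)

print(stock_data)  # only the two genuine data rows survive
```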
Fragment 4: At this point we have the entire dataset in parsed form: the lines carrying the 6 values of interest have been separated from the surrounding metadata, and all of them are now stored in the stock_data list.

What happens in this segment is something referred to as unpacking: the 6 comma-separated values in each line are unpacked into 6 different variables, one per column. NumPy comes into play here, as you can see from the 6 variable names assigned.

np.loadtxt loads data from flat text sources, the CSV-style kind of data we have in question (it also accepts any iterable of strings, which is exactly what stock_data is).

```python
date, closep, highp, lowp, openp, volume = np.loadtxt(stock_data,
                                                      delimiter=',',
                                                      unpack=True,
                                                      converters={0: bytespdate2num('%Y%m%d')})
```
```python
# Alternative: if the feed returns Unix timestamps instead of %Y%m%d strings,
# convert the date column like this rather than via the converters argument.
dateconv = np.vectorize(dt.datetime.fromtimestamp)
date = dateconv(date)
```

Argument 1: stock_data. This is the source from which the data is loaded.

Argument 2: delimiter=',' since the values are separated by commas.

Argument 3: unpack=True since we are unpacking the columns into separate variables.

Argument 4: the converters parameter specifies which column is to be converted and how. The date sits at index 0 of each row, hence converters={0: ...}.

converters={0: bytespdate2num('%Y%m%d')} references the bytespdate2num function which will be created later on. This function converts the raw byte-string date into a numeric format that matplotlib can interpret.
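The four arguments can be seen working together on a tiny in-memory demo. The sample rows are hypothetical, and a simplified stand-in converter (plain ordinal day numbers via the standard library) replaces bytespdate2num, which is defined later:

```python
import datetime as dt
import numpy as np

# Two hypothetical rows in the same shape as the feed.
stock_data = ['20170510,325.22,326.17,318.80,321.45,5000000',
              '20170511,323.10,327.50,321.00,326.00,4800000']

# Simplified stand-in for bytespdate2num: b'20170510' -> ordinal day number.
# (Older NumPy passes bytes to converters, newer passes str; handle both.)
def date2ordinal(v):
    s = v.decode() if isinstance(v, bytes) else v
    return dt.datetime.strptime(s, '%Y%m%d').toordinal()

date, closep, highp, lowp, openp, volume = np.loadtxt(stock_data,
                                                      delimiter=',',
                                                      unpack=True,
                                                      converters={0: date2ordinal})
print(closep)  # each column now sits in its own NumPy array
```

With unpack=True, loadtxt transposes the parsed table so each of the 6 columns lands in its own array, which is what makes the 6-name assignment on the left work.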
Fragment 5: In this fragment, we write the function that assists in converting the time stamps into a matplotlib-friendly format.

These are the libraries that need to be imported.

```python
import datetime as dt
import time
```

To check the time stamps, run the following command:

```python
>>> time.time()
1494779837.3990254
```

This is the Unix-generated time stamp. It can be stored in a separate variable and converted:

```python
example = time.time()
print(example)
print(dt.datetime.fromtimestamp(example))
```

```
2017-05-12 18:40:11.04644
```

As can be seen from the output, this gives us a readable time stamp in the required format. This date still needs to be converted into the numeric form that matplotlib plots against.

Thus, we introduce the bytespdate2num function. It decodes the encoded bytes and then converts the date string into matplotlib's number format.

```python
def bytespdate2num(fmt, encoding='utf-8'):
    strconverter = mdates.strpdate2num(fmt)
    def bytesconverter(b):
        s = b.decode(encoding)
        return strconverter(s)
    return bytesconverter
```
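One caveat: mdates.strpdate2num has been deprecated and removed in recent matplotlib releases. The same bytes-to-date-number conversion can be sketched from the standard library plus mdates.date2num, assuming the same '%Y%m%d' feed format:

```python
import datetime as dt
import matplotlib.dates as mdates

# Replacement for the strpdate2num-based version above: parse the date
# string with strptime, then convert to matplotlib's numeric date format.
def bytespdate2num(fmt, encoding='utf-8'):
    def bytesconverter(b):
        s = b.decode(encoding) if isinstance(b, bytes) else b
        return mdates.date2num(dt.datetime.strptime(s, fmt))
    return bytesconverter

conv = bytespdate2num('%Y%m%d')
num = conv(b'20170512')
# round-trip through num2date to confirm the conversion is lossless
print(mdates.num2date(num).date())
```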

Finally, we have ourselves a well-written code chunk:

```python
import matplotlib.pyplot as plt
import numpy as np
import urllib.request
import matplotlib.dates as mdates


def bytespdate2num(fmt, encoding='utf-8'):
    strconverter = mdates.strpdate2num(fmt)
    def bytesconverter(b):
        s = b.decode(encoding)
        return strconverter(s)
    return bytesconverter


def graph_data(stock):
    stock_price_url = 'http://chartapi.finance.yahoo.com/instrument/1.0/' + stock + '/chartdata;type=quote;range=10y/csv'
    source_code = urllib.request.urlopen(stock_price_url).read().decode()
    stock_data = []
    split_source = source_code.split('\n')
    for line in split_source:
        split_line = line.split(',')
        if len(split_line) == 6:
            if 'values' not in line and 'labels' not in line:
                stock_data.append(line)
    date, closep, highp, lowp, openp, volume = np.loadtxt(stock_data,
                                                          delimiter=',',
                                                          unpack=True,
                                                          converters={0: bytespdate2num('%Y%m%d')})
    plt.plot_date(date, closep, '-', label='Price')
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.title('Interesting Graph\nCheck it out')
    plt.legend()
    plt.show()


graph_data('TSLA')
```

The final trend for Tesla would resemble the plot below. (The green axes are something related to subplots I was practicing, so that is nothing to worry about.) I have previously analysed stock trends in MATLAB, but that was way back, and having now done the same in Python (even though this is just a really fundamental, basic guide) I can honestly say that Python is more convenient for me as a user, since different sub-modules can be laid down separately and then integrated easily. For data mining tasks I would go with R, probably because I have not done DM in Python, which makes me somewhat biased :p. R offers some diverse features as well, but for visualization I intend to work more on both Python and R to narrow down the differences and similarities between these two languages, which happen to be the best in the game!

Reference: Programming guide to data visualization using NumPy