My first ML/DL app on Streamlit: Stock prediction app

Hugo Shih
Published in DataRoad
5 min read · Sep 16, 2020

Transporter: https://twstock-tool.herokuapp.com/

This year (2020), I gave myself a chance to learn a new skill: Data Analytics and ML / DL at Le Wagon Data Science Bootcamp in Shanghai, remotely.

It was a very special journey for me, as I come from the entertainment industry (motion picture and television, as a colorist), where people rarely enroll in this kind of program. The Data Science Bootcamp also requires candidates to pass an exam that tests whether they have enough maths knowledge and coding skills to attend. Even now, I can’t believe that I passed the exam and survived this intense program. On top of that, I teamed up with other students to create a small web app.

I hope to share more about my experience studying at the Bootcamp, and the information I have collected, in other articles. Today, I just want to share the very first web app we created at the Bootcamp: “Bull: Stock Price Prediction”.

This is not the original version of the app, as I want to focus on Taiwan’s stock market instead of the US market. Thus, I made some changes and added or removed some functions to achieve my goal. Since I’m at the very beginning level, this article is more of a learning journal than a tutorial. I hope I can see my improvement from these journals in the future. 💪🏻

Idea & Data Retrieval

I had no previous experience creating an app, so I struggled a little at the very beginning. As I mentioned before, I wanted to create an app focused on Taiwan’s stock market. However, I realized there is no native API for Taiwan’s stocks. That meant the only way to get the full information and equity list was web scraping.

I found a website called FinLab that teaches people to use Python to build personal tools, including web scraping from TWSE (Taiwan Stock Exchange Corporation). I followed it to write the code for scraping OHLC (Open / High / Low / Close) data across all sectors.

After I finished the code, I realized one thing: Am I going to make a full English app, full Chinese app, or both?

Start small and dream big.
-Robert T. Kiyosaki

When I realized how complicated it would be, I decided to start small and make just the English version first.

Yahoo Finance is a popular website for financial news, and there is a Python module called yfinance that lets me pull data from it. This module can also be combined with pandas-datareader. The challenge here was that not all of Taiwan's equities are available on Yahoo Finance. Thus, I scraped all equities from TWSE’s 24 sectors and then queried each one on Yahoo Finance to filter out the unavailable ones.
* Unavailable data includes delisted equities and equities with incomplete data.
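
As a rough illustration, the filtering step could look something like the sketch below. It is not the app's actual code: the function names, the `min_rows` threshold, and the one-month lookup period are assumptions made for the example. The `.TW` suffix is how Yahoo Finance labels TWSE-listed equities.

```python
from typing import Callable, Iterable, List

import pandas as pd


def fetch_from_yahoo(ticker: str) -> pd.DataFrame:
    """Download one month of daily prices with yfinance (needs network access)."""
    import yfinance as yf  # imported lazily so the filter can be tried offline

    return yf.download(ticker, period="1mo", progress=False)


def filter_available(
    codes: Iterable[str],
    fetch: Callable[[str], pd.DataFrame] = fetch_from_yahoo,
    min_rows: int = 5,  # hypothetical cut-off for "incomplete" data
) -> List[str]:
    """Return the Yahoo tickers that come back with at least `min_rows` rows."""
    available = []
    for code in codes:
        ticker = f"{code}.TW"  # Yahoo Finance suffix for TWSE-listed equities
        try:
            history = fetch(ticker)
        except Exception:
            continue  # treat any download error as "unavailable"
        # Rows where every column is NaN carry no price information.
        if len(history.dropna(how="all")) >= min_rows:
            available.append(ticker)
    return available
```

Passing the download function in as a parameter keeps the filter itself easy to try with fake data, without hitting Yahoo Finance for every equity code.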

Once I had the equity list, I created some functions to retrieve the data.
(The @st.cache decorator caches the results, which speeds up data retrieval on Streamlit.)

Data Preparation

Since I got the data from Yahoo Finance, it doesn’t include holiday information, which might cause NaN values. I didn’t do much data cleaning. However, I did some data wrangling to shape the DataFrame the way I needed.
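
A minimal sketch of this kind of wrangling, assuming the downloaded frame is indexed by date and has the usual Open/High/Low/Close columns. The helper name `prepare_ohlc` is made up for the example, and dropping NaN rows is just one simple way to handle the gaps.

```python
import pandas as pd


def prepare_ohlc(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep the OHLC columns, sort by date, and drop rows with missing prices."""
    ohlc = raw[["Open", "High", "Low", "Close"]].copy()
    ohlc.index = pd.to_datetime(ohlc.index)  # make sure the index is a DatetimeIndex
    ohlc = ohlc.sort_index()                 # oldest row first
    return ohlc.dropna()                     # remove rows that contain any NaN
```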

Feature Selection and Scaling

For feature selection, our team decided to keep it simple since we only had 2 weeks to build the app. The features we had were Open/High/Low/Close, SMA, EMA, RSI, etc. My updated version uses only OHLC data.
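
For reference, here is a small sketch of how SMA, EMA, and RSI can be derived from the closing price with pandas. The 14-day window and the helper name are assumptions, and the RSI below uses plain rolling averages of gains and losses (a simple variant), which may differ from what the team's version computed.

```python
import pandas as pd


def add_indicators(ohlc: pd.DataFrame, window: int = 14) -> pd.DataFrame:
    """Return a copy of `ohlc` with SMA, EMA and RSI columns based on Close."""
    out = ohlc.copy()
    close = out["Close"]

    out["SMA"] = close.rolling(window).mean()                 # simple moving average
    out["EMA"] = close.ewm(span=window, adjust=False).mean()  # exponential moving average

    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()        # average upward move
    loss = delta.clip(upper=0).abs().rolling(window).mean()  # average downward move
    rs = gain / loss
    out["RSI"] = 100 - 100 / (1 + rs)  # equals 100 when there are no losses in the window
    return out
```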

The dataset is split into training and testing sets at a 7:3 ratio, and I used MinMaxScaler to scale each feature to the range 0 to 1.
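
The split and scaling could be sketched as follows, assuming the scaler is scikit-learn's `MinMaxScaler` and the split is chronological (no shuffling). Fitting the scaler on the training part only is a choice made for this example, not something the app is confirmed to do; the function name is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def split_and_scale(values: np.ndarray, train_ratio: float = 0.7):
    """Split rows chronologically, then scale each column with MinMaxScaler."""
    split = round(len(values) * train_ratio)
    train, test = values[:split], values[split:]

    scaler = MinMaxScaler(feature_range=(0, 1))
    train_scaled = scaler.fit_transform(train)  # learn min/max from the training rows
    test_scaled = scaler.transform(test)        # reuse the same min/max on the test rows
    return train_scaled, test_scaled, scaler
```

With this choice, test values can land slightly outside 0 to 1 when later prices exceed the training range. The returned scaler is kept so predictions can be mapped back to prices with `inverse_transform`.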

Modeling

Since stock prices are time-series data, we used an LSTM to predict the next day's price based on historical data. The model hasn’t been tuned yet due to time constraints. I will keep working on this and write another article to explain the details.
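
To give an idea of what "predict the next day's price based on historical data" means in practice, the sketch below turns a scaled array into sliding windows. The 60-day lookback, the function name, and the assumption that Close is the fourth column are all illustrative; the model itself is left out.

```python
import numpy as np


def make_windows(scaled: np.ndarray, lookback: int = 60, target_col: int = 3):
    """Build (samples, lookback, features) inputs and next-day targets."""
    X, y = [], []
    for end in range(lookback, len(scaled)):
        X.append(scaled[end - lookback:end])  # the previous `lookback` days
        y.append(scaled[end, target_col])     # the following day's Close
    return np.array(X), np.array(y)
```

The resulting `X` has the 3-D shape (samples, timesteps, features) that a Keras LSTM layer expects as input.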

Evaluation

For the LSTM model, there are four evaluation metrics. I’m planning to explain the metrics for different types of models in another article. Stay tuned.

  • R² (R-squared)
  • MAPE (Mean Absolute Percentage Error)
  • RMSE (Root Mean Square Error)
  • MAE (Mean Absolute Error)
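
The four metrics listed above can be computed in a few lines of NumPy. This is a generic sketch using the standard definitions, not the app's evaluation code, and the function name is made up.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute R², MAPE (in percent), RMSE and MAE for two equal-length series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    ss_res = np.sum(errors ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "mape": np.mean(np.abs(errors / y_true)) * 100,  # assumes no zero prices
        "rmse": np.sqrt(np.mean(errors ** 2)),
        "mae": np.mean(np.abs(errors)),
    }
```

MAPE divides by the true value, so it is only meaningful when the prices are in their original (unscaled) units and never zero.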

Deployment

I explained how I deployed the Streamlit app on Heroku in a separate article. Please check here. 👈

There are two things I really like about Heroku. First, it’s free: I can deploy up to 5 projects to the platform. Second, it has “Automatic Deploys”. Once I enable this feature, the Heroku app is automatically updated every time I push to the master branch on GitHub.

There was an issue I didn’t mention in the previous article: slug size.

After adding the LSTM model, I updated the requirements file to include TensorFlow. However, I kept getting a “Compiled slug size is too large” message (the max is 500MB). It turns out the latest TensorFlow includes large GPU-related modules that cause this issue. Therefore, my temporary solution was to install only the CPU version of TensorFlow.
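
In a requirements file, that swap might look like the fragment below. The other package names are guesses based on the tools mentioned in this article, and no versions are pinned; `tensorflow-cpu` is the CPU-only TensorFlow package on PyPI.

```text
# requirements.txt (illustrative, not the app's actual file)
streamlit
yfinance
scikit-learn
tensorflow-cpu  # replaces the full "tensorflow" package to keep the slug smaller
```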

More information about slug size is available here.

This article only scratches the surface of the app and the concept. There are still many things that can be done better. If you have any suggestions, please leave a comment. Thank you for taking the time to read this article.


Hugo Shih
DataRoad

Tech enthusiast, data analyst, film colorist. Love challenges and new adventures.