Embracing Manual Data Collection

Eric Kleppen
Sep 17, 2019 · 4 min read

Collecting Option Order Flow

One of the factors driving me towards data science has always been my love for collecting data. My idea of fun on a Friday night includes web scraping, wrangling and data engineering. Although a lot of data is floating around out on the internet, understanding how to collect data is important because you might not always be able to find the data you want in the form you need. For example, a few years ago I tried my hand at day trading. Although I had limited success in the endeavor, I found that I was more interested in analyzing the data than chasing profits. I got involved with a group of guys who watch options order flow from a curated trade-alert feed and thought it would be interesting to see if I could analyze the order flow and find a potential edge hidden within. The website that hosted the feed didn’t have very many options insofar as searching and analyzing, so I decided to start collecting it myself.

Challenge 1: Copying the Data

Since the website didn’t have a public API from which I could pull the data, I had to figure out a way to copy it. The website was designed in a way that made scraping it difficult, and the page did not allow native highlighting so I couldn’t simply copy and paste all the text. I started digging for a solution and came across the chrome extension Allow Copy. Allow Copy will re-enable select, copy, paste and right click in any web page using any copy protection. Once the extension was installed, I was able to select their data feed, copy it and paste it.

Chrome extension Allow Copy

Challenge 2: Historical Data

The feed only displayed so many results per day, but it did have a Load More button that allowed a user to load previous day’s results, 50 at a time. As the list grew, the website performance would tank. I could only go back 30 days before the website crashed. Not only did it make the website unstable, but it was also a lot of clicking and dragging to get the maximum number of days to load into the list. I decided to automate the task using an Auto-Hotkey macro creator called Pulover’s Macro Creator. I simply recorded myself clicking the Load More button, and then I looped the playback until I loaded as many days as I could.

The Load more button

Challenge 3: Cleaning the Data

I automated loading the data and was able to select it all and copy it, but the data needed to be cleaned to make analyzing it simpler. I needed to add a date and adjust the size column. For example, the data included the amount of money spent in the trade in a column called Size, but displayed it as 400K or 1.2M. I decided to paste the data into a google spreadsheet and use a formula to transform the size of the bet from a string into a numeric value I called Spent:

=IFERROR(LEFT(G2,LEN(G2)-1)*CHOOSE(MATCH(RIGHT(G2,1),{"K","M"},0),1000,1000000),G2)

Challenge 4: Storing the Data

After I cleaned the data, I wanted to store it on my local machine instead of in the google sheet. Since I use a lot of SQL in my day job and am quite familiar with it, I decided to load the data into an SQL database. I prefer using Microsoft SQL Server and Management Studio since that is what I’ve used the most:

Downloading a Google Sheet as CSV
  1. Select Tasks > Import Data. An Import/Export Wizard displays.
  1. Browse for the downloaded CSV
  2. Click Next
  3. Verify the columns look as expected
  4. Choose a Destination: SQL Server Native Client 11.0
  5. Verify the Server name and database are correct
  6. Click Next
  7. Rename the Destination Table if desired
  8. Click Next
  9. Click Finish
  10. Click Finish. You’ll see a the data load.

Eric Kleppen

Written by

Software Product Analyst in Data Science. Love for writing, science, data and self-help. Top writer in Business https://www.linkedin.com/in/erickleppen01/

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade