Sourcing intra-day crypto data

Louis Brown
Coinmonks
4 min read · Apr 8, 2021

Photo by André François McKenzie on Unsplash

Introduction

I’ve recently been working on a project to build an intraday predictive model for Bitcoin pricing (bitcoinpredict10). The model was built on a large historical one-minute price dataset (sourced from Kaggle), which gave us a lot of flexibility during development. As a result, the final model requires approximately two weeks of one-minute price points to run a forecast.

We believed this data would be easy to source for the production model, as a number of vendors and exchanges provide crypto data points. However, we found out after release that the dataset we need in production is much harder to find, or can be costly for users. We thought other people might have come across the same issue: finding free datasets that go back a little further and are still up to the minute.

The Solution

This article discusses how we solved the issue using free exchange data from Bitstamp’s public API (most cryptocurrency exchanges provide similar APIs) and the website CryptoDataDownload. The example code is written in Python and can be found at the following repository. To connect to both sites and to manipulate and store the data, we will use the following libraries:-

  • pandas: a pandas DataFrame is used to store and manipulate the data.
  • json: Bitstamp returns its data as JSON.
  • csv: the larger dataset sourced from CryptoDataDownload is returned as a CSV file.
  • ssl: CryptoDataDownload uses secure sockets, so we need this to connect to the site.
  • requests: sends queries to both websites’ APIs.

For this example, we will use the currency pair BTCUSD (Bitcoin and US Dollar). From the CryptoDataDownload site, we will source the data that comes from the Bitstamp exchange for that pair.

The first part specifies the data location; in the second step, we create a secure socket for the connection; and in the last step, we source the data, skipping the first row as it contains the website address. Let’s have a look at the DataFrame hist_data.
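The three steps described above could be sketched roughly as follows. This is a minimal sketch rather than the repository code: the URL is an assumption about where CryptoDataDownload hosts the Bitstamp BTCUSD minute file, and the helper names parse_cdd_csv and fetch_hist_data are hypothetical.

```python
import io
import ssl
import urllib.request

import pandas as pd

# Assumed file location; check the CryptoDataDownload site for the current link.
CDD_URL = "https://www.cryptodatadownload.com/cdd/Bitstamp_BTCUSD_minute.csv"


def parse_cdd_csv(csv_text: str) -> pd.DataFrame:
    """Parse the CSV text, skipping the first row (the website address)."""
    return pd.read_csv(io.StringIO(csv_text), skiprows=1)


def fetch_hist_data(url: str = CDD_URL) -> pd.DataFrame:
    """Download the CSV over a secure socket and return it as a DataFrame."""
    context = ssl.create_default_context()  # TLS context with certificate checks
    with urllib.request.urlopen(url, context=context, timeout=30) as response:
        csv_text = response.read().decode("utf-8")
    return parse_cdd_csv(csv_text)
```

Calling hist_data = fetch_hist_data() would then download and parse the file in one go. Keeping the parsing separate from the download makes it possible to check the row-skipping logic on a small sample without touching the network.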

The dataset contains the Unix timestamp, date and symbol, the open, high, low and close price for that minute, and the volume traded in both BTC and USD. This format is known as Open High Low Close (OHLC).

Next, we source the Bitstamp data by calling its OHLC API endpoint for BTCUSD, requesting one-minute data (the step parameter) and limiting the response to 1000 data points (the limit parameter).

The code specifies the website location and parameters; we call it using requests.get and load the JSON response using json.loads. We then move to the correct JSON level using the ‘data’ and ‘ohlc’ keys. The result is then loaded into a DataFrame with the specified columns. Let’s see what latest_data looks like.
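A minimal sketch of this request is shown here, assuming the Bitstamp v2 OHLC endpoint for btcusd and a response whose candles sit under the ‘data’ and ‘ohlc’ keys as described above. The helper names, the column order and the numeric conversion are illustrative choices, not necessarily those of the original code.

```python
import json

import pandas as pd
import requests

# Assumed endpoint for the BTCUSD pair on Bitstamp's public v2 API.
BITSTAMP_OHLC_URL = "https://www.bitstamp.net/api/v2/ohlc/btcusd/"
OHLC_COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]


def ohlc_to_frame(payload_text: str) -> pd.DataFrame:
    """Turn the JSON text of an OHLC response into a numeric DataFrame."""
    payload = json.loads(payload_text)
    rows = payload["data"]["ohlc"]  # move to the level that holds the candles
    frame = pd.DataFrame(rows, columns=OHLC_COLUMNS)
    # The values are assumed to arrive as strings, so convert each column.
    return frame.apply(pd.to_numeric)


def fetch_latest_data(step: int = 60, limit: int = 1000) -> pd.DataFrame:
    """Request `limit` candles of `step` seconds each (60 = one minute)."""
    response = requests.get(
        BITSTAMP_OHLC_URL, params={"step": step, "limit": limit}, timeout=30
    )
    response.raise_for_status()
    return ohlc_to_frame(response.text)
```

With these helpers, latest_data = fetch_latest_data() would return the most recent 1000 one-minute candles, oldest first.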

We can see a few differences between the two DataFrames, so we will need to create new columns, rename columns and reorder them so that the datasets match. The code below does these steps.

The code steps:-

  • in the first part, we create the new columns in latest_data;
  • then we rename the columns in this DataFrame;
  • next, we select the latest 20,160 data points (two weeks of one-minute data) in hist_data and drop the symbol column;
  • we then rename the columns to match latest_data (Bitstamp);
  • finally, we take the earliest data point in latest_data, remove every data point from that time onwards from hist_data (CryptoDataDownload), and reverse the order of hist_data (now called dat). The reversal is needed because hist_data arrives with the most recent data first, whereas Bitstamp returns the most recent data last.
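The steps above can be sketched as one function. Several details here are assumptions, since the original column names are not listed in the text: the CryptoDataDownload headers (unix, date, symbol, Volume BTC, Volume USD), the common column names in COMMON_COLUMNS, timestamps in seconds on both sides, and the USD volume for Bitstamp being approximated as BTC volume times the close price.

```python
import pandas as pd

# Hypothetical shared schema for both DataFrames.
COMMON_COLUMNS = [
    "timestamp", "date", "open", "high", "low", "close",
    "volume_btc", "volume_usd",
]


def align_frames(hist_data, latest_data, n_points=20160):
    """Return (dat, latest) with matching columns and oldest-first order."""
    # New columns for the Bitstamp data (both derivations are assumptions).
    latest = latest_data.copy()
    latest["date"] = pd.to_datetime(latest["timestamp"], unit="s")
    latest["volume_usd"] = latest["volume"] * latest["close"]
    # Rename and reorder to the shared schema.
    latest = latest.rename(columns={"volume": "volume_btc"})[COMMON_COLUMNS]

    # hist_data is newest first, so head() gives the latest two weeks.
    dat = hist_data.head(n_points).drop(columns="symbol")
    dat = dat.rename(columns={
        "unix": "timestamp",
        "Volume BTC": "volume_btc",
        "Volume USD": "volume_usd",
    })
    dat["date"] = pd.to_datetime(dat["date"])
    dat = dat[COMMON_COLUMNS]

    # Keep only history older than the first Bitstamp candle, then reverse
    # so that the oldest row comes first, as in the Bitstamp data.
    earliest = latest["timestamp"].min()
    dat = dat[dat["timestamp"] < earliest]
    dat = dat.iloc[::-1].reset_index(drop=True)
    return dat, latest
```

Filtering on the timestamp before joining means the overlap between the two sources is taken from Bitstamp only, so no minute appears twice in the combined data.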

Next, we need to smush the data together (aka join the two datasets); this is done using the append command. Lastly, we select the final two weeks of data from the combined dataset using the tail command.
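A minimal sketch of the join is shown here, with a hypothetical helper name. It uses pd.concat instead of the append command, because DataFrame.append was removed in pandas 2.0; for this purpose the two give the same result.

```python
import pandas as pd


def combine_frames(dat, latest, n_points=20160):
    """Stack the older rows on top of the newer ones and keep the last n_points."""
    combined = pd.concat([dat, latest], ignore_index=True)
    # tail() keeps the most recent rows because the data is oldest first.
    return combined.tail(n_points).reset_index(drop=True)
```

The default of 20,160 rows corresponds to two weeks of one-minute data (14 × 24 × 60).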

The final output looks like this:-

Conclusion

So what was this all about? In this article, we have learnt how to source OHLC data from two separate websites and combine them. The combined dataset can be used in production models for possible trading strategies, or in applications such as trend analysis for cryptocurrencies. We have called the final function GetData_Smush.
