Building a crypto application using Polars (part 1): Retrieving historical data from Binance

Romano Vacca
Coinmonks
6 min read · Sep 22, 2022


In this multi-post series, we will build a crypto application that can retrieve data from different exchanges, implement technical indicators on top of that data, and possibly even deploy the solution in the cloud.

The inspiration for this series comes from my interest in the field, along with the blog post by Peter Nistrup that can be found here. In his post, data is retrieved from Binance and Bitmex in a notebook using Pandas. While that works, I wanted to make it more robust, production-ready and, above all, faster by using Polars. With this library, I was able to increase the speed of the application significantly, which is crucial since every millisecond counts when you are trading in real time. The main focus of these posts will therefore be the use of Polars, while the rest of the code can be found in the repository.

Prerequisites

To actually get the data from the exchange, you need a Binance account. You can register via this link. To follow this tutorial, you do NOT have to purchase anything or spend any money.

With an active Binance account, we need to create an API key that allows the application to read the crypto data from the exchange through your account. Make sure to give the API key read-only permissions, so that if your key is ever exposed, nothing can be purchased with it. Here you can find how to create the API key.

I added a requirements.txt file, which contains all the Python libraries necessary to run the code. You can quickly install them by running the following line in your virtual environment: `pip install -r requirements.txt`

When that is done, we can start coding!

In this first part, we will focus on retrieving all BTC pairs from Binance and writing them to disk. The code used in this post can be found here.

The structure for the repo looks like this:

structure of the repository used

The two most important files here are:

  • main.py
  • orderbook.py

We also need to use the Binance API key. To do that, we use a configuration file called “config.ini”, where we store the credentials locally. This method is not the most secure and is not recommended in a production setting, but for a simple local post it’s fine.

config.ini
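
The gist is not reproduced here; a minimal config.ini might look like the following (the section and key names are assumptions — match whatever names the repo’s code actually reads):

```ini
[binance]
api_key = YOUR_API_KEY
api_secret = YOUR_API_SECRET
```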

main.py

Now that we have that set up, we will create an orderbook class which contains the code to connect to an exchange, fetch the data, process it and write it to disk. The idea is that this orderbook can handle multiple different exchanges, but that is something for future posts.

Asyncio

Since we are going to call the API multiple times, we have to take rate limits into account. The more data you request from Binance, the “heavier” the request is on their systems. Binance’s rate limit is 1500 per minute. Requesting data for the pair BTC/ETH for the last day might, for example, cost 3, while retrieving data for the whole of 2022 can cost 30.

To increase the speed of the API calls, we use asyncio to make calls and fetch data from the API concurrently. Next to that, we implement a precaution to make sure we don’t go over the rate limit.
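
As a sketch of what such a precaution could look like (the class name and the per-call weights here are assumptions for illustration, not the repo’s actual implementation), a simple weight-based limiter pauses once the per-minute budget would be exceeded:

```python
import asyncio


class WeightLimiter:
    """Hypothetical guard: sleep out the minute once the request weight nears the cap."""

    def __init__(self, max_weight_per_minute: int = 1500):
        self.max_weight = max_weight_per_minute
        self.used = 0
        self.lock = asyncio.Lock()

    async def acquire(self, weight: int) -> None:
        async with self.lock:
            if self.used + weight > self.max_weight:
                # Wait out the current minute window, then reset the counter.
                await asyncio.sleep(60)
                self.used = 0
            self.used += weight


async def demo() -> int:
    limiter = WeightLimiter(max_weight_per_minute=10)
    await limiter.acquire(3)  # e.g. one day of klines for one pair
    await limiter.acquire(3)  # a second concurrent call
    return limiter.used


used = asyncio.run(demo())
```

Each coroutine calls `acquire` with the estimated weight of its request before hitting the API, so concurrent tasks share one budget.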

This is what the final main.py will look like for this post.

We create an asynchronous function called main, which is the entry point of the application. We initialise the orderbook and pass the base symbol for which we want to retrieve the data. Passing “BTC” means we will retrieve data for the hundreds of coins that are paired with BTC, such as “ETH/BTC”, “XRP/BTC” etc. Furthermore, we define the timeframe on which we want to retrieve the data, in this case every hour.

The next step is to create the client that makes the calls to the Binance API. We explicitly wait until a client is returned, since otherwise we can’t get any information. Here we pass the credentials that are stored in the configuration file.
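
The overall flow can be sketched as follows. Note that this `Orderbook` class is a hypothetical stand-in for the one in the repository, and the network calls are stubbed out so the structure is visible without credentials:

```python
import asyncio


class Orderbook:
    """Hypothetical stand-in for the repo's orderbook class."""

    def __init__(self, base_symbol: str, interval: str):
        self.base_symbol = base_symbol
        self.interval = interval

    async def get_orderbook(self):
        # In the real code this awaits the Binance client, fetches klines
        # and writes them to disk; here we just return the configuration.
        return (self.base_symbol, self.interval)


async def main():
    # Retrieve hourly data for every pair quoted in BTC.
    orderbook = Orderbook(base_symbol="BTC", interval="1h")
    return await orderbook.get_orderbook()


result = asyncio.run(main())
```

`asyncio.run(main())` is what kicks off the whole application from the command line.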

Most of the logic lives in the “get_orderbook” function. Explaining everything in there could make for a whole separate post in itself, and I believe the more interesting functionality will come in the next part, where we actually work with the retrieved data. For that reason I will skip most of that code and focus on the part where we process the data using Polars.

Using Polars

In the gist above, we retrieve the data from the API using “_get_klines_data”. In a previous step, the oldest stored data point and the newest available data point have already been determined for a specific symbol. Next, we create a Polars DataFrame, passing in the data and the names of the columns.

One of the advantages of the Polars expressions API is that we can perform different actions in a single select. Not only do we cast all the columns to the correct type, we are also able to create a new column. The reason for this column is that we want to store the name of the coin within the dataset. This is useful when we later combine all the different datasets and use that column as a filter.

The next step is to actually write the data that we just retrieved. To do that, we first have to check whether we already have some data stored. If that is the case, we need to append the new data and exclude the header. Appending two DataFrames in Polars can be done with “vstack”.

One of the reasons to choose Polars for this project is that reading and writing CSVs in Polars is much faster than in Pandas. Let me show you an example:

Pandas

Pandas speed reading and writing

Polars

Polars speed reading and writing

Some context on the amount of data for this test:

  • Data from 01–01–2020 until 21–09–2022
  • Number of rows: 23848
  • Filesize: 3.1 MB

While reading in the data in Polars is roughly 1.5 times faster than in Pandas, the biggest difference is in the writing: writing in Polars is roughly 16 times faster than in Pandas. The speed gain might not seem so impressive now, but consider that this is only 2.5 years of hourly data for one specific coin; imagine the performance improvement for a longer period and for all of the roughly 400 BTC pairs available on Binance.

Conclusion

This first part of the series was mostly about setting up and retrieving the data that we are going to use in the next parts. With the code from this post, it is possible to fetch all data from the Binance exchange with 4 lines of code.

In the second post, we will implement some technical indicators on this data using Polars.


Resources

Official git repo Polars: https://github.com/pola-rs/polars

Getting started with Polars post: https://medium.com/analytics-vidhya/introduction-to-polars-ee9e638dc163

Git repo for the code of this post: https://github.com/romanovacca/polars-crypto-application

Part-2 of the series can be found here: Creating technical indicators to predict future price movements

