Data Science Project: Cryptocurrencies (Part 1) — Motivation —

In this first article, I will try to answer the question “why are we here?” (not the metaphysical one) and also give you a little taste of one of our first steps.

Mauricio Letelier
Published in Coinmonks
6 min read · Mar 2, 2020

Photo by Abigail Faith on Unsplash (I was just wondering if…)

Cryptocurrencies: Volume and Data Source (Part 2).

Cryptocurrencies: Becoming a Trader Data Scientist (Part 3).

Becoming a Trader Data Scientist: Transforming Bollinger Bands (Part 4)

(Un)necessary introduction

There are a lot of articles that cover the introductory context of cryptocurrencies: what blockchain is, how the market works, what technical analysis is, and so on. Given that there are brilliant descriptions of all of this, and that you can find excellent resources in the embedded links and in a lot of other places, we will skip this part (at least for now). Instead, we will jump straight into one of the thousands of ways to tackle the questions that trading involves, like “should I buy?”, “when to sell?”, etc.

So, you might be thinking, “ok, but there are a lot of articles applying all kinds of machine learning algorithms to cryptocurrencies too (mainly LSTM models). If the point is to avoid existing content, you are not doing any better.” That was a long shot at predicting what you were thinking, huh! (COVID-19 would have been a safer bet.) But in case I was right, let me explain.

This series of articles will focus more on the questions I had as a newcomer to this problem than on the models themselves (don’t worry, there will be a lot about the models though). I’m not saying that this will be completely original, but that is at least one of the goals. I don’t claim to have the best answer either, because, as I mentioned before, I’m a newcomer, and because it is hard to believe that there is such a thing as the best way to do anything. What I will be sharing here is the path of trying to solve this problem as I go through it.

Considering all of that, here is what you should expect to see in the next articles:

  • A lot of interior monologue (sometimes to the point of stream of consciousness). This is because I want to avoid the usual third-person omniscient point of view used in most academic articles, which leaves me wondering “where did all that stuff come from?” and “is that the absolute truth, then?”. As you can see, I will even show you the article I read to write this paragraph.
  • Me failing and being wrong A LOT, because this is more about the process than the final outcome. This means that I will not skip some stuff that might look obvious to you. Also, if your goal is just to learn in as few pages as possible, you are not in the right place.
  • Also, these articles will have the same structure as House, M.D.
    What I mean by that: on House, every episode is based on a new medical case, so you can know nothing about the TV show and still follow everything related to the case. But there is also some background story about the characters that you would miss if you didn’t see the previous episodes. So if you don’t want to read the complete saga and prefer to read about just one specific topic (the medical case), you will only miss the context of why we are at that point (why the characters aren’t talking to each other).
  • If you are worried that this seems more like a novel than a project about cryptocurrencies, don’t be! I haven’t forgotten that this is a data science project. The discussion will be centered around the usual data science pipeline: data gathering, data wrangling, important features, feature engineering, feature selection, model selection, backtesting, and more!
  • It is a personal goal for me to publish every two weeks on Sundays, starting today. I would love to have time to do this more often, but since I have a full-time job, a relationship, family, friends, and basically a life, I am already pushing myself hard enough to make this commitment. It is worth saying that while scrolling through Facebook today I found this video, and despite the religious arguments, it was really touching.

After this super unnecessary background, let’s finally start to taste some of the meat of the project (soy meat for vegans).

The first step of our journey: Data gathering, of course!

Retrieving the data may not seem like such a challenging task, but in this particular scenario, it’s definitely something we must think twice about.

My first approach was to look for a reliable exchange and then for its API documentation. My first Google search was “Best exchanges for cryptocurrencies”, and I found a lot of rankings with a bunch of arguments about which exchange was best suited to each specific goal. Coinbase was in almost every one of them.


The Python package I found, cbpro (yes, and besides coding in Python I will be running it in a Jupyter Notebook, so original!), seemed pretty ok to me: easy to use and with a granularity parameter, which will be super useful when we discuss investment horizons. After struggling a little with how to choose the start and end dates, I checked the API documentation, and there were the natural “start” and “end” parameters. So, for example, let’s see a simple call for the ETH-USD pair.
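Here is a minimal sketch of what such a call could look like. It is not the original notebook code: the helper names (`candle_request`, `fetch_candles`) and the dates are hypothetical, and it assumes cbpro's `PublicClient.get_product_historic_rates` accepts `start`, `end` and `granularity` (in seconds) as described above. The request is built separately from the network call so the parameters can be inspected without the package installed.

```python
from datetime import datetime, timedelta


def candle_request(product_id, start, days, granularity=3600):
    """Build the keyword arguments for one historic-rates call (no network)."""
    end = start + timedelta(days=days)
    return {
        "product_id": product_id,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "granularity": granularity,  # candle width in seconds; 3600 = one hour
    }


def fetch_candles(params):
    """Send the request through cbpro (needs the cbpro package and network access)."""
    import cbpro  # imported lazily so candle_request works without the package

    client = cbpro.PublicClient()
    return client.get_product_historic_rates(**params)


# Placeholder dates (an assumption, not the ones behind the chart in this article).
params = candle_request("ETH-USD", datetime(2020, 2, 1), days=12)
print(params)
# candles = fetch_candles(params)  # each row: [time, low, high, open, close, volume]
```

The last line is left commented out because it depends on the exchange's API being reachable and behaving as it did when this was written.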

I also had fun with candlesticks (Plotly). Link over here.

I chose an hour granularity, and the candlestick reflects the price fluctuation of these 12 days.

This first step reveals one of our first “wait a minute” moments. So wait a minute! What if I wanted to retrieve the last month of data with the same granularity? Well, this is what happens when the period of the call is widened to a month.

Beautiful! A lovely generic error. After trying different things, I found the actual cause: the request exceeded the maximum of 300 data points. This implies that we probably need to develop a method if we want to gather more data points. But while I was figuring out how to solve this problem, BANG! Wait a minute! Coinbase is just one of many exchanges, and its volume could be extremely low compared to the entire market. Because of that, it might not be representative of subtle signals happening in other exchanges.
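Going back to the 300-point limit for a moment: 12 days of hourly candles is 12 × 24 = 288 points, which fits, while a month is around 720, which does not. One possible way around it is to split the period into windows of at most 300 candles and make one call per window. The sketch below only does the date arithmetic; the function names are hypothetical, and whether the API counts the end timestamp inclusively is an assumption I have not checked (if it does, lowering `max_candles` by one would be the safe choice).

```python
from datetime import datetime, timedelta

MAX_CANDLES = 300  # per-request limit mentioned above


def candle_count(start, end, granularity):
    """Number of candles of `granularity` seconds that fit between start and end."""
    return int((end - start).total_seconds() // granularity)


def split_range(start, end, granularity, max_candles=MAX_CANDLES):
    """Yield (window_start, window_end) pairs covering at most max_candles each."""
    step = timedelta(seconds=granularity * max_candles)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


start = datetime(2020, 2, 1)  # placeholder date
print(candle_count(start, start + timedelta(days=12), 3600))  # 288
print(candle_count(start, start + timedelta(days=30), 3600))  # 720

windows = list(split_range(start, start + timedelta(days=30), 3600))
print(len(windows))  # 3
```

Each window could then be passed to a separate request and the results concatenated, ideally with a short pause between calls to stay within the exchange's rate limits.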

After grumbling for a while, and blaming myself for having these thoughts that only make me start all over again, I decided to go for it. This is when I found Cryptocompare. One of the features of this page is a ranking of the volume traded in the last 24 hours for more than 200 exchanges. These are the top 7 markets of the exchange table, ordered by volume:

Screenshot - Cryptomarket.com

Coinbase was not even in the top 20 (it was 29th at that particular moment). So the intuition was right: the volume of the entire market is HUGE compared to just one exchange. The questions that arise are “will that really be important in our analysis?” and “do we really need all the data?”. The answer is: let’s see.

But those questions, my friend (yes, if you read this far I declare you “my friend”), belong to a whole new story (a Medium story). I hope you got a little taste of what will be going on here, and if you liked it, I invite you to meet again two weeks from now for the super exciting debate about what a reliable source of information really is.

If you liked it, follow me on Medium and Linkedin. If you want to write to me, I recently joined Twitter. I will gladly talk with you!


Chilean 🇨🇱 | Quant Finance 📈 | Azure Data Scientist Associate ☁️ | https://www.linkedin.com/in/maletelier 👨‍💼