Analyzing CoinMarketCap data with Neo4j

8 min readMay 19, 2018

For my latest weekend project, I dove into the data of CoinMarketCap (cmc) to see if I could get something new out of it. This article is the compilation of what I found out on Friday.

When you’ve started investing into cryptocoins, there are a vast number of sources to get your data from: Exchanges, twitter, medium and sites such as cmc. The latter aggregates data for you in a comprehensive site. It ranks the coins on Market Cap, it can sort and filter coins in a couple of ways and it shows where to get them if you want to buy some. The majority of use cases is there.

My use case

However, I found one use case is missing: “What’s the value of my <Amount> <coinName1> coins expressed in coin <coinName2>?”. If there are sites that implement this use case, it’s a simple calculator that only lets you calculate the current value. And most of the time, the values cannot be calculated because the calculator only calculates between a handful of coins.

So, I wanted to find out if it is possible to aggregate this data from cmc and to see its usefulness in analyzing a trend when buying/selling between coins… if possible. (Of course it is, else you wouldn’t be reading this blog)

The ultimate goal for these exercises is to develop my fast prototyping web app and implement more features to quickly write an information-based application with preferably no coding.

The plan

The plan was simple (it should always be simple):

Collect data from cmc
Develop a metamodel
Write some queries
Visualize the data
Specify a list new features to build in my web app (not going to share)

Points 1-3 were on repeat to gain new insight. And at the recorded time, BTC was fairly stable (<1% gain/loss).

Execution

As a base, I made a copy of my fast prototyping web app. It’s a PHP application based on symfony that enables me to create meta models, input (semantically structured) data and run and view queries right away inside the user interface.

1.a Collect data from cmc

In order to collect the data from cmc, I went to the cmc api page and quickly found out it returns json. These couple of lines make the call and fetch the data:

$ch = curl_init();
 curl_setopt($ch, CURLOPT_URL, ‘https://api.coinmarketcap.com/v2/listings/');
 curl_setopt($ch, CURLOPT_HTTPHEADER, array(‘Content-type: application/json’)); // Assuming you’re requesting JSON
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);$response = curl_exec($ch);
 
 $data = json_decode($response);

This was implemented in a symfony route for easy debugging. I was able to quickly import this data into a “Listing”-label. Resulting in 1602 listing. But the second time I read them all, there were only 1593. Huh? Had I lost some? After checking cmc, this number was found to be correct, so I concluded it’s not useful to import this data because…

There’s also the Ticker data api, which has the same data in it + quotes on how much the coin is worth expressed in the value of other coins. This is the ticker data for BTC with a quote in USD (fields in italic were parsed):

"1": {
            "id": 1, 
            "name": "Bitcoin", 
            "symbol": "BTC", 
            "website_slug": "bitcoin", 
            "rank": 1, 
            "circulating_supply": 17008162.0, 
            "total_supply": 17008162.0, 
            "max_supply": 21000000.0, 
            "quotes": {
                "USD": {
                    "price": 9024.09, 
                    "volume_24h": 8765400000.0, 
                    "market_cap": 153483184623.0, 
                    "percent_change_1h": -2.31, 
                    "percent_change_24h": -4.18, 
                    "percent_change_7d": -0.47
                }
            }, 
            "last_updated": 1525137271
        }

Having concluded that this is the (only) data to extract, it was time to develop a data model.

2.a Develop the Metamodel

The base model looked like this:

On top of the Price, Volume_24h, Market_Cap, last_updated, the QuoteTick has a FetchTime (time at which the API call was made) to ensure I could match Quotes fetched in the same timeframe. The Listing contained a minimum: Name, Symbol, ID and Website_slug. Also fetched from the ticker data, so it is created on the fly.

3.a Write some queries

The query for writing one listing looks somewhat like this:

MERGE (curr:Listing {domain:"CryptoSpace", in_id:{cid}, id:{cid} })
ON CREATE SET curr += {name:{name}, symbol:{symbol}, website_slug:{slug}}
with curr
MATCH (btc:Listing {in_id:'1'})
MERGE (btc)<-[:quotedIn]-(qt:QuoteTick {domain:"CryptoSpace", in_id:\{$tickid}, name:{$tickid})-[:for]->(curr) ';
ON CREATE SET qt += {FetchTime:{fetchtime}, UpdateTime:{updatedtime}, Price:{price}}

As you can see, the Listing every coin is quotedIn is BTC. It’s possible to extend this, but that didn’t serve any purpose in this exercise.

(The domain, in_id and name might look obsolete but they help me in the fast prototyping bit. My app can have different meta-models in the same graphdb that don’t intersect if they mustn’t and do if they must (in cross-domain meta-models). And if done via the app, these are checked restrictions.)

1.b Collect (more) data from cmc

Now everything was tested, so I could start to import more data. So, I rewrote the data collector to fetch all ticker data for all listings. This was done by calling the ticker call as many times as needed. Since there were 1593 listings, that meant I needed 16 calls per timeframe.

private function getTickerData($startAt)
 {
  $curlurl = "https://api.coinmarketcap.com/v2/ticker/?convert=BTC&start=".$startAt."&limit=100";
...
}$fetchtime = round(microtime(TRUE));
  $data = [];
  for ($i = 0; $i < 16; $i++) {
   echo 'fetching data ('.$i.')...'.PHP_EOL;   
   $data[$i]=$this->getTickerData(($i*100)+1);
}

The code above is the quick-and-dirty result. If this needs to be implemented correctly, you can fetch the number of listings from the “global”-call or (less favorable) from the first Ticker-call metadata and adapt the for loop dynamically. The data is not immediately processed, because I have to fetch it in the shortest time possible. Why? Because it might be that the data is updated. The following scenario could occur:

tickerdata 100 is fetched… coin = ETN
On cmc, data is updated and ETN is now rank 101, while I’m processing line 100. The coin listed previously on rank 101, now has moved to rank 100.
tickerdata 101 is fetched… coin = ETN :-/

So to minimize the risk of that occuring, I’d like to fetch the data in the smallest timespan possible. Hence the condensed loop.

2.b. Develop the metamodel

In order to get the result (expressing the value of a coin in another coin) a Bag is needed with an Amount of a specific Listing. And a TradeWish will express our desire to trade a Bag for one of another Listing:

3.b Write some queries

With this metamodel in place, we can input our bag. Let’s say I have 1000 QSP. And I’d like to trade it for ETHOS, so let’s make a TradeWish for that as well:

The Bag and TradeWish as defined in the prototype webapp

To show how much ETHOS we can get for our QSP, the query is:

MATCH (n:QuoteTick) with max(n.FetchTime) as latest
MATCH (toTick:QuoteTick)-[:for]->(to:Listing)<--(tw:TradeWish)-->(b:Bag)-->(from:Listing)<-[:for]-(fromTick:QuoteTick)
WHERE tw.in_id='TradeWish5aff195f41c793.57127888' AND toTick.FetchTime=latest AND fromTick.FetchTime=latest
Return (toInt(b.Amount)*fromTick.Price)/toTick.Price as AmountTo

What happens here is:

Take the last time data was fetched
Get the two latest QuoteTick connected to the TradeWish (one from the Bag and one that you want to trade to)
Calculate and return the new price

(I’ve used the unique ID (in_id) of the TradeWish here)

The first result? 72.08783464464412. So I can get about 72 ETHOS for my 1000 QSP.

More interesting is that a similar query can be used to show the Amount progress over time:

MATCH (toTick:QuoteTick)-[:for]->(to:Listing)<--(tw:TradeWish)-->(b:Bag)-->(from:Listing)<-[:for]-(fromTick:QuoteTick)
WHERE tw.in_id='TradeWish5aff195f41c793.57127888' AND toTick.FetchTime=fromTick.FetchTime
Return fromTick.FetchTime as Timestamp, (toInt(b.Amount)*fromTick.Price)/toTick.Price as AmountTo
ORDER BY Timestamp desc

4. Visualize the data

With a slightly modified query we get the data and plot it like below:

When you want quick visualisations, Excel is an ideal playground. :-)

The graph shows that if you would have sold at May 19th@2am instead of May 18th@20pm, you would have gotten 7% more ETHOS for your QSP! Whoa, that’s a difference I would want to be aware of!

Intermezzo — more possibilities

Top 10 matrix

How about making a matrix of the Top 10 coins and express their value in eachothers coin? Here you go:

Top 10 coin value expressed in eachothers coin (read: 1 IOTA = 7.29… Cardano)

And yes, the real value of this only becomes apparent when you a) want to trade from one coin to the other and b) you see the %gain/loss expressed in the combination of two coins. Given that it took me under 5 minutes to produce the query (which runs <500ms unoptimized) and render the result, that seems totally feasible to me.

Portfolio based balanced switching

Another interesting question to address would be: “This is my portfolio [coin A,B,C + amounts] and I want to buy 1000 of coin D. How many of which coins should I sell to do so?”. This would aggregate results of multiple queries into one weighted advice. It’s possible, but would need a little more than a weekend fiddling around.

Correlations with (Real-time)Volumes and Social media attention

Something to find out is if there are correlations between the social media attention coins get and the value variance between those two coins. That would bring some more knowledge to the table, but its definitely worth investigating.

Spotting low cap high potentials

Something that’s bothered me in sites that list coins is the restricted way in which selection, filtering and sorting can be done. There are definitely some trends here, but you can’t see them with the regular applications or websites. This raw data gives the opportunity to make selections of (for example) coins with a low marketcap that gain volume.

Conclusions

What’s interesting is that this data showed that something can be derived from this data that is not (freely) available anywhere in the field today: How two random coins compare to eachother expressed in time. While the comparison is done via a common denominator (BTC), that’s the path you would normally follow if you wanted to swap coin A for coin B on an exchange. The question ringing in your head would be: Is NOW a good time to do so? And you wouldn’t be able to answer.

With a a little work, a lot of patience (gathering data points) and some creative query writing, I’ve been able to create an answer to that question for ANY combination of coins by fetching (freely) available data. In one evening.

And there’s more: Write more queries! (It’s addictive)… and: The user interface can be made more easy to use, directly plot graphs in the web app and to allow multiple perspectives in one screen… to name a few.

If this pops up in my portfolio app I’d be a very happy man. :-) But for now, there’s a load more to be done with this data, so: See ya!

QUESTION: What question would you like to be answered with the data from cmc? Let me know!

Disclaimer: While I do have some knowledge of cryptocurrencies and blockchain, none of this is to be interpreted as investment advice or a thorough analysis. The content seen is fictional and purely created to explain how to use Neo4j in leveraging data from cmc.

Disclaimer 2: I didn’t optimize anything and did that on purpose. And I made typo’s or probably mismatched queries with results/the meta model/something else. I’m not perfect, went for fast results and got them. Live with it.