The Top 100 Most Influential Blockchain Tweeters
Data Science @ Visokio
It’s 2019. Not a day goes by in which I don’t wake up, eat some cereal, open up the news and read something new about what’s going on in the blockchain world. Over the years I have loosely followed new inventions, trends and emerging crypto-currencies, but if pressed I could probably not say much about who is involved, how the technologies and currencies differ, what public opinion looks like, or how the various entities and people are connected.
It’s time to dig into it!
There are lots of interesting questions I want answered, but it’s a good idea to start with the basics. Some time ago I discovered a (by the looks of it hand-picked) list of the most influential people in the blockchain world. It included illustrious persons such as Paris Hilton. I always wondered how this list was derived in the first place and how Ms. Hilton made it onto it. I’m a scientist by trade, and as such I’m naturally interested in quantitative analyses. To me, this list did not look like the result of one.
Time for a new list!
Before going into the details of how this list was derived, here it is:
Every type of analysis requires data. We don’t have any. What data would even be good? There are lots of companies that deal directly with users and their actions; cryptocurrency exchanges are just one example. Their data, however, are hidden away and won’t necessarily yield the right information for the questions we have in mind.
Twitter, on the other hand, is a vast source of relatively free information which, using the right tools, we are able to harvest and analyse. Tweets are by no means without problems. They suffer from short texts, cryptic messages, bad spelling, emojis and URLs, all of which complicate quantification. In short, there will be lots of noise, but also a potential trove of useful information such as opinions and relationships.
Now that we have identified a potentially good source of data, how would we get it? The Twitter API offers a couple of ways. A simple keyword search will yield some hundred results, most of them extremely recent. That’s by no means enough data to do anything useful. The Twitter stream / firehose, on the other hand, will push all tweets for certain keywords into our system. However, there is no way to get historical data, nor related tweets which don’t contain the right keywords.
How can we get historical and related tweets? Through user timelines!
Twitter allows the download of the last 3200 tweets a user has written. Depending on how often they tweet, these tweets can reach months or years into the past. Now, which users should we download tweets from? Ah, that’s an interesting question. We cannot simply download from all users, as we would waste time and resources. Instead, it’s better to define an initial list of people tweeting about blockchain and then to extend and refine it step by step.
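As a rough illustration, paging backwards through a timeline could look like the sketch below. The `fetch_page` callable is hypothetical: it stands in for whatever API client is used and is assumed to return tweets newest first, each with an integer `id`. The page size of 200 is likewise an assumption; only the 3200 cap comes from the text above.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature: fetch_page(user, count, max_id) returns a list of
# tweet dicts (newest first), each carrying an integer "id" field.
FetchPage = Callable[[str, int, Optional[int]], List[Dict]]

TIMELINE_LIMIT = 3200  # cap on retrievable tweets per user, as stated above
PAGE_SIZE = 200        # assumed number of tweets requested per call


def download_timeline(user: str, fetch_page: FetchPage) -> List[Dict]:
    """Page backwards through a user's timeline until nothing is left."""
    tweets: List[Dict] = []
    max_id: Optional[int] = None  # None means "start at the newest tweet"
    while len(tweets) < TIMELINE_LIMIT:
        page = fetch_page(user, PAGE_SIZE, max_id)
        if not page:
            break  # timeline exhausted
        tweets.extend(page)
        # Next request: only tweets strictly older than the oldest seen so far.
        max_id = min(t["id"] for t in page) - 1
    return tweets[:TIMELINE_LIMIT]
```

Passing the fetch function in as a parameter keeps the paging logic independent of any particular client library, which also makes it easy to try out with fake data.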
How to extend and refine? Easy! We are looking to get the most bang for the buck, so we should only download from tweeters worth our while. Impact is what we look for so it makes sense to download from the most impactful people first.
The workflow starts with the hand-picked list I talked about earlier and indiscriminately downloads all the timelines. Because not every crypto tweeter tweets exclusively about blockchain, we accumulate lots of unrelated noise.
The next thing we need to do is to separate the wheat from the chaff.
All tweets containing at least one blockchain-related term are sorted into a bin representing a blockchain Gold Standard. The terms we have chosen are (in no particular order):
blockchain, cryptocurrency, ethereum, multisig, bitfinex, ethfinex, coinbase, litecoin, zcash, dogecoin, segwit, hard fork, altcoin, stablecoin, bittrex, smart contract, hyperledger
This list is nowhere near exhaustive, but we were fairly certain that it wouldn’t contain terms used in non-blockchain contexts. All the remaining tweets might still contain useful blockchain-related discussions. They will be analysed further.
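The Gold Standard split described above could be sketched as follows. The term list is taken from the text; the whole-word, case-insensitive matching and the optional plural "s" are assumptions, since the text does not say how terms were matched.

```python
import re
from typing import Iterable, List, Tuple

GOLD_TERMS = [
    "blockchain", "cryptocurrency", "ethereum", "multisig", "bitfinex",
    "ethfinex", "coinbase", "litecoin", "zcash", "dogecoin", "segwit",
    "hard fork", "altcoin", "stablecoin", "bittrex", "smart contract",
    "hyperledger",
]

# Assumption: match whole words or phrases, ignore case, allow a plural "s".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in GOLD_TERMS) + r")s?\b",
    re.IGNORECASE,
)


def split_gold_standard(texts: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (gold, rest): tweets with and without a Gold Standard term."""
    gold: List[str] = []
    rest: List[str] = []
    for text in texts:
        (gold if _PATTERN.search(text) else rest).append(text)
    return gold, rest
```

The `rest` bin is what the topic model gets to look at afterwards.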
Using the Gold Standard we create a blockchain topic model. With the topic model we categorise the remaining tweets into blockchain-related topics. This will produce lots of false positives when non-blockchain tweets happen to share similarities with Gold Standard topics. It’s possible to define a categorisation fit, for example as a weighted average of the terms shared between a topic and a tweet, with each term weighted by its importance within the topic. The distribution of this fit typically follows an L-shape:
By defining a cut-off threshold at the point where the fit stagnates, and disregarding all tweets below that threshold, we reduce the number of false positives.
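Here is one possible reading of the fit and the cut-off, as a sketch only. `topic_fit` scores a tweet by the share of a topic's total term weight that the tweet covers. `knee_threshold` is a swapped-in heuristic, not necessarily what was used for this analysis: it sorts the scores, rescales both axes to [0, 1], and picks the point lying furthest below the straight line from the highest to the lowest score. The topic terms in the usage note are made up.

```python
from typing import Dict, Iterable, List


def topic_fit(tweet_terms: Iterable[str], topic_weights: Dict[str, float]) -> float:
    """Share of the topic's total term weight covered by the tweet's terms."""
    total = sum(topic_weights.values())
    if total == 0:
        return 0.0
    terms = set(tweet_terms)
    shared = sum(w for term, w in topic_weights.items() if term in terms)
    return shared / total


def knee_threshold(scores: List[float]) -> float:
    """Pick the score at the bend of an L-shaped, descending curve."""
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    if n < 3:
        return ordered[-1] if ordered else 0.0
    lo, hi = ordered[-1], ordered[0]
    if hi == lo:
        return hi  # flat curve, nothing to cut
    best_index, best_gap = 0, float("-inf")
    for i, score in enumerate(ordered):
        x = i / (n - 1)                # rank position, scaled to [0, 1]
        y = (score - lo) / (hi - lo)   # score, scaled to [0, 1]
        gap = 1.0 - x - y              # how far the point sits below the chord
        if gap > best_gap:
            best_index, best_gap = i, gap
    return ordered[best_index]
```

With a hypothetical topic `{"ethereum": 3.0, "wallet": 1.0}`, a tweet containing only "ethereum" gets a fit of 0.75. Tweets whose fit falls below the value returned by `knee_threshold` would then be dropped.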
The collected tweets contain a trove of usable information, which can be used to create a network of tweeters for subsequent analysis. We create the network from three kinds of observations:
- Whenever user A mentions user B in their text, it’s a directed network edge from A to B.
- Whenever user A replies to user B, it’s an edge from A to B.
- Whenever user A retweets a tweet from user B, it’s again an edge from A to B.
The weight of the edges is simply the number of times such a connection has been observed.
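The three rules translate fairly directly into a counter over (source, target) pairs. In this sketch the tweet field names (`user`, `mentions`, `reply_to`, `retweet_of`) are hypothetical, and skipping self-references is an added assumption that the rules above do not mention.

```python
from collections import Counter
from typing import Dict, Iterable


def build_edges(tweets: Iterable[Dict]) -> Counter:
    """Count directed edges (author -> target) from mentions, replies, retweets."""
    edges: Counter = Counter()
    for tweet in tweets:
        author = tweet["user"]
        targets = list(tweet.get("mentions", []))
        for key in ("reply_to", "retweet_of"):
            if tweet.get(key):
                targets.append(tweet[key])
        for target in targets:
            # Assumption: ignore users referring to themselves (e.g. threads).
            if target != author:
                edges[(author, target)] += 1
    return edges
```

Each observation adds one to the weight, so a tweet that both replies to and mentions the same user contributes two.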
We have the network! It’s time for graph analysis.
We’re looking for important vertices (tweeters) now. There are many ways to do that. Generally, we’d look for something that determines the centrality of nodes and then use that to calculate a score for each of them. We opted to perform a simple eigenvector centrality analysis, which is essentially a PageRank computation where instead of websites we use tweeters.
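For readers who want to see the idea in code, here is a minimal weighted PageRank by power iteration. It is a sketch under assumptions: the input is a mapping from (source, target) pairs to positive edge weights, and the damping factor of 0.85 and the handling of users without outgoing edges are conventional defaults, not settings confirmed for this analysis.

```python
from typing import Dict, Tuple


def pagerank(
    edges: Dict[Tuple[str, str], float],
    damping: float = 0.85,   # assumed conventional default
    iterations: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Weighted PageRank via power iteration; scores sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    if n == 0:
        return {}
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over everyone.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

A tweeter scores highly when many others point at them, and even more so when those others score highly themselves.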
After the analysis we are left with a ranked list of tweeters. Their distribution again follows an L-shape, so it makes sense to define a threshold at the point where the ranks start to stagnate. The reduced tweeter list forms the first iteration of the tweet gathering process. We download all timelines from the newly established list (skipping tweeters whose tweets we already have in full), which yields another, more precise, window into the blockchain Twitter world.
We repeat these steps until the list doesn’t change anymore. At that point the process has converged.
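The overall loop amounts to iterating until the tweeter list reaches a fixed point. In the sketch below, `refine` is a hypothetical callable standing in for one full round (download, filter, build the network, rank, cut off), and `max_rounds` is an added safety limit.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def refine_until_stable(
    seed: Iterable[str],
    refine: Callable[[FrozenSet[str]], Iterable[str]],
    max_rounds: int = 20,  # safety limit, an assumption
) -> Tuple[FrozenSet[str], int]:
    """Apply `refine` repeatedly until the user list stops changing."""
    current = frozenset(seed)
    for round_no in range(1, max_rounds + 1):
        updated = frozenset(refine(current))
        if updated == current:
            return current, round_no  # converged
        current = updated
    return current, max_rounds
```

The function returns the final list together with the number of rounds that were run.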
At some point we have downloaded all available tweets from all impactful people spanning the last six months. This doesn’t mean it is the definitive ranking of all people involved in the blockchain sector. It’s more a view into a specific window in time, which will surely change in one way or another as we continue to download tweets in the future. What do these people tweet about, and in which topics do people tweet about them?
Use the dashboard below to select individual tweeters to look at what they are tweeting about and what is tweeted about them.
Notably, there is one Twitter account that seems misplaced: YouTube. And indeed, the YouTube Twitter account is not tweeting much about blockchain. However, people reference it a lot when discussing blockchain technologies and crypto-currencies, as it is a vast resource for informative material about these topics. In that sense, it is indirectly highly influential.
It’s not the end of the road. Now that we have accumulated a sizeable number of tweets and are able to download more, we can look at the network of blockchain tweeters, their relationships, and affiliations. So stay tuned!