On Tuesday we announced our partnership with eToro, the world’s largest social trading platform, to develop and commercialize sentiment-driven trading strategies for their twelve million customers around the globe. This article serves to complement that announcement, to delve deeper into what sentiment is and how our patented sentiment scoring process works, and to address relevant questions and concerns brought up by members of the cryptocurrency community about our new offerings.
The structure of this article is question and answer. We will be actively updating it as we receive more relevant questions from the community.
Crypto Twitter and Market Sentiment
Are cryptocurrency conversations on Twitter manipulated?
Yes, conversations on cryptocurrencies are extremely manipulated. In our research we have found that over 90% of conversations around digital assets on Twitter come from inaccurate, manipulative, or fraudulent users.
The challenge is how to separate signal from noise and develop technology to detect and eliminate this dishonest activity. Enter Social Market Analytics (SMA), a founding investor in The TIE and a leading provider of quantified sentiment data on equities, futures, forex, and ETFs to the world’s foremost financial firms. Over the last eight years SMA has developed and patented account filtration, manipulation detection, and account accuracy technology to eliminate noise and provide clean and actionable sentiment data feeds for institutional investors. The TIE has partnered with SMA to combine The TIE’s domain expertise and relationships with SMA’s proprietary machine learning and natural language processing technology to develop sentiment solutions for the digital asset market.
SMA’s engine is able to detect the accuracy of individual users tweeting about specific assets on Twitter. When John Doe makes positive posts about Microsoft or Ethereum, SMA can detect what percentage of the time those assets appreciate within a day, a week, or a month. Assessing account accuracy enables the removal of accounts which have historically been poor predictors of asset movement and can help identify and segment professional investors.
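The accuracy idea described above can be illustrated with a small sketch. This is not SMA’s patented method; it only computes a simple hit rate, and the function name, the horizon labels, and the sample returns are all hypothetical.

```python
def positive_post_hit_rate(forward_returns):
    """Share of an account's positive posts that were followed by a price gain.

    forward_returns: the asset's return over a chosen horizon (a day, a week,
    or a month) measured after each positive post. Illustrative input only.
    """
    if not forward_returns:
        return None  # no positive posts, so accuracy is undefined
    hits = sum(r > 0 for r in forward_returns)
    return hits / len(forward_returns)


# Hypothetical returns following four positive posts by one account.
john_doe_returns = {
    "1d": [0.02, -0.01, 0.03, 0.01],
    "1w": [0.05, -0.04, -0.02, 0.06],
}
for horizon, returns in john_doe_returns.items():
    print(horizon, positive_post_hit_rate(returns))
```

An account whose hit rate stays low across horizons would be a candidate for removal under this simplified view.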
Another interesting way to assess manipulation is to consider the dispersion, or the percentage of tweets coming from unique Twitter accounts. When the dispersion is extremely low, or in other words a very small percentage of tweets are coming from unique users, that is typically a sign of a collective group of accounts posting frequently about an individual asset and often trying to manipulate conversations to influence price movement.
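As a rough illustration, dispersion as defined above can be computed by dividing the number of unique accounts by the number of tweets. The function name and the sample account handles below are made up for this sketch.

```python
def dispersion(authors):
    """Unique accounts divided by total tweets for one asset over one window.

    authors: one entry per tweet, holding the posting account's handle.
    """
    if not authors:
        return None  # no tweets in the window
    return len(set(authors)) / len(authors)


# Four tweets from four different accounts: maximum dispersion.
print(dispersion(["ann", "bob", "cat", "dan"]))  # 1.0
# Four tweets, three of them from one account: much lower dispersion.
print(dispersion(["bot1", "bot1", "bot1", "dan"]))  # 0.5
```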
Assessing an account’s age, its follower-to-following ratio, and its posting history further assists in the removal of dishonest activity. If the same user is posting exclusively about the same asset and sharing similarly positive or negative messages in bulk, it is unlikely that their posts will have any effect on the movement of an asset.
When account filtration is completed, over 90% of digital asset tweets and accounts have been flagged and removed before market sentiment is assessed.
What is sentiment and how does The TIE convert millions of tweets into actionable information?
Sentiment is a quantified representation of investors’ future intentions. It doesn’t matter that John Doe is happy that Bitcoin is up this month; what matters is that he is bullish on the future of the asset, and that collectively investors’ sentiment (or feelings) towards an individual asset during a given time period is more positive than over a previous period.
Sentiment is scored in a four-step process. SMA ingests the full Twitter Firehose, the real-time stream of more than 850 million tweets per day available exclusively to Twitter partners. Once the firehose is ingested, the first step is to assess the relevancy of each tweet to an individual asset and to extract and bucket all tweets related to the same topic. At first this may seem straightforward: a tweet that mentions ETH should be relevant to Ethereum. The problem? There are over 80 overlapping ticker symbols between equities and cryptocurrencies. While ETH is the ticker for Ethereum, it is also the ticker symbol for Ethan Allen, a publicly traded American home furniture company. NEM is the ticker symbol for the cryptocurrency NEM, but it is also that of Newmont Goldcorp. The problem isn’t limited to ticker symbols, either. Someone tweeting about XRP may refer to the coin as Ripple, but ripple is a generic term. Another user may tweet about Apple’s declining revenue having a ripple effect across the NASDAQ more broadly.
As a result, custom topic models were developed to solve for these issues across asset classes. While much of this process is automated, a level of human supervision is needed on a daily basis to ensure that irrelevant tweets are not being picked up in the topic model for an asset. Context of a tweet and the historical patterns of individual users must be considered when assessing relevancy. This patented process helps remove over 99% of irrelevant activity.
The second step is the aforementioned account filtration. After the firehose is ingested and every tweet relevant to an asset is bucketed together, manipulative posts must be removed. During this step over 90% of tweets are filtered out.
The third step is calculating an investor’s sentiment on an individual tweet. A dictionary of over 100,000 unique terms has been developed to help with this assessment. Each word in each tweet is individually scored by SMA’s proprietary natural language processor, which uses machine learning technology. The scores of each word are aggregated and each individual tweet is given a score. An example tweet is provided below.
In the example above the words surge and buying are both scored positively and an overall positive score of +0.4 is given to that tweet.
The final step is turning these scored individual tweets into quantified and actionable intelligence for traders and investors. When quantifying sentiment, it is critical not to compare the conversations on an individual asset vs. those of another asset. This matters because, on average, certain cryptocurrencies may have more positive conversations than others. For example, XRP has a large community supporting the asset on Twitter and on average conversations on it are much more positive than those on Bitcoin.
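To make the word-level scoring concrete, here is a toy sketch. It is not SMA’s natural language processor: the three-word lexicon, its score values, the choice to sum the word scores, and the sample tweet are all assumptions picked so that the words surge and buying produce a +0.4 total like the one described above.

```python
import re

# Hypothetical miniature lexicon; the real dictionary holds over 100,000 terms.
TOY_LEXICON = {"surge": 0.2, "buying": 0.2, "dump": -0.3}


def score_tweet(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words in a tweet (assumed aggregation)."""
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon are treated as neutral.
    return sum(lexicon.get(word, 0.0) for word in words)


# Hypothetical tweet, not the example from the original article.
tweet = "Seeing a surge in volume, buying more ETH"
print(round(score_tweet(tweet), 2))  # 0.4
```

A production system would also need to handle negation, sarcasm, and context, which a flat word lookup like this cannot capture.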
When scoring sentiment, we instead compare the conversations on a particular asset over one period vs. the conversations on that same asset over another period. For example, we may look at how positive Bitcoin conversations are today vs. the last seven days. After normalizing the data, we may find that Bitcoin’s sentiment is two standard deviations more positive today than it was this past week. By normalizing the data and comparing sentiment over look-back periods we can identify when an asset’s conversations are becoming increasingly more positive or negative. Our research has found significant evidence of The TIE & SMA’s quantified sentiment data having predictive power over digital asset price movement.
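The look-back comparison can be sketched as a z-score: today’s value minus the mean of the look-back window, divided by the window’s standard deviation. The use of the population standard deviation, the function name, and the sample numbers are assumptions for illustration; the article does not specify the exact normalization used.

```python
from statistics import mean, pstdev


def sentiment_z_score(today, lookback):
    """Standard deviations by which today's sentiment differs from the look-back mean."""
    spread = pstdev(lookback)  # population standard deviation (an assumption)
    if spread == 0:
        return 0.0  # flat history, so no meaningful deviation
    return (today - mean(lookback)) / spread


# Hypothetical daily raw sentiment for one asset over the last seven days.
last_seven_days = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(sentiment_z_score(8.0, last_seven_days))  # 2.0
```

Because each asset is compared only with its own history, an asset with a persistently upbeat community does not look bullish simply for being upbeat.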
The time from when a tweet is sent on Twitter to when it is extracted, filtered, scored, and made available to institutional customers via our API is 300 milliseconds. Over 20 unique and actionable sentiment metrics are generated in near real-time.
Where can I read about your research and view back tests on your data?
Here is a small sample of some of our research and quantitative models that we have built:
TheTIE-LongOnly CopyPortfolio on eToro
How does the allocation process work? Are you just buying cryptocurrencies with the most positive conversations on Twitter?
TheTIE-LongOnly CopyPortfolio employs a purely algorithmic allocation process; no human intervention is involved when deciding whether to allocate to individual assets. There are four key terms and two factors that are part of the allocation process. The first term is Raw-Sentiment, which is the sum of the sentiment of every conversation on Twitter about a particular cryptocurrency on an individual day. The second is Poster, which is the number of unique Twitter accounts posting about an asset on a given day.
To develop the CopyPortfolio we normalized the values of both of these data points to create the two factors in our model. The result is a Raw Sentiment Score, which compares the Raw-Sentiment on a coin over the last month vs. the previous six months and a Poster Score which compares the number of users tweeting about a crypto over the last month vs. the previous six.
In order to reduce the burden of spreads on users, we chose to rebalance the CopyPortfolio once per month and thus developed the aforementioned factors as monthly indicators.
Through significant testing we identified an optimal four-step process to determine constituents and allocations. The first step is to determine whether an individual cryptocurrency’s Poster-Score is greater than one standard deviation from the mean. By doing this we are looking to see if there were significantly more Twitter users discussing an asset over the last month than the previous six.
The second step is to identify where the Poster-Score is greater than 1. If Poster-Score>1 the value of the Raw Sentiment Score is inverted. It is inverted because we have found that when the number of users discussing an asset and sentiment are both increasing, the buy signal is no longer valid. On the other hand, we have found that upward price movement tends to follow when there are significantly more Twitter users discussing a cryptocurrency and that conversation is negative.
Third, we look to see if there are three or more cryptocurrencies with Raw Sentiment Scores greater than 0. If that is the case, we weight their allocations based on the absolute values of their raw sentiment scores.
Fourth (optional), if there are fewer than three cryptos with Raw Sentiment Scores greater than 0 we select the three highest cryptos by Raw Sentiment Score and weight the highest 3/6ths, the second highest 2/6ths, and the lowest 1/6th. The reason that we maintain a minimum of three positions is to reduce the risk relative to holding an individual asset.
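The four steps above can be sketched as follows. This is one reading of the description, not the production allocation code: the function name, the input layout, and the sample scores are hypothetical, and the sketch assumes that the inverted scores are the ones used for selection and weighting and that at least three assets are supplied.

```python
def allocate(scores):
    """Return {asset: weight} from {asset: (raw_sentiment_score, poster_score)}."""
    # Steps 1 and 2: invert the Raw Sentiment Score where Poster-Score > 1.
    adjusted = {
        asset: (-raw if poster > 1 else raw)
        for asset, (raw, poster) in scores.items()
    }
    # Step 3: with three or more positive scores, weight in proportion to them.
    positive = {asset: s for asset, s in adjusted.items() if s > 0}
    if len(positive) >= 3:
        total = sum(positive.values())
        return {asset: s / total for asset, s in positive.items()}
    # Step 4: otherwise hold the three highest scores at 3/6, 2/6, and 1/6.
    top_three = sorted(adjusted, key=adjusted.get, reverse=True)[:3]
    return dict(zip(top_three, (3 / 6, 2 / 6, 1 / 6)))


# Hypothetical monthly scores: (Raw Sentiment Score, Poster-Score).
example = {
    "BTC": (1.0, 0.0),
    "ETH": (-2.0, 1.5),  # many new posters and negative talk: score is inverted
    "XRP": (1.0, 0.2),
    "LTC": (-0.5, 0.0),
}
print(allocate(example))  # {'BTC': 0.25, 'ETH': 0.5, 'XRP': 0.25}
```

In this example ETH receives the largest weight precisely because its conversation was negative while the number of accounts discussing it rose sharply.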
What are the coins that you consider allocating to?
We evaluate a basket of thirteen cryptocurrencies for inclusion each month. These assets were selected because they were the digital assets available globally on eToro when the strategies were initially formulated. It is possible that as few as three and as many as all thirteen are included in the CopyPortfolio each month. The full list is available below:
What is the historic performance of the long-only strategy?
In a combination of back tests and live trading the CopyPortfolio has returned 213.7% after fees since October 2017 (as calculated by eToro) vs. a 41.1% return for Bitcoin and a 29.7% return for a monthly-rebalanced equally weighted basket of the same assets after fees. You can see monthly breakdowns of allocations and performance on eToro. The CopyPortfolio has also achieved a significantly better Sharpe ratio (a measure of risk vs. reward) relative to Bitcoin and an equally weighted basket of the same assets.
What are the risks involved with copying TheTIE-LongOnly CopyPortfolio?
Whenever making an investment decision it is critical to do your own research and understand the associated risks. Cryptocurrency is a highly volatile asset class and intra-day swings of up to 10% on an asset remain commonplace. While the long-only strategy in back testing has historically outperformed a benchmark of the same underlying assets equally weighted, past performance is no guarantee of future returns. As this strategy allocates exclusively to long positions within the cryptocurrency market, it is exposed to downside risk if the overall market is not performing.
I am interested in TheTIE-LongOnly CopyPortfolio, where can I learn more?
The TIE is the premier provider of alternative data solutions for institutional digital asset investors. Through internal development, exclusive partnerships, and strategic acquisitions, we have built a suite of nine proprietary data sets for the next generation of digital asset investing. From corporate actions/significant developments, to sentiment, to employment data, our suite of data feeds has been developed to service a range of institutional use cases including quantitative trading, compliance, sell-side research, and discretionary event-driven investing.
Our Commitment to Ethical Business Practices
Since inception, The TIE has committed to a strict ethics policy ensuring our independence and the integrity of our data and offerings. Each employee and significant shareholder is required to abide by a firm set of principles and professional conduct policies exceeding industry standards.
In an industry rampant with misinformation, scams, and dishonest conduct, it is critical to operate in a transparent manner with clear ethical guidelines. Earning and maintaining the trust of our users and customers will always be at the core of The TIE’s mission.
To learn more about our strict ethical standards click here.
Institutional Data Solutions Inquiries
If you are interested in learning more about our institutional data offerings you can visit this page. To request a demo of any of our data offerings or to speak with us about our solutions please send an email to firstname.lastname@example.org and our team will be more than happy to assist you.