Leveraging alternative crypto data for quantitative trading: Social Signals and Sentiment Analysis

Vlad Ilnitskiy
Published in Contora
6 min read · Apr 26, 2023

The world of crypto trading has evolved significantly in the past few years, with algorithmic trading strategies becoming increasingly popular among investors of different types. While traditional financial markets have long relied on alternative data for generating trading signals, the crypto market has unique characteristics that present novel opportunities for algorithmic trading use cases.

Today the Contora team explores the potential of quantitative trading based on alternative data, specifically social data and sentiment analysis. Over the past 2 years, we have spoken with many top institutions (investment banks, hedge funds, and trading teams) and gathered first-hand feedback on how they incorporate complementary data for trading purposes.

Social Data vs. Price Correlation Mechanic

Let us briefly review why social data can be related to price. By social data, we mean the number of coin mentions, the tone of these mentions, the number of unique users mentioning the coin, the main topic/message, etc.

Significant price fluctuations are usually connected with the news in some way. When information about an important event that has happened or is about to happen starts to spread, the community begins actively discussing it with a specific sentiment/tone. As the news spreads further through the market, more and more participants act on the target asset: they buy, sell, or hold it.

Thus, for a successful trading strategy based on social data, it is crucial to spot a spike in discussion at the very beginning. Here we come to the most important lever quants have for making money: the granularity of data, which gives you a speed and information advantage. The closer the granularity of the data is to real time, the better the chance of detecting the wave and buying or selling the asset before the market acts. You snooze, you lose.
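One simple way to flag such a spike, sketched here as an illustration rather than as the method used in this research, is a trailing z-score: compare the latest mention count with the mean and standard deviation of the preceding window. The function name, window size, threshold, and sample counts below are all hypothetical.

```python
from statistics import mean, stdev


def detect_spikes(counts, window=24, threshold=3.0):
    """Return indices where the mention count exceeds the trailing mean
    by more than `threshold` trailing standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]  # only past data, no lookahead
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: the z-score is undefined
        z_score = (counts[i] - mu) / sigma
        if z_score > threshold:
            spikes.append(i)
    return spikes


# Hypothetical hourly mention counts: a quiet baseline followed by a burst.
hourly_mentions = [10, 12, 9, 11, 10, 13, 11, 10, 12, 9, 11, 10, 48, 55, 30]
spikes = detect_spikes(hourly_mentions, window=12, threshold=3.0)
print(spikes)
```

With finer-grained data (minutes instead of hours), the same check fires earlier in the wave, which is the speed advantage described above.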

Reddit, Twitter, Telegram, and Discord mentions as a Trading Signal

As we mentioned earlier, social media platforms are among the most potent sources of alternative data in the crypto sphere. Simply knowing how many times an asset has been mentioned already lets you keep up with the market and use that information in your strategy.

Let’s explore some examples:

$DOGE Reddit mentions and price correlation
$MATIC Reddit mentions and price correlation

Actual Correlations

As the next step of our research, we calculated the correlation coefficient between the two data series. The coefficient ranges from -1 to 1: the closer it is to zero, the weaker the relationship. We treat correlations greater than 0.5 as significant. By thoroughly exploring Bitcoin ($BTC) and Ethereum ($ETH), we discovered a strong correlation between the daily number of Twitter mentions and their future price movements. For Ethereum, this correlation is 0.6, while for Bitcoin it is even stronger, reaching 0.75.

To further investigate this relationship, we shifted the price data forward, effectively examining the predictive power of the social mentions data. Our findings suggest that the number of Twitter mentions may indeed have some forecasting power when it comes to prices. It is worth noting that correlation does not necessarily imply causation and further analysis is required to validate the robustness of this relationship.
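The lag analysis can be sketched as follows. This is a minimal illustration, not the code behind the figures in this article: the Pearson coefficient is the standard one, while the function names and the sample series are hypothetical. The sample prices are constructed to follow mentions with a two-day delay, so the lag-2 correlation comes out highest.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(mentions, prices, max_lag=10):
    """Correlate mentions on day t with the price on day t + lag."""
    result = {}
    for lag in range(max_lag + 1):
        shifted_mentions = mentions[:len(mentions) - lag]
        future_prices = prices[lag:]
        result[lag] = pearson(shifted_mentions, future_prices)
    return result


# Hypothetical daily series: price reacts to mentions two days later.
mentions = [5, 7, 6, 9, 12, 8, 7, 10, 14, 11, 9, 8]
prices = [100.0, 100.0] + [100.0 + 2.0 * m for m in mentions[:-2]]

correlations = lagged_correlations(mentions, prices, max_lag=3)
for lag, corr in correlations.items():
    print(f"Lag: {lag}, Correlation: {corr}")
```

With pandas, the same calculation is usually written with `Series.shift` and `Series.corr`.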

Correlation for Bitcoin:

Lag: 0, Correlation: 0.6018426309383825
Lag: 1, Correlation: 0.6035511865968731
Lag: 2, Correlation: 0.6037786360981837
Lag: 3, Correlation: 0.6067515046675945
Lag: 4, Correlation: 0.6083981454485415
Lag: 5, Correlation: 0.6100346526996194
Lag: 6, Correlation: 0.613483344666267
Lag: 7, Correlation: 0.613330250368953
Lag: 8, Correlation: 0.6123564314650007
Lag: 9, Correlation: 0.6108238663960035
Lag: 10, Correlation: 0.6083920848312482

Correlation for Ethereum:

Lag: 0, Correlation: 0.6018426309383825
Lag: 1, Correlation: 0.6035511865968731
Lag: 2, Correlation: 0.6037786360981837
Lag: 3, Correlation: 0.6067515046675945
Lag: 4, Correlation: 0.6083981454485415
Lag: 5, Correlation: 0.6100346526996194
Lag: 6, Correlation: 0.613483344666267
Lag: 7, Correlation: 0.613330250368953
Lag: 8, Correlation: 0.6123564314650007
Lag: 9, Correlation: 0.6108238663960035
Lag: 10, Correlation: 0.6083920848312482

The Sentiment of Social Mentions as a Trading Signal

Beyond simply counting the number of mentions, sentiment analysis can provide a more nuanced understanding of market opinions. By analyzing the overall tone of the discussions about a particular crypto project, traders can gauge whether the sentiment is generally positive or negative. This data can then be used to create trading signals that capitalize on the prevailing market sentiment.

One potential use case: a quant trader might develop an algorithm that monitors the sentiment of social media posts for selected assets. If the algorithm detects a sharp increase in positive sentiment for a particular coin, it could trigger a buy order, while a decline in sentiment might result in a sell order.
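A rule of this kind could look like the sketch below. It assumes a sentiment score between -1 and 1 per period, compares the latest score with the average of the preceding window, and maps the change to an order. The function name, window, threshold, and scores are hypothetical, and a real strategy would add position sizing and risk limits.

```python
def sentiment_signal(scores, window=3, threshold=0.2):
    """Return "buy", "sell", or "hold" based on how far the latest
    sentiment score moved away from the average of the prior window."""
    if len(scores) < window + 1:
        return "hold"  # not enough history to form a baseline
    baseline = sum(scores[-window - 1:-1]) / window
    change = scores[-1] - baseline
    if change > threshold:
        return "buy"
    if change < -threshold:
        return "sell"
    return "hold"


# Hypothetical sentiment scores in [-1, 1], oldest first.
print(sentiment_signal([0.1, 0.1, 0.1, 0.5]))    # sharp rise in sentiment
print(sentiment_signal([0.3, 0.3, 0.3, -0.1]))   # sharp decline
print(sentiment_signal([0.1, 0.12, 0.1, 0.15]))  # no meaningful change
```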

Below are examples of correlations between the sentiment of social network discussions about an asset and its price:

$UNI Reddit sentiment and price correlation
$SOL Reddit sentiment and price correlation
$BNB Reddit sentiment and price correlation

Training the model for price prediction

Following this research, we trained a machine learning model on our dataset to check how well it predicts the price. We used a Gradient Boosting Regressor to predict future returns. The features used in the model include data from GitHub (number of commits/contributors), Discord, Telegram, Reddit, and Twitter. The model was trained to predict returns from 1 to 7 days ahead. For evaluation, we used the R² score and accuracy score as performance metrics.
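The setup described above can be sketched with scikit-learn's `GradientBoostingRegressor`. Several things here are assumptions for illustration only: the features and returns are synthetic random data standing in for the real social metrics, the split is a simple chronological 80/20 cut, and "accuracy" is interpreted as directional accuracy (how often the predicted return has the right sign). The numbers this prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for daily social/development features (assumption).
n_days, n_features = 400, 5
X = rng.normal(size=(n_days, n_features))
# Synthetic daily returns: the next day's return depends weakly on feature 0.
signal = np.concatenate([[0.0], 0.01 * X[:-1, 0]])
daily_returns = signal + rng.normal(scale=0.02, size=n_days)


def forward_return(returns, horizon):
    """Sum of returns over days t+1 .. t+horizon, for each day t."""
    csum = np.concatenate([[0.0], np.cumsum(returns)])
    return csum[1 + horizon:] - csum[1:-horizon]


def evaluate_horizon(features, returns, horizon, train_frac=0.8):
    """Train on the earliest days, test on the latest, return (R2, accuracy)."""
    y = forward_return(returns, horizon)
    X_h = features[:len(y)]  # drop the last days, which have no target
    split = int(len(y) * train_frac)  # chronological split, no shuffling
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X_h[:split], y[:split])
    pred = model.predict(X_h[split:])
    r2 = r2_score(y[split:], pred)
    # Directional accuracy: share of days where the predicted sign is right.
    accuracy = float(np.mean(np.sign(pred) == np.sign(y[split:])))
    return r2, accuracy


results = {h: evaluate_horizon(X, daily_returns, h) for h in range(1, 8)}
for horizon, (r2, acc) in results.items():
    print(f"horizon={horizon}d  R2={r2:.3f}  directional accuracy={acc:.3f}")
```

Because multi-day targets overlap, a stricter setup would leave a gap of at least `horizon` days between the training and test periods.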

The results indicate that the predictive power of the social data metrics improves as the prediction horizon increases. Specifically, the model performs better for longer-term horizons (5–7 days) than shorter ones (1–3 days). The R² score ranges from -0.525 to 0.498, with positive values indicating better performance. The accuracy score ranges from 0.464 to 0.694, with higher values indicating better performance.
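To read the R² range correctly, it helps to recall the standard definition: R² = 1 - SS_res / SS_tot. A model that always predicts the average return scores 0, and a model that does worse than that scores below 0, which is why negative values appear at the lower end of the range. The returns below are made up purely to show this.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical realized returns and three forecasts of them.
actual = [0.02, -0.01, 0.03, -0.02]
mean_forecast = [sum(actual) / len(actual)] * len(actual)
wrong_sign_forecast = [-0.02, 0.01, -0.03, 0.02]

print(r_squared(actual, actual))               # perfect forecast
print(r_squared(actual, mean_forecast))        # no better than the average
print(r_squared(actual, wrong_sign_forecast))  # worse than the average
```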

We conclude that the numbers below demonstrate fairly high potential returns and the predictive power of the data:

Today’s analysis showed that there is a place for alternative data in quantitative trading strategies. We identified correlations between social metrics and price movements and reviewed some possible use cases. We also demonstrated how machine learning models can be used to predict future returns based on alternative data, providing an edge for algorithmic traders.

The data we used for showcasing the correlations and backtesting:

  • Contora’s Social Pulse: high-granularity social and sentiment data for the top 100 coins by liquidity;
  • Contora’s CryptoRiskIQ: a comprehensive data solution for risk assessment and due diligence that includes social and development-related data with daily granularity.

P.S. Get in touch with us to see how we can help you enhance your trading strategies with the power of alternative data.

Want to see even more insights based on off-chain crypto/web3 data? Let’s make 500 claps here and another analysis goes live! You can actually hold down the clap button all the way until 50 to share your interest :)

Vlad Ilnitskiy
Contora

Sharing crypto insights based on social and technology-related data | Co-founder of https://contora.ai