Leveraging the Enigma Data Marketplace to Boost Catalyst Strategies

Aditya Palepu
Published in Catalyst Crypto
Jun 21, 2018 · 16 min read

Whether or not you believe in the mantra “price is king” when it comes to devising trading strategies, the best traders have been dramatically increasing their alternative data usage to generate alpha and mitigate risk. From mega quant funds like RenTech to the casual investor, data points such as granular order book info, weather patterns, social media sentiment scores, and even sunspots/lunar phases are now features driving automated decisions. It’s more important than ever not only to have access to this vast amount of data, but also to seamlessly integrate what the data is telling you into more robust, informed trades.

By the end of this comprehensive guide, you will learn all the necessary steps for using the Enigma Data Marketplace for Catalyst, subscribing to individual datasets, and harnessing this data to develop highly profitable crypto trading strategies. There’s never been a better time to professionalize your crypto trading — so let’s get started!

What is the Enigma Decentralized Data Marketplace?

I won’t go deep into the technical details of the mechanics here; it’s truly innovative and exciting stuff, though, so I recommend you check out our whitepaper and the documentation!

For starters, the decentralized data marketplace is the platform layer sandwiched between the Enigma protocol and Catalyst application layers. Data providers and curators can publish and monetize their data on the marketplace with the assurance of security and privacy. On the other side of the equation, subscribers can pay to get access to the various awesome datasets available and start incorporating them into their strategies. It’s important to note that absolutely anyone can be a data provider or subscriber: the marketplace is open to all participants. Whether you’re a company, large organization, or individual, you can share or consume datasets.

Without further ado, I’ll get started explaining how you can use the data marketplace to create a powerful trading strategy!

Getting Started With Catalyst

If you’re unfamiliar with/haven’t yet installed Python 3 (Python 3 is needed to interact with the data marketplace), PyCharm, or the Catalyst package, or just want a nice refresher, please check out my prior post that walks you through that process.

Not working perfectly? Click here to get help on the Catalyst forum | Click here to ask our team in Discord

Metamask and Account Funding

The easiest way to get started with the data marketplace is to use the Metamask chrome extension, an in-browser tool that allows you to seamlessly interact with the live, test, and development networks, manage identities, and sign transactions. Once installed, you will see a tiny fox icon in the top right area of your browser window. After clicking this you can create a new vault of accounts.

Now that you’re in, you will need to view (and fund) this wallet with ENG tokens since that is the token used to work with the marketplace. From the Metamask dropdown (by clicking the fox icon again in case the menu lost focus; you should still be logged in), select the TOKENS tab and click ADD TOKEN. This is where you add the following ENG token contract information and click ADD when completed:

  • Token Contract Address: 0xf0ee6b27b759c9893ce4f094b49ad28fd15a23e4
  • Token Symbol: ENG
  • Decimals of Precision: 8

Your ENG balance will now appear on the TOKENS tab of Metamask, and it should say 0. You will need to fund this account with some starting ETH and ENG in order to work with the marketplace. The ETH will be used as gas to carry out the necessary transactions on the blockchain, and the ENG will be used as the currency for paying for (or receiving value from) datasets. How you fund this wallet address (obtained by clicking the three dots/ellipsis in Metamask > Copy Address to clipboard) with ETH and ENG is up to you: you can send ENG from a hardware wallet, an online wallet, or an exchange (ENG/BTC trades on popular exchanges like Binance and Bittrex).
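As a quick aside on those decimals of precision: ERC-20 balances are stored on-chain as integers, so the raw value reported by the ENG contract has to be scaled down to get a human-readable amount. A minimal sketch of that arithmetic (the raw balance below is made up for illustration):

```python
DECIMALS = 8  # ENG's decimals of precision, per the token info above
raw_balance = 1_250_000_000  # illustrative raw integer units from the contract
eng_balance = raw_balance / 10**DECIMALS
print(eng_balance)  # 12.5
```

This is exactly the conversion Metamask performs for you when it displays your ENG balance.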


Marketplace Initial Configuration

I know you’re excited to start using the marketplace, but at the moment it’s not yet configured. Any marketplace command will kick off this process, however. From within the catalyst conda environment (conda activate catalyst), start by typing the command to list the available datasets: catalyst marketplace ls. This will result in an error like so:

$ catalyst marketplace ls
Listing of available data sources on the marketplace:
Error traceback: $HOME/.../catalyst/catalyst/marketplace/marketplace.py (line 58)
MarketplacePubAddressEmpty: Please enter your public address to use in the Data Marketplace in the following file: $HOME/.catalyst/data/marketplace/addresses.json

Navigate to this addresses.json file with the command vim $HOME/.catalyst/data/marketplace/addresses.json and populate the auto-generated file like this:

[
  {
    "pubAddr": "<your metamask public address>",
    "desc": "my_metamask_wallet",
    "wallet": "metamask"
  }
]
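If you’d rather script this step, here’s a small sketch using Python’s json module. It writes to a temp path so it’s safe to run anywhere; the real file lives at $HOME/.catalyst/data/marketplace/addresses.json, and you’d substitute your actual Metamask address:

```python
import json
import os
import tempfile

# The entry matching the addresses.json layout shown above
entry = [{
    "pubAddr": "<your metamask public address>",
    "desc": "my_metamask_wallet",
    "wallet": "metamask",
}]

# Temp path stands in for $HOME/.catalyst/data/marketplace/addresses.json
path = os.path.join(tempfile.gettempdir(), 'addresses.json')
with open(path, 'w') as f:
    json.dump(entry, f, indent=2)

# Round-trip check: the file parses and the wallet type is recorded
with open(path) as f:
    loaded = json.load(f)
print(loaded[0]['wallet'])  # metamask
```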

This file is critical to using the marketplace from here on out, and it will be updated further in the next step. The good news is that you should never have to open, modify, or worry about it yourself from this point onwards!


View Available Datasets

The first thing you’ll want to do as a subscriber is to see what datasets are available (also available online with some additional details at https://data.enigma.co/marketplace/status). You can achieve this via the command line in the following manner:

$ catalyst marketplace ls
Listing of available data sources on the marketplace:
dataset
0 enigma_marketcap
1 enigma_github
2 whalesupplies_twitter
3 whalesupplies_erc20
4 /r/cryptocurrency_data
5 kaiko-btc38-orderbooks-10
6 bitfinex historical data
7 google trends
8 reddit_cryptocurrency_data
9 napoleonx_btc_strategy
10 coinmarketcap_historical_data
11 coinmarketcap historical data
12 whalesupplies_ohlcv
13 izokay_bitcoinmarkets
14 izokay_coinmarketcap_historical
15 napoleonx_eth_strategy
16 coinmarketcap_historical
17 onchainfx
18
19 infotrie_sentiment_daily_all
20 infotrie_sentiment_daily_btc
21 infotrie_sentiment_daily_eth
22 infotrie_sentiment_daily_xrp


Subscribing to a Dataset

For this tutorial, I’m interested in devising a strategy based on marketcap data. There are a couple of duplicate entries in there, but item 10 (coinmarketcap_historical_data) is a really great one, so we’ll use that!

In order to use this data, you have to pay to play and subscribe to the dataset. Subscriptions run in monthly intervals, and each has an ENG fee associated with it, which will be requested when you attempt to subscribe to a dataset like this:

$ catalyst marketplace subscribe --dataset=coinmarketcap_historical_data
Using <your metamask public address> for this transaction.
The price for a monthly subscription to this dataset is 10 ENG
Checking that the ENG balance in <your metamask public address> is greater than 10 ENG... OK.
Please confirm that you agree to pay 10 ENG for a monthly subscription to the dataset "coinmarketcap_historical_data" starting today. [default: Y] Y

If your wallet doesn’t have enough ENG to subscribe, you will be notified; fund your account with enough ENG to continue.

If you do have enough ENG, and you are interested in continuing, type in a Y as displayed above to confirm your willingness to pay a 10 ENG monthly fee for this dataset.

Note: For those of you wondering, there is no sneaky auto-renewal of subscriptions; once the month has expired, you’ll have to resubscribe.

An instance of MyCrypto’s legacy site, which facilitates offline transactions, is immediately launched. Your terminal will nicely display what needs to be done from here, so you are just a few copy-and-pastes away from finally using this data! There’s nothing I can write here that the command line doesn’t already say concisely, but there are a few things I want to highlight:

  • After you paste the From Address, which is your Metamask public address that you’ve already seen a bunch of times, be sure to click Generate Information. You will know this is successful if some of the fields on the page auto populate after this is done, such as the nonce (which is essentially a monotonic counter of the number of transactions sent by this account) for example.
  • Accept the default value for the gas price, which is essentially the amount of ETH paid per unit of computation run by the EVM.
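To build intuition for that gas price, the total transaction fee is just the gas consumed multiplied by the price per unit of gas. A back-of-the-envelope sketch with illustrative numbers (not live network values):

```python
# Illustrative numbers only - not live network values.
gas_limit = 60000        # a plausible upper bound for an ERC-20 approval
gas_price_gwei = 10      # the kind of default the page suggests, in gwei
fee_eth = gas_limit * gas_price_gwei / 10**9  # 1 gwei = 1e-9 ETH
print(fee_eth)  # 0.0006
```

At prices like these, a transaction costs well under a thousandth of an ETH, which is why a small amount of starting ETH goes a long way here.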

After you’ve populated the information, you need to unlock your wallet. Unfortunately, these MyCrypto offline transactions don’t allow for the Metamask option, and there’s nothing that can be done about that at the moment. While less than ideal, it’s still very easy to proceed by using your private key (you can find this in Metamask by clicking the three dots > Export Private Key). Select the Private Key radio button, and paste your private key into the text box.

Please be sure you’re only pasting your private key in that box alone and not as your Facebook status :)

Click Unlock and then Generate Transaction. You should see a long hex string auto generated in the Signed Transaction field — please copy this value in its entirety and paste it in the command line. Then press enter! After waiting 20–30 seconds, you’ll be notified that the transaction was successful, and another instance of MyCrypto will be launched.

Many of you might be wondering “whoa whoa, I just did this, why am I doing this again?!” Well, subscribing to datasets is a two part process. The one you just completed lets the marketplace spend ENG on your behalf. The next transaction you’re about to fill out and send carries out the actual subscription of the dataset. Once again, be sure to copy and paste the values to their corresponding fields on the MyCrypto page (and watch out for the same gotchas mentioned above). After another 20–30 seconds, you’ve now subscribed to the coinmarketcap_historical_data dataset!

I personally like to confirm this is the case for my own sanity, and you can do the same by attempting to subscribe again. You’ll be notified that you’ve already done that!

$ catalyst marketplace subscribe --dataset=coinmarketcap_historical_data
Using <your metamask public address> for this transaction.
You are already subscribed to the "coinmarketcap_historical_data" dataset.


Ingesting Data

Now that you’ve subscribed to the dataset, it’s time for you to pull in the data to your local machine to be used in your algorithms. You can do this with the following command:

catalyst marketplace ingest --dataset=coinmarketcap_historical_data

This will launch another instance of MyCrypto, but this time it looks a little different from the last few offline transactions you’ve sent. What’s happening here is that you will be signing a message to generate a key/secret pair (these will be added to the addresses.json file behind the scenes) to streamline future authentication requests. This is a one-time process, and you shouldn’t ever have to do it again! As your terminal suggests, copy the entire line, which should look like this:

Catalyst nonce: <hex string>

On the MyCrypto page, I recommend you use Metamask to access your wallet (only for this signing step that happens just this once; for all other subscriptions and future transactions via the catalyst command line, please use your private key as described above). Paste the line you’ve copied into the Message text box:

and click Sign Message. This will launch a Metamask popup with a transaction you will click Sign on.

The MyCrypto page will now have auto-generated a Signature JSON value that looks like this:

{
  "address": "<your metamask public address>",
  "msg": "Catalyst nonce: <hex string>",
  "sig": "<hex string>",
  "version": "2"
}

Copy and paste the hex string value for the sig field without the quotation marks into your command line as requested.
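If you’d rather not eyeball the JSON, a tiny sketch that pulls out the sig field programmatically (the sig value below is a made-up placeholder standing in for the real hex string):

```python
import json

# The Signature JSON produced by MyCrypto (sig is a placeholder here)
signature_json = '''{
  "address": "<your metamask public address>",
  "msg": "Catalyst nonce: <hex string>",
  "sig": "0xdeadbeef",
  "version": "2"
}'''

sig = json.loads(signature_json)['sig']
print(sig)  # paste this value (without quotes) at the terminal prompt
```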

The dataset will now be downloaded and processed, which should take under a minute, so please just sit tight and let it finish!

If for some reason the ingestion hangs and you ctrl-c out of it, your local data store could be left in a bad state. If you ever suspect this is the case, run the following command to clean it out:

catalyst marketplace clean --dataset coinmarketcap_historical_data

Then try ingesting once more!


Understanding the Data

There are lots of exciting data sets currently available, with many more to come. However, data is only as good as you make it; it’s very important to understand what the data looks like and any preprocessing needed to properly access and manipulate it from within an algorithm. I like to use Jupyter notebooks and pandas to analyze data. For those of you unfamiliar with them, they are, in my opinion, two of the most important Python-related tools around. I highly recommend you familiarize yourself with them; the ROI is immense.

Jupyter is already available in the catalyst conda environment you’ve activated. All you need to do is run the command jupyter notebook, and this will launch an instance in your browser. Click New > Python 3 to generate a fresh notebook. Again, I recommend you run through the Jupyter link above to get a sense of the various commands you can run!

I like to start off all my notebooks with a cell that auto reloads any changes to underlying code behind imports made in the notebook without having to restart the entire kernel from scratch:

%load_ext autoreload
%autoreload 2

Next, I import any necessary packages I’ll need throughout the notebook. Down the road, if I find that I missed anything, I’ll be sure to add it in this cell to make sure all my imports are nicely encapsulated in the same place:

import pandas as pd

# Register the catalyst magic
%load_ext catalyst
from catalyst.marketplace.marketplace import Marketplace

Now that we’ve imported all the necessary packages, we can load up the dataset for initial analysis. We begin by instantiating a Marketplace object and pass the dataset name to get_dataset, which returns a data frame:

marketplace = Marketplace()
df = marketplace.get_dataset('coinmarketcap_historical_data')

Do a quick inspection of df (df.head() is a good starting point).

You’ll notice a few things that will help you handle the data properly from inside the algorithm:

  • This is a MultiIndex data frame, with a unique (date, symbol) key
  • All entries (indices and values) are byte string representations, as indicated by the b'<value>' format
  • There are some nonnumeric values (b'-') in the market_cap column that need to be handled appropriately (removed from the dataset), which I noticed after inspecting the column’s values
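Those observations translate into a short cleanup recipe. Here is a self-contained sketch on a toy stand-in frame (the real frame comes from marketplace.get_dataset; the values and symbols below are made up, but the byte-string structure matches what’s described above):

```python
import pandas as pd

# Toy stand-in for the marketplace frame: a MultiIndex (date, symbol)
# frame whose entries are byte strings, with b'-' marking missing marketcaps
idx = pd.MultiIndex.from_tuples(
    [(b'2018-06-14', b'BTC'), (b'2018-06-14', b'XYZ')],
    names=['date', 'symbol'])
df = pd.DataFrame({'market_cap': [b'109559796665', b'-']}, index=idx)

# Drop the non-numeric placeholders, then decode and cast what's left
clean = df[df['market_cap'] != b'-'].copy()
clean['market_cap'] = clean['market_cap'].str.decode('utf-8').astype(float)
print(len(clean))  # 1
```

The algorithm below performs essentially the same filtering inside initialize.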


Long-Only Marketcap-Weighted Monthly Rebalancing Strategy of 10 Crypto Assets (Poloniex)

For this example, we will be trading a long-only basket of the 10 leading crypto assets by marketcap, weighted by their respective capitalizations on Poloniex and rebalancing every 30 days. The parameters are customizable (how many crypto assets you want in your basket and how often you’d like to rebalance). Such a strategy gives you long exposure to the crypto markets — not just to BTC, but also to other strong-performing, proven coins. This is similar to the Bitwise Hold 10 index, without the management fee :) Feel free to copy and paste the block of code below into your PyCharm editor.

# For this example, we're going to write a simple long-only marketcap-weighted rebalancing algorithm. Upon
# initialization, we set our tradable universe of 10 assets to be the top 10 crypto assets by marketcap. Every 30 days
# from this point, we rebalance our long-only portfolio using target percentages based on the most recent marketcap
# weightings for this basket of cryptoassets.
import os
import tempfile
import time

import pandas as pd
from logbook import Logger

from catalyst import run_algorithm
from catalyst.api import symbol, record, order_target_percent, get_dataset
from catalyst.exchange.utils.stats_utils import get_pretty_stats
# We give a name to the algorithm which Catalyst will use to persist its state.
# In this example, Catalyst will create the `.catalyst/data/live_algos`
# directory. If we stop and start the algorithm, Catalyst will resume its
# state using the files included in the folder.
from catalyst.utils.paths import ensure_directory

NAMESPACE = 'marketcap_weighting'
log = Logger(NAMESPACE)


# To run an algorithm in Catalyst, you need two functions: initialize and
# handle_data. You can add any additional helper functions to refactor code.
def rebalance(context, data):
    # The marketcap df is indexed by (date, symbol), so we find the most
    # recent marketcap data by flooring the current time to the nearest day
    df = context.enigma_marketcap_df.loc[
        context.datetime.floor('1D').strftime('%Y-%m-%d').encode()]
    # Find valid tradable symbols on this exchange for this date
    symbols = [a.symbol for a in context.exchange.assets
               if a.start_date < context.datetime]
    assets = []
    for currency, price in df['market_cap'].iteritems():
        if len(assets) >= context.n_assets:
            break
        for quote_currency in context.quote_currencies:
            s = '{}_{}'.format(currency.decode('utf-8'), quote_currency)
            if s in symbols:
                # Append this valid trading pair to the assets list
                assets.append(symbol(s))
                break
    asset_base_currencies = [asset.base_currency.encode() for asset in assets]
    # Remove assets that were previously in the portfolio but are no longer
    # in the top 10
    removed_assets = list(set(context.previous_assets) - set(assets))
    for removed_asset in removed_assets:
        order_target_percent(removed_asset, target=0)
        record(f'{removed_asset.base_currency}_pct', 0)
        log.info(f'Removing {removed_asset.base_currency} from portfolio')
    # Determine each asset's respective weighting to construct the rebalanced
    # portfolio
    market_caps = df.loc[asset_base_currencies, 'market_cap']
    market_caps.drop_duplicates(inplace=True)
    contribution_pct = market_caps / market_caps.sum()
    for asset in assets:
        alloc_pct = contribution_pct.loc[asset.base_currency.encode()]
        # Set the order target percentage to the asset's marketcap-based
        # weighting
        order_target_percent(asset, target=alloc_pct)
        record(f'{asset.base_currency}_pct', alloc_pct)
        log.info(f'Ordering {asset.base_currency} at a marketcap-weighted '
                 f'portfolio percentage of {alloc_pct:.3f}')
    context.previous_assets = assets


def initialize(context):
    # This initialize function sets any data or variables that you'll use in
    # your algorithm. For instance, you'll want to define the trading pair (or
    # trading pairs) you want to backtest. You'll also want to define any
    # parameters or values you're going to use.
    # We create a marketcap-weighted long-only index of 10 cryptoassets
    context.n_assets = 10
    # Obtain the dataset used to determine index holdings for the simulation
    context.enigma_marketcap_df = get_dataset('coinmarketcap_historical_data')
    # Data cleaning of marketcap data - remove nan and non-numeric values
    data_clean_mask = (context.enigma_marketcap_df['market_cap'] != b'-')
    context.enigma_marketcap_df = context.enigma_marketcap_df[data_clean_mask]
    context.enigma_marketcap_df['market_cap'] = \
        context.enigma_marketcap_df['market_cap'].astype(float).astype(int)
    context.enigma_marketcap_df.sort_values(
        by=['market_cap'], ascending=False, inplace=True)
    context.enigma_marketcap_df.reset_index(level=1, inplace=True)
    # Lowercase symbols to make indexing into the dataframe and symbol
    # generation easier from now on
    context.enigma_marketcap_df['symbol'] = \
        context.enigma_marketcap_df['symbol'].str.lower()
    context.enigma_marketcap_df.set_index(['symbol'], append=True,
                                          inplace=True)
    context.exchange = context.exchanges[next(iter(context.exchanges))]
    # Try a usdt denomination first, otherwise fall back to btc
    context.quote_currencies = ['usdt', 'btc']
    context.previous_assets = []
    context.rebalance_period = 30
    context.i = 0


def handle_data(context, data):
    # Check if the counter indicates a rebalance date (every 30 days)
    if context.i % context.rebalance_period == 0:
        log.info(f'Rebalancing on date ({context.datetime})')
        rebalance(context, data)
    record('rebalanced', context.i % context.rebalance_period == 0)
    context.i += 1


def analyze(context=None, perf=None):
    perf.to_hdf('./marketcap_weighted_perf.h5', 'df')
    stats = get_pretty_stats(perf)
    print('the algo stats:\n{}'.format(stats))
    perf.loc[:, ['portfolio_value']].plot()


if __name__ == '__main__':
    # The execution mode: backtest or live
    live = False
    if live:
        run_algorithm(
            capital_base=1000,
            initialize=initialize,
            handle_data=handle_data,
            analyze=analyze,
            exchange_name='poloniex',
            live=True,
            algo_namespace=NAMESPACE,
            quote_currency='usdt',
            live_graph=False,
            simulate_orders=False,
            stats_output=None,
        )
    else:
        folder = os.path.join(tempfile.gettempdir(), 'catalyst', NAMESPACE)
        ensure_directory(folder)
        timestr = time.strftime('%Y%m%d-%H%M%S')
        out = os.path.join(folder, '{}.p'.format(timestr))
        run_algorithm(
            capital_base=1000,
            data_frequency='daily',
            initialize=initialize,
            handle_data=handle_data,
            analyze=analyze,
            exchange_name='poloniex',
            algo_namespace=NAMESPACE,
            quote_currency='usdt',
            start=pd.to_datetime('2017-01-01', utc=True),
            end=pd.to_datetime('2018-06-14', utc=True),
        )
        log.info('saved perf stats: {}'.format(out))

Before you run this strategy, you need to have the daily price data for Poloniex stored on your local machine to stream through. This can be achieved by running catalyst ingest-exchange -x poloniex and waiting a little bit. Keep in mind, I don’t pass in specific symbols to this command since we are trading a basket of crypto assets, and thus I’d prefer having data for all the coins available on this exchange at my disposal.

The specifics of the strategy are commented in the code block above so please read through those for added context, but I’ll run through some of the high level concepts here:

  • initialize: We set some context variables here, including the number of assets, quote currencies list (we ideally want to trade USDT pairs, but when not possible, we will use a BTC pair instead), rebalance window, and the coinmarketcap dataset we subscribed/ingested earlier. As alluded to earlier when studying the data within Jupyter, we do some preprocessing here in this step (which only happens once), such as removing nonnumeric marketcap values, sorting the symbols within each day by marketcap (so we can easily retrieve the top 10 coins), and lowercasing all the symbols.
  • rebalance: We obtain a list of viable trading pairs on this exchange for this date, and loop through the top coins by marketcap until we’ve found 10 tradable pairs. Any coins that are not in the top 10 anymore are removed from our desired portfolio, and we construct a new portfolio by placing a buy order for a target percent (determined by their relative marketcap weightings in the basket).
  • handle_data: Using a context counter and the rebalance window, we call rebalance only when appropriate.
  • analyze: We save off the perf data frame for analysis in a Jupyter notebook.
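The weighting logic at the heart of rebalance is worth seeing in isolation: normalize the marketcaps so they sum to one, and use each share as an order target. A toy sketch (illustrative numbers, not real marketcaps):

```python
import pandas as pd

# Toy marketcaps for a 3-pair basket (made-up values)
market_caps = pd.Series({'btc_usdt': 110e9, 'eth_usdt': 50e9, 'xrp_usdt': 20e9})

# The heart of rebalance(): normalize caps into target portfolio percentages
weights = market_caps / market_caps.sum()
print(weights.round(3))
# each pair is then passed to order_target_percent(asset, target=weights[pair])
```

Because the weights always sum to one, the portfolio stays fully invested across the basket after every rebalance.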


Analyzing Results

Using a Jupyter notebook once again, I set up a notebook like this:

# CELL 1
%load_ext autoreload
%autoreload 2

# CELL 2
# Imports
import pandas as pd
import matplotlib.pyplot as plt
import plotly.offline as py
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly import tools

# CELL 3
# Register the catalyst magic
%load_ext catalyst
from catalyst.exchange.utils.stats_utils import extract_transactions
py.init_notebook_mode(connected=True)

# CELL 4
# Load performance data into df
df = pd.read_hdf('../algorithms/data/marketcap_weighted_perf.h5', 'df')

# CELL 5
# Process columns to obtain cryptoasset symbols
asset_pcts = [col for col in df.columns if '_pct' in col]
asset_pcts = df[asset_pcts].iloc[0].sort_values(ascending=False).index.tolist()
assets = [asset.split('_')[0] for asset in asset_pcts]

# CELL 6
# Construct plotly charts
fig = tools.make_subplots(rows=2, cols=1, shared_xaxes=True,
                          subplot_titles=('Portfolio Value', 'Index Weighting'))
# Portfolio value
portfolio_value = go.Scatter(x=df.index, y=df['portfolio_value'],
                             name='portfolio value')
# Rebalance dates overlay
rebalanced_mask = (df['rebalanced'])
rebalanced = go.Scatter(x=df[rebalanced_mask].index,
                        y=df[rebalanced_mask]['portfolio_value'],
                        mode='markers',
                        marker={'symbol': 'star', 'color': 'red'},
                        name='rebalance dates')
fig.append_trace(portfolio_value, 1, 1)
fig.append_trace(rebalanced, 1, 1)
# Marketcap-weighted portfolio allocation percentages
cum_asset_pcts = df[asset_pcts].cumsum(axis=1)
for asset, asset_pct in zip(assets, asset_pcts):
    text_labels = df[asset_pct].round(3)
    trace = go.Scatter(x=cum_asset_pcts.index, y=cum_asset_pcts[asset_pct],
                       text=text_labels, hoverinfo='x+text', mode='lines',
                       fill='tonexty', name=asset)
    fig.append_trace(trace, 2, 1)
fig['layout'].update(height=600, width=1000,
                     title='Marketcap Weighted Strategy')
py.iplot(fig)

As the chart demonstrates, this strategy is a very profitable one (remember, we started with a base capital of only $1000) and nicely captured the largely bullish market we’ve experienced with good exposure to some well-performing altcoins as well!
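If you want a headline number from the perf frame, total return is just the last portfolio value over the first, minus one. A toy sketch (made-up values; in practice you’d use perf['portfolio_value'] from the saved HDF file):

```python
import pandas as pd

# Stand-in for perf['portfolio_value']: capital_base was $1000
pv = pd.Series([1000.0, 1800.0, 4200.0])
total_return = pv.iloc[-1] / pv.iloc[0] - 1
print(f'{total_return:.1%}')  # 320.0%
```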


Conclusion

Congratulations! You’ve now successfully worked through the process of subscribing to and downloading some useful data on the Enigma Data Marketplace. Additionally, you’ve run a real-world, profitable strategy that leverages this data. Now go ahead and subscribe to other datasets you find interesting and see what other awesome strategies you can build!

Catalyst and the Data Marketplace are still in the alpha stage and under rapid development (a web UI for the marketplace is in active development), so if you have any questions, comments, concerns, feedback, or just want to say hey, please take a look at our full documentation and feel free to drop us a note in our Discord channel or the forums!


Aditya Palepu
Catalyst Crypto

Co-Founder & CEO @ DEX Labs. Duke Eng '13 (ECE/CS). Blockchain, ML/AI enthusiast. Previously DRW algorithmic trader. D.C. sports fanatic and burrito lover.