Why Cryptomarkets Need Trusted Token Reference Data

And how we can build the dataset together

Will Janensch
ConsenSys Media
8 min read · Nov 8, 2018

TruSet’s recent article, “Improving Web3 Market Research and Token Data,” reported on the state and scale of the token ecosystem and announced the launch of their Token Beta competitions.

Accurate reference data is the backbone of the financial services industry. With blockchain-based tokens emerging as new kinds of financial instruments and consumer utilities, trusted reference data around tokens will also serve as the foundation for our burgeoning cryptoeconomies. But first, what is reference data exactly?

At TruSet, we usually say reference data is all the information about a financial asset other than its traded price. While generally accurate, this description doesn’t help people fully understand why this data is important and how it is used. This article is intended to raise awareness of the importance of financial asset reference data for both traditional capital markets and the emerging cryptotoken ecosystem.

An Introduction to Reference Data

Types of data

Reference data is a subcategory of master data that refers to the set of identifiers, descriptive data, and metadata objects that are used across a company and/or across an industry to allow business systems to interact for purposes of transactions, sharing information and enabling automated processing. Microsoft defines master data as “the critical nouns of a business and fall generally into four groupings: people, things, places and concepts.” In general, reference data defines the set of permissible values, statuses, or classification models.
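
To make the idea of "permissible values" concrete, here is a minimal sketch of how a system might check a record against shared reference tables. The field names and value sets are hypothetical and chosen only for illustration; they are not drawn from any real industry standard or from TruSet's data model.

```python
# Hypothetical reference tables: each field maps to its set of permissible values.
# These names and values are illustrative assumptions, not an industry standard.
PERMISSIBLE_VALUES = {
    "asset_class": {"equity", "fixed_income", "token"},
    "status": {"active", "matured", "defaulted"},
    "currency": {"USD", "EUR", "GBP"},
}


def invalid_fields(record):
    """Return the sorted names of fields whose value is not permissible."""
    return sorted(
        field
        for field, allowed in PERMISSIBLE_VALUES.items()
        if record.get(field) not in allowed
    )


good = {"asset_class": "fixed_income", "status": "active", "currency": "USD"}
bad = {"asset_class": "bond", "status": "active", "currency": "usd"}

print(invalid_fields(good))
print(invalid_fields(bad))
```

Two systems that share the same tables can reject a malformed record in the same way, which is what lets them interact without manual intervention.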

The enterprise database giant Oracle defined reference data as “data [that] carries contextual value and meaning and therefore its use can drive business logic that helps execute a business process, create a desired application behavior or provide meaningful segmentation to analyse transaction data.” Although reference data can have a very specific definition for data scientists, for the purposes of this article, reference data is defined broadly to include master data, reference data used by applications but not created by those applications, and Oracle’s definition of reference data above.

Industry-wide workflows

Reference data can be used both across internal departments within large enterprises and across counterparties, consumers, suppliers, and competitors in industry ecosystems. TruSet’s focus is on the reference data needed to enable workflows between companies across an industry. Companies in most industries need their software systems to interact electronically for trading, commerce, and sharing information. Reference data is critical to this electronic communication, allowing systems to interact efficiently and effectively. Companies that communicate electronically with other companies need to be sure that the companies, assets, and products they are processing are the same in each relevant ledger, and that the terms and conditions, contractual details, descriptions, and other relevant information are clear and in agreement across all parties.

Key datasets

TruSet’s initial focus is on asset-level reference datasets — both for traditional financial securities, which are significant datasets for the financial services industry, and for the emerging world of cryptotokens, including native tokens such as bitcoin and ether, security tokens, and non-security “consumer tokens.”

Reference datasets are important to other industries that have similar pain points and that may also benefit from a blockchain solution for creating and maintaining more accurate and usable reference data sets — both within a large enterprise and across an industry. Additional reference datasets that may benefit from blockchain include medical records, insurance, and supply chain.

How Capital Markets Use Financial Asset Reference Data

Machine-readable financial asset reference data is critical in nearly all investing and trading activity across capital markets. These use cases include:

  • Front Office: Research analysis, pre-trade analysis, asset valuations, and price targets
  • Middle Office: Trade routing, trade execution management, trade risk management, trade settlement, and post-trade analysis
  • Back Office: Portfolio risk analytics, portfolio accounting, compliance, reporting

Given the importance of these activities, it is clear that the data used to drive these business decisions needs to be trusted and accurate. In addition, this information often needs to be processed in high volumes, extremely rapidly, or both. This requires the data to be available in a machine-readable, institutional-grade structured format that can be accessed via APIs.
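
As a rough illustration of what "machine-readable and structured" means in practice, the sketch below models a single bond reference record with typed fields and serializes it to JSON, the kind of payload an API might return. The `BondReference` class, its fields, and the sample values are all assumptions made up for this example; no vendor's actual schema is implied.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BondReference:
    """Hypothetical structured record for one fixed income instrument."""
    identifier: str        # made-up identifier, not a real ISIN or CUSIP
    issuer: str
    coupon_rate_pct: float
    maturity_date: str     # ISO 8601 date string
    currency: str


def to_json(record):
    """Serialize a record to a JSON string with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(payload):
    """Rebuild a record from its JSON representation."""
    return BondReference(**json.loads(payload))


record = BondReference("EXAMPLE-0001", "Example Corp", 4.25, "2030-06-15", "USD")
payload = to_json(record)
print(payload)
```

Because every field has a fixed name and type, a downstream system can consume the record without a person reading a prospectus first.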

Given how foundational clean, accurate reference data is to the functioning of the capital markets, and how seemingly simple the data is, one might think that acquiring, maintaining, and using it is straightforward. In practice, reference data management is time-consuming and expensive. Even with best efforts to clean and manage reference data, mistakes and inconsistencies still arise regularly, resulting in broken trades, misjudged risk, and the need for restatements, all of which impose real-world costs on businesses.

Today, traditional financial asset reference data is sold to the capital markets industry by large data vendors. These reference data vendors collect free, unstructured security prospectuses, convert the prospectuses into their own proprietary structured data model through a combination of software systems and large teams of analysts, and sell the structured data as a data feed. Even though this critical information is generated by the capital markets industry, these vendors — Bloomberg, ICE, and Thomson Reuters being the largest — generate hundreds of millions of dollars in revenue by selling the industry’s own data back to itself.

The process the vendors use to translate unstructured prospectuses into a structured data model is imperfect and results in errors. Prospectuses are not written consistently, so it is easy to put a data point into the wrong field. Numbers get transposed; fields are left incomplete; legal terms are misrepresented. Despite the vendors’ quality controls, some reference data records delivered to customers still contain errors.

Knowing errors inevitably exist in the data file, customers of this data, in addition to paying the vendor for the data, invest in back office processes (software and people, sometimes outsourced) to identify and correct errors. Once corrected, the customer can then use its “golden copy” of the reference data set to run its critical business processes.

Due to a lack of incentives and other structural issues, customers do not typically send their corrections back to the vendor. As a result, each customer ends up finding and correcting the same errors, an inefficient cost that is repeated across the industry.

Since each customer runs its own data cleansing and integration efforts to produce its own “golden record” of reference data, that cleansed record will frequently differ slightly from another customer’s “golden record.” This can force further reconciliation later if those customers transact with each other.
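
A minimal sketch of that reconciliation step, assuming each firm holds its cleansed record as a simple field-to-value mapping. The records and the `reconcile` helper are hypothetical; real reconciliation systems handle far more than a field-by-field comparison.

```python
def reconcile(record_a, record_b):
    """Return {field: (value_a, value_b)} for every field where the records disagree."""
    fields = sorted(set(record_a) | set(record_b))
    return {
        field: (record_a.get(field), record_b.get(field))
        for field in fields
        if record_a.get(field) != record_b.get(field)
    }


# Two firms' "golden records" for the same made-up bond. One firm has
# transposed two digits of the coupon rate during its own cleansing.
golden_a = {"issuer": "Example Corp", "coupon_rate_pct": 4.25, "maturity_date": "2030-06-15"}
golden_b = {"issuer": "Example Corp", "coupon_rate_pct": 4.52, "maturity_date": "2030-06-15"}

breaks = reconcile(golden_a, golden_b)
print(breaks)
```

Every entry in `breaks` is a discrepancy that someone has to investigate before the two firms can agree on the terms of a trade.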

This is an inefficient cost structure across the industry. Financial institutions would benefit from being able to mutualize this non-differentiating data cleansing and managing activity, thereby lowering their back-office costs and improving profitability.

TruSet is building just this kind of solution on the Ethereum blockchain. TruSet’s traditional asset reference data marketplace will enable participants to crowdsource the creation, error correction, and maintenance of fixed income reference data. In contrast to the costly and byzantine workflow with incumbent data vendors, the TruSet workflow removes data usage restrictions and creates an accurate, trusted, machine-readable data source that can serve as a shared source of truth for the financial services industry.

Why Cryptomarkets Need Trusted Token Reference Data

Much like participants in traditional financial markets, institutional participants in cryptoeconomies need trusted data to understand how tokens work in dApps, to make investment and trading decisions, and to manage token-based portfolios. However, while this need for trusted data parallels existing needs in the legacy financial markets, blockchain innovation creates an opportunity to build the data ecosystem for tokens from the ground up without replicating many of the inefficiencies and pain points that exist in the legacy financial world.

The good news is that a fair amount of critical data is already accessible and verifiable within the blockchain. However, some critical data is not captured in the blockchain and instead may exist (or not!) in a variety of unstructured, non-standard, and unregulated sources, including white papers, websites, blog posts, marketing materials, and other project or entity documentation.

In this way, the token ecosystem is replicating the problems that exist in the legacy financial world: once again, critical data is sitting in unstructured sources, and the community needs a trusted, machine-readable source of information.

Power in Numbers: Bootstrapping a Token Data Marketplace

An opportunity exists: if we can build a secure data commons and align incentives in a way that mutualizes the data cleansing effort, then the entire industry can maintain and use a single, accurate set of data around tokens.

That is what TruSet is building. The TruSet token data marketplace will build the foundation for structured, machine-readable, trusted, and accurate reference data for all blockchain-based tokens and token projects. This shared data set will be a resource for the entire crypto community. Both trusted and community-validated, this data can be used not only to power investment and trading analytics but also to facilitate the back office processing of portfolios where tokens represent investment products. For consumer tokens, community-validated data can help educate consumers about the tokens they use in dApps. By establishing this core data as the foundational source of truth about tokens and projects, TruSet has the potential to serve as the basis for product innovation in token-powered markets.
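
One simple way to picture community validation is a majority vote over independent submissions for each field. The sketch below is only an illustration of that general idea under assumed rules (a strict-majority threshold and made-up submissions); it is not a description of how the TruSet platform actually validates data or aligns incentives.

```python
from collections import Counter


def consensus_value(submissions, min_share=0.5):
    """Return the value backed by more than min_share of submissions, else None."""
    if not submissions:
        return None
    value, count = Counter(submissions).most_common(1)[0]
    return value if count / len(submissions) > min_share else None


def consensus_record(submissions_by_field):
    """Apply consensus_value to each field independently."""
    return {
        field: consensus_value(values)
        for field, values in submissions_by_field.items()
    }


# Hypothetical submissions from three contributors about one token.
submissions = {
    "total_supply": ["1000000", "1000000", "10000000"],
    "token_standard": ["ERC-20", "ERC-20", "ERC-20"],
    "website": ["https://a.example", "https://b.example"],
}

print(consensus_record(submissions))
```

Fields without a clear majority come back as `None`, flagging them for further review rather than publishing a contested value.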

Token Data Use Cases

In November, TruSet is launching a Beta program that will give the Web3 community a chance to generate a rich, machine-readable token dataset for all of the top tokens by market cap. The dataset will serve as an ecosystem-wide foundation for facts about tokens and token projects. We will be running staged competitions in which Beta participants compete to earn TRU tokens and share prize pools of thousands of US dollars’ worth of ETH. If you are interested in collecting information about tokens, earning money for token research, and contributing to the data foundation for the Web3 ecosystem, we’d love to have you as a member of our Token Beta Community.

Work with the community, earn rewards, and create a trusted token dataset for Web3. Over $20,000 in prizes.

Visit truset.com to learn more about the Token Beta platform and follow the team at @TruSetData to stay updated on their Beta competitions.

Disclaimer: The views expressed by the author above do not necessarily represent the views of ConsenSys AG. ConsenSys is a decentralized community, with ConsenSys Media being a platform for members to freely express their diverse ideas and perspectives. To learn more about ConsenSys and Ethereum, please visit our website.
