In early January, a Ukrainian plane crashed shortly after taking off from an airport in Tehran, within hours of escalating tension between the US and Iranian militaries. That night and for several days following, I scoured the web like the little addict I am, looking for more information about who did it and why. Yet all of the evidence I found was conflicting.
It was initially reported as a technical failure; there were images of alleged missile debris at the crash site and a video of a UFO hitting the plane. President Trump announced that Iran had shot down the plane, while some pointed to historical precedent, like the time the US shot down an Iranian plane or when Russia used surface-to-air missiles against Ukraine.
I then realized that seeking any kind of truth was futile because there was no way to verify the accuracy of photos and videos or the legitimacy of primary sources, unless I actually obtained the video itself.
Our reality is that we’re data blind with no way to tell what’s fact and what’s theatre. The information we consume is littered with deliberate misinformation:
- False or exaggerated statistics, hoaxes
- Unconfirmed or biased sources
- Confirmation bias
- Sensationalist and/or misleading conclusions
- Disproportionate urgency and frequency of information relative to its actual significance
We experience misinformation, or fake news, because of innate tendencies to self-serve and because we perpetuate shared information biases in social groups and algorithms. Some biases are acceptable and even helpful, but only insofar as that version of reality doesn’t deviate so significantly that believing it becomes harmful. The problem is that even when it does deviate, the public has no way of contesting or proving which data is accurate, canonical and well-sourced.
API providers gate information like trade volumes, social posts, and election results, despite our contributions to those datasets. Access to databases and valuable capabilities like predictive analytics is also limited to companies granted privileged access to raw data in the first place — opinionated organizations that operate in self-interest and can’t be verified.
So what if we could rely on a global data source that was auditable for news, events, content, financial data and information sharing? One that was secure, public and collaborative so that any individual could rely on the source, contribute to the source and participate in the upkeep?
This is The Graph’s vision, a shared global API that’s accessible to all and curated by the community, so that no privileged group can have control over public data.
Know Your Data
Prior to the internet and blockchains, scientists verified sources of research and inventions by dating entries in private notebooks. Each page had a timestamp and was sewn into the notebook, making it clear when ideas were formed and when a page was tampered with. If an idea or patent was challenged by another, the notebook’s physical evidence would be used to protect intellectual property.¹
More recently, digital timestamping has increasingly been used to inform readers about the relevancy of content and counter fake news. The Guardian noticed that the same article was getting increased readership every February despite having been written many Februarys ago, and that readers were mistaking old news for current news. So they implemented obnoxious, in-your-face yellow badges with timestamps on all archived content to minimize confusion.
Photo metadata has also been used to debunk viral outrage, like countering the inaccurate photos of the Amazon Rainforest wildfires that the French President tweeted. Google Photos also captures metadata and indexes it to give us searchable photo albums, including the date and time of the photo, its location and a catalog of faces. Twitter even lets users link to a specific time in a video to direct followers to particular segments, tightening the feedback loop for video-based news and content.
Yet even in 1991, How to Time-Stamp a Digital Document pointed out that digital timestamps are unreliable and can be easily altered (e.g. changing timestamp data in EXIF files) while leaving minimal evidence behind. Metadata can also be accurate to the machine but inaccurate relative to social standards, like a computer’s clock vs. global clocks, which can lead to deceit.
Blockchains provide solutions to our problems with verifying data. Since data is public, it’s always attributable — sources can be checked and their activity interlinked at any time. Cryptographic proofs provide security and certainty about provenance. Moreover, a large set of validators agrees upon a canonical clock, so timestamps can’t be faked. Providers are also prevented from monopolizing data access and rent-seeking, since on-chain markets enable price discovery and allow anyone to contribute to protocols or decentralized applications (dApps).
Whereas Bitcoin stores data about balances and transactions, smart contract platforms like Ethereum can also emit events, store proofs of off-chain activity and support programmable on-chain exchange. This includes dApp use cases like trading, messaging, job completion, blog posting, music, photos, videos, voting, and more.
Suppose we have a marketplace called dEbay that’s built on top of BuyThings Protocol where trades and exchanges are on-chain. We can query the blockchain for information like the seller’s reputation, listings and products sold, the number of buyers, all bids and the price at which the market settled.
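Such a query might look like the following GraphQL sketch. dEbay is the article’s hypothetical marketplace, and every entity and field name here is illustrative rather than an actual deployed schema:

```graphql
# Hypothetical query against a dEbay subgraph — entity and
# field names are assumptions for illustration only
{
  seller(id: "0xseller...") {
    reputation
    listings {
      title
      price
    }
  }
  market(id: "0xmarket...") {
    buyerCount
    bids {
      bidder
      amount
    }
    settlementPrice
  }
}
```

Because all of this data lives on-chain, anyone could run the same query against the same subgraph and verify the results independently.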
Public data, however, does not mean lack of privacy. Just like with usernames, individuals will use aliases or legal names and encrypt private data, depending on the benefits of publicity.
Private data encrypted, public data cited.²
We can thus foresee a future where every public action is traceable, every asset is tokenized, every agreement is programmed. Most applications will be built on distributed ledger technology to ensure security and accountability over data. For every interaction with a mobile or web app — a login, a click, a view, a payment — a proof of activity will be required and apps will only consume auditable data.
But consuming blockchain data is hard.
The Graph is an indexing protocol for querying blockchain data that facilitates a collaborative economy for sharing datasets and lets dApps consume blockchain data more easily. CoinGecko, Uniswap, Synthetix and many other dApps query subgraphs today on The Graph’s Hosted Service.
The Graph is core infrastructure for accessing decentralized APIs, making UIs more performant and engineering teams more productive. REST APIs, centralized servers, databases, and caches are replaced with subgraphs and indexing nodes that can do the work more efficiently and securely.
Subgraphs define GraphQL APIs from which a front-end application fetches the data it’s interested in. Once a subgraph is deployed, a Graph Node ingests and transforms the source data, such as trade volumes, derivative prices, fund balances, user votes and anything else a developer may want to query. Today subgraphs can be deployed for Ethereum and EVM-based chains, and multi-blockchain support will be added later this year.
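As a sketch, a minimal subgraph schema for the dEbay example might declare an entity like the one below. The entity and field names are hypothetical; the @entity directive and the ID, Bytes and BigInt scalars follow The Graph’s schema conventions:

```graphql
# schema.graphql — a minimal, hypothetical subgraph schema
type Trade @entity {
  id: ID!            # unique trade identifier
  seller: Bytes!     # seller's address
  buyer: Bytes!      # buyer's address
  price: BigInt!     # settlement price
  timestamp: BigInt! # block timestamp of the trade
}
```

Each entity defined this way becomes queryable through the subgraph’s generated GraphQL API.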
When individuals perform actions in decentralized applications, they emit events. These events can be filtered, aggregated and transformed by subgraphs that define how the event data should be processed.
Since subgraphs are open source, developers share datasets, allowing everyone to benefit from each other’s work organizing that data. This web of subgraphs creates a shared source of truth that’s governed by The Graph Network.
The network will be maintained by Indexers, who process queries and stake on their work, and Curators, who signal and stake on which subgraphs are high-quality. Curators will provide information to Indexers and developers about which data sources are useful, and dishonest Indexers will be slashed.
Indexers and Curators will earn fees for providing services to the query market, comparable to fees for API calls. With competition for indexing, it will be difficult for nodes to be extractive, and it will be clear which sources are reputable, which applications rely on them and who’s contributing to their maintenance. The larger the query market, the more efficient it will become and the higher the quality of the data it will render.
In addition to Indexers and Curators who will signal subgraph quality, there could be other important contributors to the data economy.
Moderators — A market or community for watching front-ends might emerge, similar to a neighborhood watch, ensuring that the subgraphs dApps point to are indeed appropriate for the cause. While a subgraph may very well be high-quality and accurate, apps could use subgraphs that are not relevant and thus mislead users (e.g. dEbay showing old bids instead of new bids). This is similar to Reddit moderators who curate forum content.
Aggregators — New kinds of aggregators are likely to appear, as the existing consolidation of Web2 aggregators will fragment once all data is public and rent-seeking is difficult. Currently, financial data is controlled almost exclusively by Reuters, S&P and Bloomberg, who go to extreme measures to gate access (e.g. Bloomberg charges $2K/month per terminal license and requires fingerprints or fobs for every login).
The Graph can facilitate new uses for data aggregation that were never possible while data was restricted, making it more difficult for data aggregators to rent-seek.
Let’s say we build an aggregator called dEtsy that curates event data from only the top independent stores, designers and artists. This app could rely on subgraphs for each retailer’s on-chain events as well as the subgraphs of all providers throughout the supply chain, to derive insights like unit costs, sales volumes, SKUs and analytics. Subgraphs will also be composable, so aggregators can be layered. For example, we could see a DeFi subgraph that’s made up of subgraphs for DEXs, staking, wallets and lending apps.
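A layered aggregation like this might be expressed with a query such as the sketch below. dEtsy and all entity and field names are hypothetical, assuming retailer subgraphs expose sales data; the first, orderBy and orderDirection arguments follow The Graph’s query conventions:

```graphql
# Hypothetical query against a composed dEtsy subgraph,
# ranking the top five retailers by sales volume
{
  retailers(first: 5, orderBy: salesVolume, orderDirection: desc) {
    name
    salesVolume
    skus {
      id
      unitCost
    }
  }
}
```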
Embedded Truths and Exposure Effects
While The Graph will provide a viable solution for countering fake news by exposing auditable data publicly so it can’t be misconstrued, it won’t solve the oracle problem — verifying whether on-chain commitments are truthful in the first place. Oracle data can be indexed and Curators can signal on those subgraphs, but ensuring valid data will be left up to oracle providers and dApps.
Human desire to drive conclusions in our favor also isn’t necessarily going to change, even if we verify all sources. Politicians promote propaganda, companies withhold data, journalists cherry-pick facts.
Despite warnings of inconvenient untruths, we often still trust our biases. Many studies have tested the effectiveness of transparency efforts in deterring fake news, such as content tags like “disputed”. One study found that while general warnings create a sense of overall reporting inadequacy, specific tags calling out misinformation could encourage users to challenge other facts and tag inaccuracies too. This is another benefit of The Graph’s sharing economy over siloed indexers: positive reinforcement for upkeep will lead to unanimous verification.
A Dartmouth study also assessed the impact of disclaimers on perceived accuracy. Viewers were shown either a) just headlines, b) headlines with “disputed” tags or c) headlines with explicit “false” tags. The study found that while users perceived content with “disputed” and “false” tags to be less accurate than headlines alone, our biases and eagerness to confirm our existing beliefs still play a large role (see the results for Trump approvers who saw Pro-Trump headlines with “false” flags).³
Despite specific tagging being more useful than general warnings, neither can fully eliminate the impact that exposure has on our beliefs.⁴ We are still susceptible to deliberate misinformation because our desire to believe outweighs the significance we attribute to actual evidence.
To The Graph and Beyond
Everyone is entitled to their own opinions but not to their own facts. Open source subgraphs and a public Graph Network will mean that all indexed data (and how it was indexed) will be traceable and agreed upon. Your FaceJournal CNN debate will be enriched with timestamps and proofs of on-chain and off-chain commitments, like photo metadata and signatures from reputable journalists.
The combination of a large network of validators and access to a large set of verified data sources will make The Graph’s whole greater than the sum of its parts. While apps could resort to maintaining their own indexing service, they would miss out on the multiplicative value of having their subgraphs utilized by other apps and verified by other individuals. Developers using each other’s subgraphs enables data composability like we’ve never seen.
Far gone will be the days of fighting for open banking and PSD2 when the standard is open, public data on blockchains, indexed by The Graph. Although The Graph won’t be able to eliminate our natural biases, our world view will be much clearer, and we’ll be incentivized to keep it that way since we can all contribute to upholding the Graph Network. To learn more, check out The Graph Network In-Depth.
To better understand The Graph architecture, how to deploy a subgraph or run a node, take a look at the docs and Graph Explorer. You can also join the community by participating in a virtual hackathon or joining the Discord to chat with The Graph engineers.
Thanks to Tegan Kline, Yaniv Tal and Brandon Ramirez for feedback and edits.
1. Haber, Stuart, and W. Scott Stornetta. “How to Time-Stamp a Digital Document.” Journal of Cryptology, vol. 3, no. 2, 1991, doi:10.1007/bf00196791. https://link.springer.com/content/pdf/10.1007%2F3-540-38424-3_32.pdf
2. @balajis. “Many news articles are now wrappers around tweets. This is actually a form of progress. Because eventually, it’s all event feeds. Many events are now digitally logged to separate DBs. But eventually this is one giant feed. Private data encrypted, public data cited. ” Twitter, 30 Dec. 2019, 2:35 p.m., https://twitter.com/balajis/status/1211777785460666368.
3. Clayton, K., Blair, S., Busam, J.A., et al. “Real Solutions for Fake News? Measuring the Effectiveness of General Warnings and Fact-Check Tags in Reducing Belief in False Stories on Social Media.” Political Behavior, 2019, doi:10.1007/s11109-019-09533-0. https://www.dartmouth.edu/~nyhan/fake-news-solutions.pdf
4. Ecker, Ullrich K.H., Stephan Lewandowsky, and David T.W. Tang. 2010. “Explicit warnings reduce but do not eliminate the continued influence of misinformation.” Memory & cognition 38 (8): 1087–1100. https://link.springer.com/article/10.3758/MC.38.8.1087