Smart Data for Smart Contracts
ONCE UPON A TIME, all our data was on paper. Thirty years ago, we started digitizing our data, and now that task is almost complete. The problem is that we have effectively recreated our paper documents on our glowing screens, so we carry the legacy formats and workflows with us.
Now we’re beginning a new chapter, where we use decentralized data and smart contracts to deliver value from our data. Yet, once again, we are imitating the old way of using data. We may have smart contracts, but we’re using dumb data. To get out of this trap, I will present six principles of “smart” data:
- Universal Name Space
- Least Privilege
Whether it’s on a central server, on the blockchain, or in a decentralized system, we should be using “smart data” with our smart contracts. An example should help.
I keep hearing about “solving the identity problem” on the blockchain: KYC, AML, CYA, etc. Most of these ideas boil down to storing a bunch of images of paper documents somewhere and appending a hash of these files to the chain, so you have time-stamped evidence that those documents referenced you at that time. Let’s look at the six principles in this context …
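To make the pattern concrete, here is a minimal sketch of "store the scans somewhere, append a hash to the chain." A plain Python list stands in for the blockchain, and the names `ledger`, `anchor_document`, and `verify_document` are hypothetical, not any real chain's API.

```python
import hashlib
import time

# A plain list stands in for the chain; a real system would submit a transaction instead.
ledger = []

def anchor_document(document_bytes, subject_id, now=None):
    """Record a SHA-256 digest and a timestamp; the document itself stays off-chain."""
    entry = {
        "subject": subject_id,
        "sha256": hashlib.sha256(document_bytes).hexdigest(),
        "timestamp": now if now is not None else time.time(),
    }
    ledger.append(entry)
    return entry

def verify_document(document_bytes, subject_id):
    """Return True if a matching digest was anchored for this subject."""
    digest = hashlib.sha256(document_bytes).hexdigest()
    return any(e["subject"] == subject_id and e["sha256"] == digest for e in ledger)

# Hypothetical example data: the bytes of a scanned paper document.
scan = b"scanned passport image bytes"
anchor_document(scan, "user-123", now=1_500_000_000)
```

Note what the ledger entry can and cannot do: it shows that a particular file existed at a particular time, but no other system can read an address, an age, or a credit score out of it.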
If we had truly interoperable data, then your identity data would be available to various systems in a format they understand. So if a system wants to know your address or height or credit score, you can provide it with one click. People in the semantic web community have been working on this problem for decades, and we have made a lot of progress. We have standard formats, ontologies, vocabularies, abstractions, and name spaces. I don’t see enough emphasis on interoperability when people talk about decentralized data.
You may have created your own identity on the blockchain, but you are welcome to create as many as you like. This won’t work for voting and registration systems. This is also true for a lot of data and oracles we’ll be using: we want to establish authoritative sources for data and smart contracts, so users know there is one place to look, rather than many.
- Universal Name Space
We could use a single worldwide identifier for people that goes beyond the country you were born in, and it’s coming. We must disambiguate people to help them take advantage of the digital world.
Identity information is not immutable — it changes. Just as a wallet can “watch” a smart contract, all our systems should be able to detect changes to the data they receive. If I change my address, the new address should replace my old address on every contract I’ve entered into. If my optical prescription changes, that should update my digital driver’s license in real time, possibly giving me a day or two to get new glasses before my car won’t let me drive it (okay, that’s a silly example — cars will have their own vision and drive better than I can). If I am visiting Japan for a few months, that should ripple through my digital ecosystem and make changes accordingly. If a shipment of parts doesn’t arrive, that should automatically update the supply chain. By setting up authoritative sources for our data, we empower systems to work for us in real time, automagically, without costly checking, versioning, and reconciliation.
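One way to picture this is a small publish/subscribe sketch, in which every dependent system registers with the authoritative source and is notified when a value changes. The class and method names here (`AuthoritativeSource`, `Contract`, `on_change`) are assumptions made up for illustration, not part of any existing platform.

```python
class AuthoritativeSource:
    """Holds the single current value of each field and notifies subscribers on change."""

    def __init__(self):
        self._values = {}
        self._subscribers = {}

    def subscribe(self, field, callback):
        self._subscribers.setdefault(field, []).append(callback)
        # Deliver the current value right away if one already exists.
        if field in self._values:
            callback(field, self._values[field])

    def update(self, field, value):
        if self._values.get(field) == value:
            return  # nothing changed, so nothing to push
        self._values[field] = value
        for callback in self._subscribers.get(field, []):
            callback(field, value)


class Contract:
    """Stand-in for any system that depends on the data; it records the latest value it was sent."""

    def __init__(self, name):
        self.name = name
        self.seen = {}

    def on_change(self, field, value):
        self.seen[field] = value


identity = AuthoritativeSource()
lease = Contract("lease")
insurance = Contract("insurance")
identity.subscribe("address", lease.on_change)
identity.subscribe("address", insurance.on_change)

identity.update("address", "1 Old Street")
identity.update("address", "2 New Avenue")  # both contracts now see the new address
```

The point of the design is that nobody reconciles versions: the source pushes, and every subscriber simply holds whatever the source last said.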
- The Principle of Least Privilege
According to this principle, which goes by other names, we only want to give out the minimum amount of information required for a transaction. So, for example, if you’re buying a bottle of wine from a distributed app, the app can ask “are you old enough to buy this item in <whatever jurisdiction applies>?,” and a reputable third party will answer back “yes” or “no,” without giving your age or identity. It’s even more important when you’re lying unconscious and the medical team needs data from your electronic medical record. If you buy something online, only the postman needs to know your physical address — the sender doesn’t. Checking into a hotel? No need for a registration desk, just let your phone guide you to your room. Your smart contract has just enough information to allow you to enter the room using your smart phone. The hotel has no idea who is staying there. Your details could be provided later, if necessary.
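The wine example can be sketched in a few lines. Everything named here is hypothetical: the `AgeAttestor` class, the pseudonymous token, and the jurisdiction table with its made-up zones and age limits. The only thing the app ever receives is a boolean.

```python
from datetime import date

# Made-up jurisdictions and minimum ages, for illustration only.
MINIMUM_AGE = {"zone-a": 18, "zone-b": 21}


class AgeAttestor:
    """A trusted third party that holds birth dates and answers only yes or no."""

    def __init__(self, birth_dates):
        self._birth_dates = birth_dates  # never returned to callers

    def is_old_enough(self, pseudonym, jurisdiction, today):
        born = self._birth_dates[pseudonym]
        # Subtract one year if this year's birthday has not happened yet.
        age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
        return age >= MINIMUM_AGE[jurisdiction]


attestor = AgeAttestor({"token-7f3a": date(2000, 6, 15)})

# The app asks a question about a pseudonym and learns nothing else.
answer = attestor.is_old_enough("token-7f3a", "zone-b", today=date(2021, 6, 15))
```

The same shape works for the hotel door or the parcel delivery: the question changes, but the answer stays as small as the transaction allows.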
When it comes to personal data, the principles above are most relevant. But think about all your data: from miles driven to all the money you spend to all your health tests, to the classes you take, to everything you do in your career, etc. You may even want to record and save every word of every conversation you have with others. In the coming years, we will want more data, not less, and that means breaking things down into small units, as I’ll show in the next example.
Just as “the trade is the settlement” and “code is law,” so we should say “there are no copies” of any data anywhere. In the bright future we all envision, the goal is to eliminate extra effort and replace routine work with efficient systems. Let’s look at another example: music.
On your smart phone right now, there may well be a library of songs. For absolutely medieval reasons, the actual songs are sitting in memory on your phone, waiting to be played. They are copies of the original files sitting on some cloud service. Even those files are copies of files distributed by labels, and they get those copies from the artists.
We can do much better than that. By putting a song’s metadata on the blockchain, we can manage both the music and the rights. Here’s how:
Stream from a Single Source
The artist puts a single file online, and what happens next is important: it’s copied by name (not location) to several servers or computers around the world, for instant streaming access via a decentralized content-distribution network. But those aren’t permanent copies. Each one refers to the original file uploaded by the artist. The user then pulls each song by name to his/her device for listening. A song may linger on that device for some minutes or hours, but it isn’t stored on that device. The only things stored are the names of songs and the rights the owner has to listen to them.
In this case, the artist has sold his/her rights to consumers to listen to that song using tokens and probably a smart contract recorded on the blockchain. Money has changed hands — that sets up rights and obligations. The consumer may, for example, have the right to access that song for five minutes or five years, depending on the terms of the contract. So the artist doesn’t have the right to modify that particular file. If the artist wants to create another version of that song, it would be another file.
As I have described in my book, Pull, this list of songs and permissions shouldn’t reside on our phones. It should reside online, so we can transition from expensive smart phones with limited memory to cheap dumb phones with unlimited storage online. Soon, I hope, our phones will cost almost nothing, and everything that makes your phone yours will be online. The transition to dumb phones will let billions of people leapfrog into the 21st century with state-of-the-art technology for pennies.
There may, in fact, be competing sources for the same kind of information. That’s fine. But we need the metadata to authenticate these sources and let people (and smart contracts) choose which sources they want. We can even add uncertainty (error bars) around our data if we feel that will give us better results, and we can build systems of uncertainty that help us deal with real-world issues rather than boiling them down to a single number.
Photography as a Service
In my 2010 book, I made a prediction about photography that hasn’t come true yet, but it may if we get these principles right. The idea is that there are good photographers all over, and most of us don’t want to focus on our photography. So when you’re on a trip, you could just set your phone to broadcast your willingness to be photographed, and local photographers will take your picture, then put the photos online for you to see. Later, if you like the photos, you pay for them.
This is an interesting use case for smart contracts. There’s no reason to broadcast your name or home address. You want to be identified anonymously, photographed, and then pay for any photos you like. No one ever needs to know anything more than your temporary number — the smart contract can keep you anonymous the entire time. Here’s a nice talk by David Birch explaining why anonymity is so important.
In my essay on insurance, I explained that we should chop our digital assets into the smallest meaningful pieces and then manage portfolios of such pieces. One way to do that in music is to tokenize our data — create a cryptographic token for every second of every song. That’s right — I’m talking about billions of tokens on the blockchain that represent millions of songs. Each token carries a meaningful data payload and its own smart contract (or subcontract) that specifies under what conditions it can be purchased, used, incorporated, referenced, etc., and how its creator is to be reimbursed. So if someone cuts snippets out of twenty songs to create a new work, and that new work becomes popular and makes money, every time that new work is streamed, the appropriate royalties also stream to the appropriate creators of each second of that new work. In today’s world, that’s impossible. In the world of smart data and smart contracts, I hope it will become mainstream.
I can imagine that once we have such a system, Brian Eno will snip exactly one second of silence from each of 183 different songs, creating a “song” that consists of three minutes and three seconds of complete silence. Then, when consumers buy that “song” and people “listen” to it, all 183 creators will get their part of that royalty stream automatically from the smart contracts embedded in the tokens.
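The arithmetic behind per-second royalties is simple enough to sketch. This assumes each second of a work carries the identifier of its creator and that a stream's payment is divided in proportion to seconds contributed. The function name, the creator identifiers, and the payment amount are all hypothetical.

```python
from collections import Counter
from fractions import Fraction


def split_royalty(second_owners, payment):
    """Divide one stream's payment among creators in proportion to the seconds each contributed."""
    counts = Counter(second_owners)
    total_seconds = len(second_owners)
    return {creator: payment * Fraction(n, total_seconds) for creator, n in counts.items()}


# The silent "song": one second taken from each of 183 different creators.
silent_song = [f"creator-{i}" for i in range(183)]

# A hypothetical payment of 1.83 currency units per stream.
shares = split_royalty(silent_song, Fraction(183, 100))
```

With 183 one-second contributions, each creator receives exactly 1/183 of the payment. Using `Fraction` keeps the shares exact, so they always sum back to the full payment with no rounding drift.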
This is a different world. Now, you want people to sample your work and incorporate it into theirs — the reverse of what we have today. It’s especially valuable in the DJ, sampling, and remix culture that makes up much of the digital music ecosystem today.
You want different granularity for each application. For tickets, one token per ticket is probably optimal. For tokens that represent equity in a company, you should go to at least four decimal places, because that enables trades in many different currencies without having to cut coins in half. You might want to sell your time in 1-hour increments, but you could also create 15-minute increments in case you need them later. In general, the rule is to add finer granularity than you think you might need, to account for future use cases that can use it.
This concept, call by name, means we don’t specify where anything is, we use a unique name to refer to it; calling a digital object will retrieve it from wherever it is. Today on the web, we specify files by location, not by name. That’s why we so often see 404-not-found pages, because we’ve gone to a location where the file used to be but is no longer. We need something like a DNS system for data, so it doesn’t matter where the data is — we can retrieve using its permanent name rather than its location.
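A toy resolver shows the difference between naming and locating. This is a sketch under my own assumptions: the `NameResolver` class, the `song:` naming scheme, and the example hosts are invented for illustration and do not describe any existing naming system.

```python
class NameResolver:
    """Maps a permanent name to whatever locations currently hold the object."""

    def __init__(self):
        self._locations = {}

    def register(self, name, location):
        self._locations.setdefault(name, []).append(location)

    def move(self, name, old_location, new_location):
        # The object moved; its name, and every reference to that name, stays valid.
        locations = self._locations[name]
        locations[locations.index(old_location)] = new_location

    def resolve(self, name):
        locations = self._locations.get(name)
        if not locations:
            raise KeyError(f"no known location for {name!r}")
        return locations[0]


resolver = NameResolver()
resolver.register("song:example-track", "https://host-a.example/track.flac")

# The file is relocated. Callers who hold only the name never notice.
resolver.move(
    "song:example-track",
    "https://host-a.example/track.flac",
    "https://host-b.example/archive/track.flac",
)
current = resolver.resolve("song:example-track")
```

A link that stored the old location would now return a 404. A link that stored the name still works.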
Name spaces can get complex. We know very well that simple taxonomies don’t properly describe them. For example, take all the employee names and permission cards for getting into buildings for a company like IBM — hundreds of thousands of employees and tens of thousands of locations. Now the company has to track a ton of information: documents, sub-documents, legal issues, numbering systems, misspellings, ambiguities, deletions, duplicates, subdivisions, archives, ex-employees, potential threats, and much more. Think of all the parts NASA has to keep track of, or GM. Today, finding data files on separate systems is a multi-billion-dollar business. What if something is deleted but they later need information about it anyway? What about duplicates? How does this fit with existing and future taxonomies and ontologies? How will we decentrally store a table with 100 million entries for fast search and retrieval? Can an object have several names? What shall we do about alternative naming systems? Where are the standards?
I understand there are several competing systems for decentralized storage, publishing, archiving, etc. There’s IPFS, MaidSafe, Filecoin, Swarm, Open Mustard Seed, and others. I know that oracles for smart contracts are a hot topic at the moment. But I hope the single name space we adopt goes far beyond Ethereum, and far beyond decentralized systems! You should be able to run a standard web site or blog using decentralized storage and HTTP. You should be able to run a smart contract by fetching data by name, whether it comes from a centralized server, from another smart contract, or from a device. All of this will be seamless courtesy of an enterprise-grade, world-scale digital-object naming system.
Example: I want to have my digital identity online and manage all my public keys there. Then, when you want to send me money, you would simply send it to me at: firstname.lastname@example.org (my email address will do nicely for this), and whatever you want to send — whether it’s Bitcoin, Ether, dollars, euros, or other currency — will go to the right address because the software DNS system for names will provide it automatically. You won’t have to know any of my public keys at all. This makes a better address book than having to keep track of people’s different addresses. The same will happen with phone numbers — you should just be able to call me on my email address and my system will figure out where to reach me.
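Here is a minimal sketch of that lookup, assuming a directory that maps one human-readable identifier to a receiving address per currency. The `DIRECTORY` table, the placeholder addresses, and the function name are hypothetical. A real system would query a shared naming service rather than a local dictionary.

```python
# Hypothetical directory: one identifier, many per-currency receiving addresses.
DIRECTORY = {
    "firstname.lastname@example.org": {
        "BTC": "btc-address-placeholder",
        "ETH": "eth-address-placeholder",
        "EUR": "iban-placeholder",
    }
}


def resolve_payment_address(identifier, currency):
    """Return the address the recipient has published for this currency."""
    record = DIRECTORY.get(identifier)
    if record is None:
        raise LookupError(f"unknown identifier: {identifier}")
    if currency not in record:
        raise LookupError(f"{identifier} has not published a {currency} address")
    return record[currency]


# The sender knows only the email-style name and the currency.
destination = resolve_payment_address("firstname.lastname@example.org", "ETH")
```

If the recipient rotates a key, only the directory entry changes. Nobody's address book has to be updated.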
A single set of digital naming standards is critical to the development of the decentralized web, the Internet of Things, passports, voting systems, supply chain, and other use cases. The standards will have to be governed by their own non-profits, and there are several existing nonprofits already in the name-space business for various industries. I see progress being made, but we need even more effort in this direction. It may be time for a few days of face-to-face meetings on this topic.
Building Better Rails
A rail is a token that works across different systems. A payment rail can transfer value seamlessly from payer to receiver without translation. A refugee rail would help facilitate people’s movement from the time they apply for asylum until after they are settled in their new home. A mortgage rail would carry data from the application through approval to the working digital mortgage contract that can adjust itself based on market and other inputs.
To make rails work, we need live, up-to-date data. This will help us create ecosystems, where one data module can play different roles in different systems. Without this, we will be translating and copying, and that goes against the general idea of shared ledgers.
In my book, Pull, I have an entire chapter dedicated to book metadata using a format called ONIX. Unfortunately, book metadata is a one-way street: it gets copied every step of the way, from publisher to distributor to retailer, so when they fix a mistake at Amazon, the original mistake stays in the publisher’s catalog. They have no way to get the information back upstream, because everyone is copying and modifying the metadata to suit his own purposes. Copying breaks the rail principle and costs more money than you might think.
Think about designing and building an airplane. Using the principles of smart data, the iterative design activity takes place in a single virtual space, where many vendors can play along and add their designs as the new plane takes shape from one iteration to the next. There are no documents — shared data drives everything. Sharing ledgers and data helps everyone stay in sync without having to keep track of and confirm the latest changes. There are no copies. Once the design is ready, the very same system used to design a plane can be used to produce each one to order. All the data involved plays different roles at different times.
No documents. No copies. Think how different that world is from the world we’re in today.
The Smart Data Test
I won’t go through more examples — I’ve written an entire book on smart data. These important principles should guide the design of our future systems. If the promise of the blockchain future is to revolutionize the way we use information, then we’ll have to ask ourselves the following questions:
- Is there only one source of this data?
- Are the data elements meaningful to most of the systems that will use them?
- Do we call them by name, using a single universal data naming/locating standard?
- If some piece of data changes, do most systems subscribe to it and get the changes as needed?
- Is there a third party we trust to give us answers to questions based on private or sensitive data?
- Can we slice our data or assets into finer modules, so people can use them the way they want, rather than the way we intend?
It seems like we have a long way to go, and we do. But I have good news: the blockchain revolution will, I believe, kill many existing companies and data traps, allowing a new generation of companies to thrive. An example would be SAP: I don’t see how they will survive the blockchain revolution. As I have written, entirely new data ecosystems will probably replace dinosaur companies like SAP, Oracle, AT&T, LinkedIn, Autodesk, and many others. The new companies that take their place will use smart contracts and smart data principles to bring us into a completely new way of using information. And that will bring us the future we deserve.
David Siegel is a blockchain consultant, entrepreneur, lecturer, proprietor of DecentralStation.com, and runs a blockchain bootcamp for organizations. You can learn more at www.kryptodesign.com and connect with him on LinkedIn.