First, although I work for Chia Network Inc, the ideas and opinions presented on this Medium account are entirely mine and do not necessarily reflect those of my employer.
The below is my personal description of Chia DataLayer and is not to be considered official documentation. It has not been reviewed by the Chia documentation team nor the engineering team and will not be maintained. Any errors or inaccuracies are entirely my fault. If it is confusing or unclear, that is also my fault. There is nothing here that is not public information available elsewhere, only my best effort to make it easier to understand.
I’m writing this because I strongly believe this will become a foundational protocol for working with data that will change how we think about data ownership, durability and authenticity. I’d like to help get the ball rolling with Chia DataLayer by helping more people to understand what it is, how it works, and some initial thoughts on what you might do with it.
Introduction
The world has many ways of dealing with data. What makes Chia DataLayer different?
- You Own Your Data. You have to use your cryptographic key to change your data, and without that key, no one can change your data.
- Censorship Resistance. You can use nearly any channel to make your data available to others and still ensure the data they receive is intact. Censorship in one channel or one jurisdiction does not affect your ability to be heard.
- Peer-to-peer. There is no central server in Chia DataLayer, and everyone is free to use it how they wish.
- Auditability. Every update you make to your data is recorded on the immutable Chia blockchain, at a specific block height. Data cannot be retroactively altered.
- Accountability. Anyone that has a copy of the data published to DataLayer can prove that the data is correct and will know the key used to record that data. The holder of that key can be held accountable for the data they publish.
- Durability. Anyone can participate in ensuring the durability of data stored on Chia DataLayer, regardless of who published that data.
- Privacy. While the blockchain is used to prove that any particular copy of the data in Chia DataLayer is correct, the data is actually transferred between users entirely off-chain. You can decide whether to make your data publicly available, or to limit access, based on how you choose to share the data files. You may not share the data at all, and simply use the blockchain to assure auditability.
- Smart Contracts. Data stored in Chia DataLayer is available for use in Chialisp smart coins. I believe this feature by itself will have a huge impact on how smart contracts are written, and will be a tremendous differentiator compared to other smart contract languages and platforms.
This article provides:
- a completely non-technical mental model for how DataLayer works
- a mapping of that mental model to the actual implementation on Chia
- an explanation of the killer feature: proofs of inclusion
- a look at two-party commits: proofs of inclusion in offer files
- a tour of the RPCs available to actually use DataLayer
Hopefully at the end of this you’ll have a solid, working foundation for building decentralized applications with Chia DataLayer.
Mental Model in a Metaphor
I think the most important thing is to offer a solid mental model for how DataLayer works. To that end, I would like to offer a simple abstract analogy that I hope will help to build an intuitive understanding of Chia DataLayer. Immediately after, I will re-do the scenario in a very concrete way, describing exactly how DataLayer works at each step. Feel free to skip this section if abstract analogy is not your thing. The subsequent section will still make sense without it.
In our scenario, Alice, Bob and Carol are sharing pictures of their pets in a group chat.
Alice writes:
Alice (10:30pm): My picture will be called AlicePet
Alice (10:30pm): I will make AlicePet available at http://alice.com/
Alice (10:31pm): AlicePet version 1 is a picture of a cat
Bob sees this and goes to http://alice.com/AlicePet_1.jpg
to fetch the picture. When he gets it, he verifies it is indeed a picture of a cat, and so is confident he has the correct picture.
Now it is Bob’s turn to share:
Bob (10:45pm): My picture will be called BobPet
Bob (10:46pm): I will make BobPet available at http://bob.com/
Bob (10:47pm): BobPet version 1 is a picture of a dog
Alice sees what Bob has posted and goes to http://bob.com/BobPet_1.jpg
to fetch the picture. When she gets it, she verifies that it is a picture of a dog, and so is confident she has the correct picture.
Alice thinks Bob’s picture is great and wants to help make it available to anyone that wants it. So she writes:
Alice (11:00pm): I will make BobPet available at http://alice.com/
Carol joins the group chat and sees all prior messages. She decides she wants to see Bob’s pet photo. She notices there are two locations available: http://alice.com and http://bob.com.
Carol picks one and goes to http://alice.com/BobPet_1.jpg. Alice must have made a mistake, because the picture she received was of a fish! Carol knows to discard the photo because Bob had said it would be a dog.
Carol then goes to http://bob.com/BobPet_1.jpg
to fetch the picture. She confirms it is a dog and is satisfied she has the correct photo.
Bob now decides to update his pet photo.
Bob (11:15pm): BobPet version 2 is a picture of a marmot.
Alice fetches the photo from http://bob.com/BobPet_2.jpg
and confirms the picture is of a marmot.
Carol has two choices of where to get the photo. She happens to select to get it from Alice again, who now already has the updated picture. Carol goes to http://alice.com/BobPet_2.jpg
to fetch the photo and confirms it is a picture of a marmot, so she knows she has the correct photo.
Alice decides to stop sharing her pet photo:
Alice (11:30pm): AlicePet is no longer available at http://alice.com
Carol now decides she wants to see Alice’s photo, but there is nowhere for her to get it. Even though she is not able to get Alice’s photo, she still knows it should be of a cat, and that the cat photo was announced at 10:31pm.
Offline, Carol talks to Bob and he offers to provide her the photo. She opens the photo provided by Bob and confirms it is a cat, so Carol knows Bob has provided the photo that Alice had originally posted.
Making the Metaphor Real
So that was pretty silly, but hopefully it was illustrative of the protocols that DataLayer uses to communicate. Mapping the above to real Chia blockchain concepts:
- The “group chat” is the Chia blockchain. Everyone can observe it, and can contribute only new “messages” in the form of coin spends.
- AlicePet and BobPet are DataLayer singleton coins. A singleton coin is a coin that has a long-lived identifier assigned by the blockchain called the launcher id. As a singleton, only one unspent coin with this launcher id can ever exist at any time. As a DataLayer singleton, it has additional functionality for maintaining the current hash of the table of data represented on chain by that singleton. If this paragraph doesn’t make sense, please see Coin Set Intro and Singletons.
- The “pictures” are the actual tables of data. Each table can have any number of rows, and each row in the table consists of a key and a value where the key and the value can each be any blob of binary data. The only restriction is that there cannot be duplicate keys in the key-value store. The correct term for what I call a “table” is “data store”. You’ll see that name used in the documentation, but I have trouble getting used to that term.
- The notion of “dog”, “cat”, “fish” and “marmot” in the story was the hash of the data in the data table. Specifically it is the hash of a Merkle tree version of the data, where each row is a leaf in the tree, but that is a detail for later.
- The announcements of http URLs are done with announcement coins. These are simple XCH coins with a trivial wrapper puzzle that mostly just tags them as announcement coins as distinct from regular XCH. The coin hint is the launcher id of the DataLayer coin to which the announcement applies, and the memo is the URL, recorded in cleartext directly on the blockchain.
Now we can rewrite the story using the correct Chia terms:
Alice creates her singleton using the DataLayer RPC create_data_store:
Block 100:
Spend XCH coin id: 0x123456 (value: 1 mojo)
Create DL singleton coin (value: 1, launcher ID: 0x123456, hash: 0x0000)
(For purposes of discussion, I’m using shortened placeholders for ids and hashes such as 0x123456 and 0x0000. In reality, these are really long values.)
Next Alice announces the URL where she will host the data using the add_mirror RPC:
Block 101:
Spend XCH coin id: 0xabcdef (value: 10 mojos)
Create announcement coin (value: 10, hint: 0x123456, memo: "http://alice.com")
And Alice updates the data in her store with batch_update (update_data_store in CLI):
Block 102:
Spend DL singleton coin (launcher ID: 0x123456, hash: 0x0000)
Create DL singleton coin (launcher ID: 0x123456, hash: 0x3845)
For reference, the DataLayer RPCs are documented here.
Note that we can’t tell how many changes she made to the table in her update to the data just by looking at the blockchain. She could have added one row or 1000. All that is relevant on chain is that the hash of the data has changed.
Also note that the actual data is not stored on the blockchain, and does not get copied to all of the many validators globally. The data is transferred from user to user off-chain, and the blockchain is only used to validate that a particular copy of the data is correct.
Alice tells Bob her DataLayer singleton launcher ID is 0x123456, and he subscribes to it in his wallet with the subscribe RPC. On subscription, Chia DataLayer does several things: first, it looks up the singleton with that launcher ID and finds the current hash to be 0x3845. It then looks for any announcement coins with Alice’s DataLayer launcher ID as the hint, and finds one. It gets the URL http://alice.com from the announcement coin. Finally, it constructs an http request:
http://alice.com/123456-3845-full-1-v1.0.dat
This file is:
- Located at http://alice.com per the announcement coin
- For the DataLayer singleton with launcher ID 123456 (“AlicePet”)
- With the hash 3845 (“cat”)
- Since Bob does not yet have any data for this singleton, we are requesting the full file, with all records, instead of the delta file with only the records changed since the last update
- This is update 1 of this DataLayer singleton
- The file format version used in this file is v1.0
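To make the naming concrete, here is a tiny Python sketch that assembles such a URL from those pieces. The helper name and variables are mine, and the pattern is simply the one shown in this example; the real client builds these names internally.
def data_file_url(mirror, launcher_id, root_hash, update, full=True, fmt="v1.0"):
    # e.g. http://alice.com/123456-3845-full-1-v1.0.dat
    kind = "full" if full else "delta"
    return f"{mirror}/{launcher_id}-{root_hash}-{kind}-{update}-{fmt}.dat"
print(data_file_url("http://alice.com", "123456", "3845", 1))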
Upon successfully receiving the file, and before making the data available to Bob, Chia will recalculate the hash of the data received and verify it matches the hash on chain. Assuming it matches, the data will be available to Bob using get_keys_values, get_keys or get_value.
It is worth stopping here for a moment to recognize what has happened. Alice published her own data signed with her own private key. That data is provably owned by Alice. And Bob can be absolutely sure that he has the exact data that Alice published and he knows exactly when she published it, despite receiving the data over an insecure channel (http) and receiving it an unknown amount of time after she published it. Alice and Bob did this entirely as peers, with no intermediary and no need for trust. That’s a big deal and the implications will be huge.
Bob does the same to publish his data using DataLayer:
Block 110:
Spent XCH coin: 0x745698 (value: 1 mojo)
Create DL singleton coin (value: 1, launcher ID: 0x745698, hash: 0x0000)
Spent XCH coin: 0xabcdef (value: 10 mojos)
Create announcement coin (value: 10, hint: 0x745698, memo: "http://bob.com")
Block 111:
Spent DL singleton coin (launcher ID: 0x745698, hash: 0x0000)
Create DL singleton coin (launcher ID: 0x745698, hash: 0xAB12)
Notice in this case, the creation of the singleton and the announcement of the URL happen in the same block. That’s fine.
Bob now passes his DataLayer singleton launcher id (0x745698) to Alice. She uses subscribe to subscribe to Bob’s DataLayer singleton, and receives and validates Bob’s data in the same way. Upon success, the data is stored in a sqlite3 database in .chia/mainnet/data_layer/db/. All of the data for every DataLayer singleton that Alice publishes or subscribes to is stored in this database. Additionally, the files for Bob’s data are stored locally in .chia/mainnet/data_layer/db/server_files_location_mainnet/, along with the files for Alice’s data.
Next in our story, Alice decides to provide a mirror of Bob’s data. She has it, and she can be confident that it is correct. She uses add_mirror to make it available to others.
Block 120:
Spent XCH coin: 0x927403 (value: 10 mojos)
Create announcement coin (value: 10, hint: 0x745698, memo: "http://alice.com")
There are lots of reasons to mirror data you don’t own, among them:
- To help ensure the data persists. If Bob’s machine gets wiped out, everyone (including Bob) can still get the data from Alice and verify it is correct by checking the hash against the blockchain.
- To help ensure the data is available. If Bob’s internet connection is flaky, or he doesn’t have the bandwidth to support the huge demand for his data, then having mirrors will ensure the data is available when others need it.
- To make the data available at all. Bob may not be able to host inbound http requests. So he can just provide the files to Alice and let her host them. Anyone requesting the data can be confident that it is still Bob’s data, even though it is only available from Alice.
- To hold Bob accountable for his data. Bob may want to pretend that data never existed, or to alter it and pretend the prior version never existed. By helping to make the verifiable data available, Alice can ensure Bob can’t run from his data. Note that Bob doesn’t have to give Alice permission to host a mirror. It doesn’t matter how Alice receives the files, she can use them to host a mirror.
Now Carol wants to see Bob’s data. Maybe Bob tells her his DataLayer singleton launcher id directly, or maybe Alice gives it to her. Or maybe she gets it through some website. Regardless of how she receives it, she calls subscribe on his singleton with ID 0x745698.
As before, Chia DataLayer will look for announcement coins for that singleton, but now it will find two. If Carol directly calls get_mirrors, she will see both http://bob.com and http://alice.com. Initially, there is no basis to prefer one over the other, so it will simply pick one. Note that the announcement coin has a value (10 mojos in this example). Right now, that value is not used, but in the future it may be used to help clients select a mirror.
Carol happens to select Alice’s host to get the data. Nothing happens on chain when Carol subscribes. Bob has no way to even be aware that Carol has subscribed to and received the data. And Alice only knows that someone fetched Bob’s data from her, but cannot reliably know anything about who.
The data Carol receives is incorrect. This can happen for many reasons, from a simple file-management error on Alice’s part, to a data transmission error, to malicious intent from Alice or from a man-in-the-middle. Regardless, the data Carol receives does not hash to the same value that Bob recorded on chain, so Carol discards the data.
Carol then selects another mirror, this time http://bob.com, and successfully receives the data. In future requests, Chia DataLayer will deprioritize mirrors that are unresponsive, don’t have the needed data, or return incorrect data.
Now Bob decides to update his data. He calls batch_update (update_keys_values) with a list of rows to insert and delete.
Block 130:
Spent DL singleton coin (launcher ID: 0x745698, hash: 0xAB12)
Create DL singleton coin (launcher ID: 0x745698, hash: 0xF334)
Chia takes the changes, applies them locally, calculates the new hash, and updates the singleton on chain with the new hash. It also creates the new files and stores them (by default) in a directory under the user’s .chia/mainnet/data_layer/db/server_files_location_mainnet/ folder. These files are for update 2 of the singleton, and so sit side-by-side with the files for update 1 without causing a filename collision, even if the hash ends up being the same (e.g., no net records inserted or deleted).
By default, Chia provides a little http server so that these files can be served directly from the machine with the wallet. This server is optional, and you can use DataLayer without it. You can start this server with chia start data_layer_http, or by flipping the relevant switch in the settings page in the GUI. However, this requires that you: a) have a static IP or a dynamic DNS name that can track your current IP and b) open the inbound port through whatever network infrastructure you have to that machine. I think this may be a significant hurdle for many users.
Alternatively, if you have a cloud hosting account somewhere, you can upload the file to your cloud account in a bucket that is accessible to the public. If your provider charges network data egress fees, you may want to put something like Cloudflare in front of it to prevent getting maliciously bombed with a huge egress bill. I personally host the DataLayer tables I publish and mirror on Google Cloud and use the Cloudflare free tier account to protect it. I wrote a little script to copy the files from my local machine to the cloud account. This is definitely not something most users would do.
My open plea to the community: I really hope to see DataLayer hosting services crop up, with a little uploader I can run locally that sends my DataLayer files to the hosting service and makes this whole part of the process easier. There is a good business model here too, as these providers can (and should) charge for their services and make a profit. Personally, I hope to see many such services hosted in different jurisdictions, and to have the ability to upload to multiple such services, to help with censorship resistance. If I could pay for that service with farmed XCH and upload the data through a VPN or Tor, I could ensure my anonymity as well. Maybe send me the bill in an offer notification with an NFT “receipt”? Ideally, I’d love to see a CHIP for the protocol for connecting to such services, so that the local uploader can be standardized.
Back to Bob and his update. As a part of subscribing to Bob’s DataLayer singleton, Alice and Carol’s Chia wallets are each now tracking Bob’s singleton. They will notice that it has been spent, and will read the new hash of the data from the blockchain. Alice and Carol don’t have to do anything to make this happen, the Chia wallet takes care of it.
Alice’s Chia wallet reacts first. There is only one mirror available that she does not own, http://bob.com, so it fetches the data from there. The download is successful, and the locally-calculated hash matches the hash on chain, so the data is accepted.
At this point, the updated data is now available through get_keys_values, get_keys, and get_value. Additionally, if she calls get_root_history on Bob’s singleton, she’ll now see three hashes:
- 0x0000 — the initial hash before any data was added
- 0xAB12 — the hash of the first version of the data
- 0xF334 — the hash of the updated data
(Here, “root” refers to the root hash of the Merkle-tree version of the data. It is a hash of all the data, but the data is hashed in a Merkle tree rather than all at once. This becomes much more relevant later.)
Alice can also see the history of the changes that Bob has made to his data using get_kv_diff. This RPC accepts any two root hashes in a given singleton’s history and returns the net rows inserted and deleted between those two root hashes. So Alice can distinguish between the changes Bob made in his first update and the subsequent changes made in the second update. The changes reported are net changes, so if a group of rows are both inserted and deleted in the updates between the two hashes provided to get_kv_diff, those rows will not show up in the diff output at all.
Now that Alice has the files for Bob’s update, she is able to mirror them. If she uses the default built-in http server, there is nothing she needs to do: the http server will serve those files the same as it had with the files for version 1. If she is hosting the files remotely, she will need some mechanism to send the files to the hosting location.
Next Carol’s wallet detects the update to Bob’s singleton. As before, it sees that there are two announced mirrors, selects one, downloads the files, successfully verifies the hash, and makes the data available to Carol.
What if Carol’s wallet had requested Bob’s data from Alice’s mirror before Alice had received it? That’s fine. Carol’s wallet will simply retry from another mirror and if none are available that have the data, simply wait in gradually increasing intervals before retrying again. In fact, each subscriber will wait a random (but small) amount of time after detecting an update on chain before attempting to fetch the data to give it a chance to propagate through the network without every subscriber requesting it simultaneously. Additionally, each subscriber will randomize the sequence of available mirrors from which it tries to download the data.
Finally, Alice decides to stop sharing her data, which Carol has not yet received. She calls delete_mirror and specifies exactly which mirror announcement coin to melt. The XCH in the announcement coin then gets returned to her normal XCH wallet.
Block 140:
Spend announcement coin (value: 10, hint: 0x123456, memo: "http://alice.com")
Create XCH coin (value 10 mojos, receive address specified by Alice)
Now Carol decides she wants to subscribe to Alice’s DataLayer singleton. The Chia DataLayer wallet cannot locate an announcement coin that gives a URL to fetch the data, so no data can be downloaded. However, she can still use get_root_history to see all the hashes that had been recorded for Alice’s DataLayer singleton, and at what block height each had been recorded. She can also detect any new updates Alice may make. Alice can continue to update her DataLayer table with more data, and so long as she doesn’t publish the files anywhere, all anyone can see is the hash history.
Why would someone want to use DataLayer and NOT share the files? Many reasons, in three basic situations:
- The data isn’t for sharing. Recording the hashes on DataLayer lets you prove to a private auditor what data you had, when you had it, and whether it has ever changed. That auditor may literally be an auditor, or may be a business partner. There is a huge number of non-Web3 use cases that would benefit from the ability to provide this kind of proof.
- The data isn’t for sharing, yet. You need to be able to prove some data hasn’t changed over time. This is useful in many situations where the timing of a disclosure is important. It also applies to games of chance, where you may want to prove what random seed was used for the game, but only disclose it after the game is complete.
- The data isn’t for sharing publicly. The files can be sent directly from one party to another, or to a select group of others. The recipients can confirm the validity of the data and confirm that any other recipient will receive the exact same data. No other party needs access. This is useful for a group that wants to share data in private without a centrally-managed repository that all members of the group must trust. In this context, there is no requirement even to use http.
As always, data is readily copied. So even though Alice no longer shares her data publicly, Bob still has a copy of it. Bob can share the files directly with Carol, and Carol can verify that the data in the files Bob provides is correct. Alice, having given the files to Bob, can no longer control who receives them. And Carol can prove she has the correct data without disclosing where she got it.
And so ends the story of Alice, Bob and Carol. It turned out much longer than I had expected. For those of you that made it this far, thank you. I hope it was useful.
Killer Feature: Proofs of Inclusion
I mentioned above that the hash stored in the DataLayer singleton is actually the root hash of a Merkle tree. So let’s first describe a Merkle tree, in the context of DataLayer.
DataLayer data stores are made up of (key, value) pairs. I personally think of the DataLayer data stores as tables, and consider the (key, value) pairs to be rows. A hash function is a function that takes as input some arbitrary data and produces as output a fixed-length value, called the hash value. With a secure hash function, it is computationally infeasible to construct data that will exactly match a particular output hash value. DataLayer uses the secure hash function SHA-256.
To form the Merkle tree, we start by concatenating the key and value of each row and calculating the secure hash h(key,value). This is a rowhash. These hashes form the leaves of the tree. Each interior node in the tree is the hash of the concatenation of the hashes of its two children. The root node is the hash that ultimately gets stored on chain.
A quick visual: picture four rowhashes along the bottom of the tree; each adjacent pair is hashed together into an interior node, and the two interior nodes are hashed together into the single root at the top.
So the tree is entirely hashes. There is no actual data stored in the Merkle tree. Any change to any key or value will change the root hash. There is no way for someone to come up with a different row or subtree or even a whole different tree that will produce the same root hash. For more information, the Wikipedia article on Merkle trees is pretty good.
The great thing about the Merkle tree is that we can prove that a particular (key, value) pair exists in the tree without knowing the whole tree. All you really need are the peer hashes at each level of the tree up to the root. In this example, we want to prove that row 3 exists in the tree.
Specifically, we want to prove that the hash of row 3 is in the tree. To do so, we need to provide the hash of row 4 and the hash of the subtree that includes rows 1 and 2. With that information, we can compute the root. If the computed root matches the one on chain, we can be certain that row 3 is in the tree. This is a proof of inclusion.
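If it helps to see the mechanics, here is a deliberately simplified Python sketch. It is not DataLayer’s actual tree implementation (the real tree has additional structure, so these hashes will not match real on-chain roots); it only shows how a root is built from rowhashes and how a proof of inclusion is checked against it.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Rowhashes: each leaf is h(key + value). Four example rows.
rows = [(b"key1", b"value1"), (b"key2", b"value2"),
        (b"key3", b"value3"), (b"key4", b"value4")]
leaves = [sha256(k + v) for k, v in rows]

# Interior nodes hash the concatenation of their two children;
# the top-level hash is the root that would be recorded on chain.
n12 = sha256(leaves[0] + leaves[1])
n34 = sha256(leaves[2] + leaves[3])
root = sha256(n12 + n34)

# Proof of inclusion for row 3: the peer hash at each level on the way up,
# first the hash of row 4, then the hash of the subtree covering rows 1 and 2.
proof_for_row3 = [(leaves[3], "right"), (n12, "left")]

def verify(rowhash: bytes, proof, expected_root: bytes) -> bool:
    current = rowhash
    for peer, side in proof:
        current = sha256(current + peer) if side == "right" else sha256(peer + current)
    return current == expected_root

assert verify(sha256(b"key3" + b"value3"), proof_for_row3, root)      # row 3 is in the tree
assert not verify(sha256(b"keyX" + b"valueX"), proof_for_row3, root)  # a fabricated row is not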
Why do we care about proofs of inclusion? Because the DataLayer singleton can execute them on chain. That’s a big deal. The DataLayer singleton smart contract already has the root hash, so simply providing the rowhash to prove and the necessary peer hashes up the tree allows it to verify that the given rowhash exists in the tree.
That enables any data in a DataLayer table to be provided to a smart contract on chain and validated so that it can be used by the smart contract, without having to store all the data directly on chain.
There are a huge number of things you can do with this capability, and I look forward to more of them becoming available over time. For now, we have one DataLayer feature that uses it…
Proofs of Inclusion: Two-Party Commit
One of the features of Chia that really sets it apart among blockchains is offer files. Offer files allow you to make an offer to trade: you offer to give one asset in exchange for receiving another. You then sign the offer, and make the offer available on the Internet. Anyone that likes the proposed trade can download the offer, attach the asset you had wanted to receive, sign it and submit it to the blockchain to get the asset you had offered. Once one person takes the offer, it is no longer valid, and no one else in the world can use it. The assets exchanged this way can be coins, like XCH or stablecoins, or they can be NFTs or any other asset on chain.
In centralized databases, you have the ability to queue up multiple updates to the database and commit them all at once, so that either all of the updates are recorded successfully, or none of the updates are recorded at all. Imagine recording a bank transfer: you wouldn’t want to deduct the money from one account without also adding the same amount to the other account. In database terms, recording an update of several tables at once this way is called committing a transaction.
With DataLayer proofs of inclusion plus Chia offer files, you can truly own your data and still participate in coordinated database updates.
For example, Alice and Bob each have a DataLayer table. Alice wants to propose an update to both tables. So she creates an offer file that contains:
- An update to her own table to add a new row (potentially also including deleting some prior row that had the same key)
- A proof of inclusion of that new row in the table
- A demand for a proof of inclusion of the row that Alice will require from Bob
At this point, nothing has happened on chain. Alice’s DataLayer table does not include the new row. Instead, she packages all of the above into an offer file and sends it to Bob.
Bob opens the offer file and sees the contents. He decides to accept the offer. So he attaches an update to his own DataLayer table and creates a proof of inclusion of that updated row that satisfies Alice’s demand. Now he submits the whole thing to the blockchain, and both tables get updated. Alice can see on chain when the update to her table gets committed and knows that Bob’s data has also been updated as requested.
This is tremendously valuable for using DataLayer in any multiparty system. In a supply chain, goods can be marked ‘delivered’ by one party if and only if the other party marks them ‘received’. A contract can be signed by having all parties record the same copy of the contract. Decentralized game players can have richer interactions than simply trading goods.
Here’s a thought experiment: what if one side of the offer file was cryptocurrency instead of a data update? What could you do with the ability to pay for another party to make a specific update to their data?
Here’s another thought experiment: what if, in a single spendbundle, the DataLayer singleton announces a proof of inclusion of a rowhash, and another smart coin accepts in its solution the key and value for that row? That smart coin could prove the given key and value match the rowhash that the DataLayer singleton validated and could then safely use that data to drive functionality in that coin’s smart contract. What could you do with a smart contract that could access data from an arbitrarily large, yet cryptographically validated, dataset?
More on this and other potential DataLayer capabilities in another article.
DataLayer RPCs (as of Chia Reference Wallet 1.7.0)
So now you’re excited about the possible capabilities of Chia DataLayer and you understand how it works. The next thing is to understand how to use it.
First, the reference documentation is the primary source of information about the Chia DataLayer RPC endpoints.
Read the official documentation here.
That document is a reference, and lists each available endpoint but doesn’t really give a good walkthrough of how to use them. So I’ll do my best to provide such an introduction here. Again, this is not official Chia documentation, but my personal attempt to help others to use DataLayer.
Before you can do anything else, you will need to start the Datalayer service with:
chia start data
Or, in the UI:
Settings -> DATALAYER -> Enable DataLayer
Publishing Data
Before you can publish data, the first thing is to figure out how people will receive it. Right now, the only method directly supported by the Chia reference client is the included mini HTTP server. To use it, you will need to also start the Chia http server:
chia start data_layer_http
or use the UI:
Settings -> DATALAYER -> Enable File Propagation Server
Next, you will need to open an inbound port to that server. By default, it runs on port 8575. That can be changed in config.yaml. Read the manual on your router for setting up the required “port forwarding rules”. Hopefully, you’ll already know how to do the port forwarding from opening port 8444 for your node, right?
Next, you’ll need to know your inbound IP. There are any number of tools for this, such as http://whatismyip.com. Finally, you’ll need to either request a “static IP” (an IP address that never changes) from your internet service provider, or hope that they don’t happen to change your IP. Unfortunately, requesting a static IP usually comes with a significant cost. Alternatively, you could set up with a “dynamic DNS” provider, but that just starts to get even more complicated.
Please see my public plea above for services to help deal with this situation.
Assuming you have all that sorted, you can create a DataLayer table with create_data_store:
(I long ago got in a bad habit of calling the data represented by a DataLayer singleton a “table”. The correct term is “data store”. Please accept my apologies for any confusion caused.)
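For example, using the generic chia rpc form shown later in the offers section (the fee is in mojos, and the value here is purely illustrative):
chia rpc data_layer create_data_store '{"fee": 100000000}'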
This will create the singleton with an empty hash. It accepts a fee parameter, since it is submitting a transaction on chain.
Note that one parameter that is missing is the fingerprint parameter. This is an unfortunate oversight. All DataLayer calls use whatever key is currently logged-in on the main wallet at the time the call is made. There is no mechanism at this time to specify which key to use, and you are likely to get into a bad state if the current key changes while you’re using DataLayer.
create_data_store will return the id of the new data store. At that point, you can call get_root:
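A sketch, with a placeholder for the store id:
chia rpc data_layer get_root '{"id": "<store id returned by create_data_store>"}'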
Pass in the id of the newly-created data store, and one of the returned values will be a boolean indicating whether the root has been confirmed. Wait until the root is confirmed before proceeding.
You can access a list of the data stores owned by your key by calling:
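(That endpoint is get_owned_stores, if I’m reading the reference docs correctly; it takes no arguments.)
chia rpc data_layer get_owned_stores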
Next, assuming you want to announce a URL where people can access your data, you will use add_mirror:
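A sketch of the call, with field names as I read them in the reference docs; the URL, the mojo amount to lock in the mirror coin, and the fee are all illustrative:
chia rpc data_layer add_mirror '{"id": "<store id>", "urls": ["http://yourhost.example:8575"], "amount": 10, "fee": 100000000}'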
From DataLayer’s perspective, every announcement of a URL on chain is an announcement of a mirror. Anyone can announce a mirror, whether they are providing the data, are subscribed to the data, or just happen to know where that data can be found.
add_mirror takes a fee parameter, as it does create a transaction on chain. The same warning about fingerprints and keys applies here as it did for create_data_store.
delete_mirror and get_mirrors work as you would expect. get_mirrors is the only way to tell when add_mirror completes. Until it is confirmed on chain, the new mirror does not show up in get_mirrors.
Finally, to actually publish data, you will use batch_update:
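Here is a sketch of a request; the key and value are hex-encoded (these happen to be the “Hello” / “World!” bytes reused in the offer example later), and the delete entry is just an illustrative placeholder:
chia rpc data_layer batch_update '{
  "id": "<store id>",
  "changelist": [
    {"action": "insert", "key": "48656c6c6f", "value": "576f726c6421"},
    {"action": "delete", "key": "<hex key of an existing row>"}
  ],
  "fee": 100000000
}'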
This endpoint accepts a “changelist” which is an array of INSERT and DELETE records. The keys and values for each row are blobs, and must be encoded in hex. Any number of records can be in the array, and the keys and values can be of any size, subject to memory size limitations in your system. As a practical matter, most systems can handle a maximum of 50–100 MB of changes. This endpoint also produces a transaction on chain, and so includes a fee parameter and is subject to the same warning about fingerprints and keys.
Because the data is stored in binary, any format is acceptable. It can be JSON for structured data documents or PDFs and other less-structured data. Also, the data can be encrypted to limit who can read the data and can be compressed to reduce the storage / bandwidth requirements.
To know when the update has been confirmed, call get_root. When the transaction is successful, the new root will have confirmed equal to true.
Accessing Data
To access data you do not own, you will need to call subscribe:
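A sketch, with the optional urls list described below:
chia rpc data_layer subscribe '{"id": "<store id>", "urls": ["http://alice.com"]}'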
This accepts the id of a data store and will attempt to subscribe to it. There is no transaction on chain. You can optionally provide a list of URLs from which to attempt to access the data. If provided, these URLs are used in addition to any announced mirrors to access the data.
If there are URLs available from which to access the data that you do not want to use, you can use remove_subscriptions to remove them from the list. This command does not unsubscribe from the data store.
unsubscribe stops fetching new data for the data store, but it does not remove any data from the local machine. subscriptions gets the current list of subscriptions.
Once subscribed, to see the history of data for a particular data store, you can use get_root_history:
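For example (the store id is a placeholder):
chia rpc data_layer get_root_history '{"id": "<store id>"}'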
This will return all of the recorded hash roots for a specific DataLayer data store. Alternatively, there is an endpoint to get the latest root for each of a collection of data stores:
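(That endpoint is get_roots, if I recall the reference correctly; a sketch, assuming it takes a list of store ids:)
chia rpc data_layer get_roots '{"ids": ["<store id 1>", "<store id 2>"]}'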
You can access the data with get_keys_values, get_keys, and get_value:
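Sketches of the three calls (the store id is a placeholder, and the key is hex-encoded):
chia rpc data_layer get_keys '{"id": "<store id>"}'
chia rpc data_layer get_value '{"id": "<store id>", "key": "48656c6c6f"}'
chia rpc data_layer get_keys_values '{"id": "<store id>"}'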
These do what you’d expect. get_keys_values is dangerous as it returns every key and every value in the data store at once. There is no way to tell in advance how large that may be. The keys and values are hex-encoded, and so are much larger in the response payload than the actual data they represent. It is much safer to call get_keys to get all the keys and then call get_value on the keys of interest.
All of get_keys_values, get_keys and get_value return the latest data that has been received. They do not return the root hash associated with that data, so there is the danger that you receive stale data. To address this, you can pass in a specific root_hash parameter to each of them to specify that you want the data for a specific root hash. There will always be a delay between detecting an updated hash on chain and receiving the corresponding data (if you’re able to access the corresponding data at all). If you request data for a root hash that you don’t yet have, you will get an error.
Finally, you can use get_kv_diff:
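A sketch, assuming the two root hashes are passed as hash_1 and hash_2:
chia rpc data_layer get_kv_diff '{"id": "<store id>", "hash_1": "<older root hash>", "hash_2": "<newer root hash>"}'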
This will get the rows changed between any pair of hash roots for a particular data store. Note that what is returned is the net change, so any rows that were both added and deleted, or deleted then added, between the two given hash roots will not show up as diffs in the output.
Sequentially calling get_kv_diff on each adjacent pair of root hashes will provide the exact history for a data store.
Offers and Two Party Commits
The high-level process for using DataLayer offers is straightforward: the maker creates an offer file with make_offer and sends it to the taker. Unlike regular offer files, there can only be one taker — the owner of the DataLayer singleton that needs to complete the offer.
The taker can then verify the offer with verify_offer. This will make sure the offer file is valid. The verification step is entirely optional.
Finally, the taker can accept the offer with take_offer. This will first check that the offer file is valid, then make any updates to the data store(s) specified for the taker in the file, and finally submit it to the blockchain.
Note that none of the DataLayer offer RPCs have CLI equivalents. To call them from the CLI you will need to use:
chia rpc data_layer make_offer <JSON request body | -j request.json>
Now let’s take a look at what DataLayer offer files actually do. Here is an example request body for make_offer:
{ "maker": [
{ "store_id": "e76f1aa5a983531580c1c13b1b11dd508b9261dce7b1a009ca2af1e5e92652c3",
"inclusions": [
{ "key": "48656c6c6f",
"value": "576f726c6421"
}
]
}
],
"taker": [
{ "store_id": "0ba3e6573d9e1356f87ac59f4980e25d736d0614276e23bbc56a2df6c09d811b",
"inclusions": [
{ "key": "48656c6c6f",
"value": "576f726c6421"
}
]
}
],
"fee": 1000000
}
Here we see that there are two sections: one for the maker and one for the taker. The maker side lists the proofs of inclusion that the maker is going to provide. The taker side lists the proofs of inclusion that the taker must provide to complete the offer and submit it to the chain.
Each section has an array of store ids. Each side of the offer can include as many proofs of inclusion as needed for as many data stores as needed. For each data store, there is an array of inclusions. These are the proofs being offered or required. Each proof includes the cleartext (hex) key and value of the row being proven.
Finally, there is a fee. This is the maker fee that will be included in the offer, in mojos.
The maker must be subscribed to the taker’s singleton(s) to call make_offer. However, the maker does not need to actually have the data.
The response from make_offer looks something like this:
{
"offer": {
"maker": [
{
"proofs": [
{
"key": "48656c6c6f",
"layers": [
{
"combined_hash": "dea970ce50eea5d936d55684a90ba66be3bce60d20140ca3fc23d578332c061a",
"other_hash": "adc91718ace34b8a2f6faaf226a36714c88f932d3bbf886502fc88f8d6007f17",
"other_hash_side": "right"
}
],
"node_hash": "509ed575992c34140b77ee226421ed2c67606bd5fd26763f45fb804bb6f47773",
"value": "576f726c6421"
}
],
"store_id": "e76f1aa5a983531580c1c13b1b11dd508b9261dce7b1a009ca2af1e5e92652c3"
}
],
"offer": "000000040000000000...cd28f7cfaeb6c4473ce6cc",
"taker": [
{
"inclusions": [
{
"key": "48656c6c6f",
"value": "576f726c6421"
}
],
"store_id": "0ba3e6573d9e1356f87ac59f4980e25d736d0614276e23bbc56a2df6c09d811b"
}
],
"trade_id": "e7423eb210b373fe8c271c190f210c6316daeddaca1cf5b307a50d6194fd8be1"
},
"success": true
}
This is the offer file with the exception that the final "success": true needs to be removed.
Let’s break down what is in this offer file:
- Again there are two sides: the maker and the taker.
- The offer section will be a very, very long hex stream. This is the actual on chain offer (spendbundle). Both verify_offer and take_offer will confirm that the hex offer matches the surrounding human-readable JSON.
- The trade_id is for tracking the transaction.
The maker section includes an array derived from the inclusions section of the maker’s request; each element of the array includes a store_id and some number of proofs.
Each proof is exactly what you’d expect, given that the data is stored in a Merkle tree: the cleartext hex key and value, the node_hash of the (key, value) pair, and then for each level of the Merkle tree from the node to the tree root there is a record specifying the peer hash (other_hash), which side the peer hash is on (other_hash_side) and the result of hashing the concatenation of the two (combined_hash). The top-level combined_hash is the Merkle root of the data store when the proofs of inclusion run on chain.
Using this JSON file, the taker can clearly see what the maker is offering, and what the maker is requesting.
Note that creating the offer file does not necessarily mean that the maker has to update their data store(s) at all. If the rows for which proofs of inclusion will be provided in the offer file already exist in the maker’s DataLayer tables, then there is no update to the maker’s data store.
Also note that the maker does not provide INSERTS and DELETES, but instead only provides proofs of rows that will exist when the offer is successfully completed. There is no way to use proofs of inclusion to prove that a specific row does not exist in the table, except by proving that a different row exists with the same key. If the maker creates an offer with a proof for a row with the same key as one that already exists, it is treated as both a DELETE of the existing row and an INSERT of the new row.
The maker sends the offer file (with "success": true removed) to the taker. The taker can inspect it and know exactly what rows the maker is offering to prove, and what rows the taker is expected to prove.
To make the offer usable, the taker must attach an entry to the end for the fee they plan to include when submitting to the chain. This can be zero. So where the maker had "success": true, the taker will insert "fee": 0. Again, the fee is in mojos.
The taker can first call verify_offer to make sure the offer file is valid. The body of the request is the offer, including fee. In response, the taker will get something like this:
{
"error": null,
"fee": 100000,
"success": true,
"valid": true
}
Assuming the taker wants to accept the offer, they will submit it to the chain with take_offer, where the body of the request is the offer. The response will look something like this:
{
"success": true,
"trade_id": "b008ea60a001b4557f49d13029f9d0d533eb78ad4fad8c8501ae631c9179b958"
}
The taker also doesn’t necessarily have to make an update to their data stores. If the needed rows already exist, then taking the offer simply creates the needed proofs of inclusion, with no update to the root stored on chain.
At this time, there is no reliable way to tell when the offer submitted by the taker is confirmed on chain. If taking the offer involves updating the taker’s root hash, then using get_root or get_root_history will be a good proxy to see when the transaction is confirmed.
While the offer is pending, the maker will not be able to make further updates to any data store(s) involved in the outstanding offer(s). To cancel an offer, the maker can use:
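(That endpoint is cancel_offer, per the reference docs. A sketch, using the trade_id from the example response above; the secure flag and fee parameters follow the description below:)
chia rpc data_layer cancel_offer '{"trade_id": "e7423eb210b373fe8c271c190f210c6316daeddaca1cf5b307a50d6194fd8be1", "secure": true, "fee": 1000000}'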
This will cancel the offer. The trade_id from the original offer file is used to identify the offer to cancel. Like canceling other kinds of Chia offers, there is an insecure option that just tells Chia to forget about the offer. The taker could still take the offer if they received the offer file and choose to accept it later. There is also a secure option that actually invalidates the offer by spending one of the DataLayer coins involved in the offer. This requires a blockchain transaction and potentially a fee, but ensures that the offer cannot be subsequently taken. Using the insecure version of cancel and immediately updating the data store on chain will also securely cancel the outstanding offer.
Conclusion
I hope this article has been useful. I called it a “Primer” both because it is a useful place to start when working with DataLayer, and because I hope that it “primes the pump” and helps to get people using it.
At the top of this article, I had proposed a number of reasons why DataLayer was different from other ways of managing data. To conclude, I’d like to review those points and more clearly explain them:
- You Own Your Data. The DataLayer singleton is a coin like any other on the blockchain, and custody of that coin is custody of your data.
- Censorship Resistance. Once DataLayer hosting services become available in multiple jurisdictions, it will be incredibly difficult if not impossible to censor DataLayer.
- Peer-to-peer. Anyone can host DataLayer data anywhere, and subscribers to that data get it directly from wherever it is hosted.
- Auditability. For every Merkle root recorded on the blockchain, there is a data file that can be proven to exactly match it. Each root is recorded at a specific block height when the DataLayer singleton was spent, proving when that data file was committed by the owner.
- Accountability. Every DataLayer singleton is associated with a single key. The owner of that key cannot escape the data that they previously published.
- Durability. Anyone can host a mirror of the data associated with a DataLayer singleton. With more mirrors, the data rapidly becomes more durable. If that data is important to you in any way, you can ensure it survives by hosting a mirror.
- Privacy. Chia DataLayer data is not shared by default. You have to actively announce a mirror to share it. The default mechanism for sharing data is http, for which there are many systems available to restrict access by username/password, certificate, source IP, token, etc.
- Smart Contracts. DataLayer proofs of inclusion allow smart contracts to validate cryptographically-signed data without having all of that data stored directly on the blockchain. That data is already accessible in offer files, and in the future will be made available to smart contracts in other ways as well.
There are a thousand things I can think of to do with DataLayer, and I’m sure I’m just scratching the surface. I look forward to seeing what the community does with it. I hope to write many more articles on DataLayer and some of the other technologies on Chia with which I’m intimately familiar. Please let me know if there is anything in particular you’d like me to address.
Please post questions, comments, requests for clarifications and the like here. I will monitor and reply as best I can.