One of the first problems we faced when using a p2p system for sharing data was how to create a single source of truth when you might want to revoke access or change the data at any given time. How will those changes be reflected across a p2p network? We’ve explored several ideas around this: an SQL database, HyperDB, recursive encryption using HMAC, and a NoSQL database. At the core of each approach, we leverage Dat to share the data over its built-in p2p networking. To decide which of these solutions works best, we need to answer a few questions: what is a single source of truth, what does it mean to subscribe to data, and how can fine-grained control be implemented for each set of data? I’ll answer these questions in the following sections.
The Single Source of Truth
So what is a single source of truth? The way we have defined it at Lens, it is data where the original dataset is owned by the end user and can only be changed by that end user. All other changes can easily be rejected by the underlying network. Additionally, any change that the user makes will eventually be reflected throughout the entire network. So if I want to share my phone number with a number of different services or users, I give them a link to that number. Then, whenever I need to change that number, I don’t have to contact each one of those entities separately and update the phone number. I simply update the number in one place and all subscribers get the change. So the single source of truth really means that you can update information for everyone who is interested in it! This leads us to the next question: how can we have fine-grained control of this data? For instance, maybe I only want to share my work phone with one person and my mobile phone with another. How can we implement this with a single source of truth?
Fine Grained Access Control
There are a number of ways to implement access control for data. At the core, though, we want to provide a system similar to the one Linux provides to users on a single machine, where a user’s directories can be hidden from other users unless the owner explicitly grants access. At Lens, the mechanism we use to give interested parties fine-grained access to an owner’s data is capability-based security. We extend this idea over a peer-to-peer network so that data can be published and subscribed to from a single bucket of data. So how can we achieve this goal?
Implementation of Access Control Lists Over a P2P Network
As we have mentioned before, we are building much of our P2P infrastructure on the libraries built for the datProject. The datProject itself is a capability system. In other words, the key that identifies a dat is unguessable, so it must be given to the desired party out-of-band. The only way to access my data is to know my dat URL. For example, I might create a JSON object to represent my contact information and store that file in a directory. To share that information with another party, I would simply go to that directory and run `dat share .`, which generates a dat URL.
Now, when I give that URL to that party, they can run `dat clone dat:9a0…` to get my data! Voila, we have a capability system. If I want to share my single source of truth with other users, I simply give them my dat URL. That way, if my phone number changes, all I have to do is update the data in my dat directory. There is a slight problem, though. What if I want to revoke access for a specific person? Well, one way would be to delete the data behind that dat URL, create a new one, and give the new dat URL to all the people who should still have access. This process would become quite cumbersome and hard to keep track of. This is where programming can really come in handy!
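To make the revocation problem concrete, here is a minimal sketch in TypeScript. It does not use the Dat libraries: `CapabilityShare`, `grant`, `read`, and `revoke` are hypothetical names, and a random 32-byte hex string stands in for the dat key. It only illustrates why a single unguessable URL makes revocation cumbersome: removing one party means minting a new key and sending it to everyone who remains.

```typescript
import { randomBytes } from "node:crypto";

type Contact = { [field: string]: string };

// Hypothetical stand-in for a dat share: one unguessable key guards the data.
class CapabilityShare {
  private key: string;
  private holders: string[] = [];
  private data: Contact;

  constructor(data: Contact) {
    this.data = data;
    this.key = CapabilityShare.newKey();
  }

  // 32 random bytes, hex encoded. An assumption, not the real dat key format.
  private static newKey(): string {
    return randomBytes(32).toString("hex");
  }

  // Hand the current key to a party (out-of-band in real life).
  grant(party: string): string {
    if (this.holders.indexOf(party) === -1) this.holders.push(party);
    return this.key;
  }

  // Anyone presenting the current key can read; nothing else is checked.
  read(key: string): Contact | undefined {
    return key === this.key ? { ...this.data } : undefined;
  }

  // Revoking one party rotates the key, so every remaining holder
  // has to be sent the new one.
  revoke(party: string): { key: string; resendTo: string[] } {
    this.holders = this.holders.filter((h) => h !== party);
    this.key = CapabilityShare.newKey();
    return { key: this.key, resendTo: this.holders.slice() };
  }
}

const share = new CapabilityShare({ phone: "555-0100" });
const oldKey = share.grant("alice");
share.grant("bob");
share.grant("carol");

const rotation = share.revoke("bob");
console.log(share.read(oldKey));       // undefined: the old key is dead for everyone
console.log(rotation.resendTo);        // [ 'alice', 'carol' ] must be contacted again
console.log(share.read(rotation.key)); // { phone: '555-0100' }
```

Note that Alice did nothing wrong, yet her key stopped working too. With many holders and frequent revocations, that bookkeeping is exactly the part worth automating.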
The first method we have implemented here at Lens has to do with syncing all the source data with all the Lenses that get shared. The process involves a few steps:
- User creates a data source
- User creates a Lens from one or more data sources and their fields
- User shares that Lens with interested parties
- When user updates any fields in the data sources, the Lens is automatically updated
So when the user updates their data, all of the Lens subscriptions are updated as well. This is where having a Lens to data becomes a very powerful concept. With this method, companies that have a legitimate need for your data simply subscribe to your Lens. Then, if they ever need to pull it up in the database, they will always have the latest version of the data you want to share with them. As an example, let’s say you have some friends who are interested in starting an ice creamery. As one of their first steps in procuring resources for this new business, they ask you for a Lens to your favorite ice cream and to your e-mail address so they can get in touch with you about your favorite. In response, you give them a Lens subscription that combines two different data schemas stored in your Lumen. The first data schema is called “Contact Info”; this is where you keep the latest contact information that you want to give out. The second schema is called “Favorites”; this is where you keep a list of your favorite ice creams, beers, etc. You have no need to share either schema in its entirety; you simply want to share a part of each one. So the Lens software creates a Lens from these data sources and references the original data sets, so that everything can be kept in sync. The figure below demonstrates this process.
So now, if you ever update your email address or your favorite ice cream, your friends will get the latest information to help them make better decisions about their business. Furthermore, if you decide that you no longer want them to have this information, you simply revoke access to certain fields or to all of them. This is ultimately what can fuel a subscription-based model for our data.
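The ice creamery example can be sketched in a few lines of TypeScript. This is only an illustration of the data-binding idea, not the Lens codebase: `DataSource`, `Lens`, `FieldRef`, and their methods are hypothetical names, everything lives in memory, and the Dat sharing step is left out entirely. The sketch also assumes field names are unique across the sources bound to one Lens.

```typescript
type Fields = { [name: string]: string };

// A schema in the owner's Lumen, e.g. "Contact Info" or "Favorites".
class DataSource {
  name: string;
  private fields: Fields;
  private listeners: Array<() => void> = [];

  constructor(name: string, fields: Fields) {
    this.name = name;
    this.fields = { ...fields };
  }

  get(field: string): string | undefined {
    return this.fields[field];
  }

  // Only the owner calls set(); every Lens bound to this source is notified.
  set(field: string, value: string): void {
    this.fields[field] = value;
    this.listeners.forEach((notify) => notify());
  }

  onChange(listener: () => void): void {
    this.listeners.push(listener);
  }
}

type FieldRef = { source: DataSource; field: string };

// A Lens exposes selected fields from one or more data sources.
class Lens {
  private refs: FieldRef[];
  // What subscribers currently see; rebuilt whenever a source changes.
  view: Fields = {};

  constructor(refs: FieldRef[]) {
    this.refs = refs.slice();
    const seen: DataSource[] = [];
    for (const ref of refs) {
      if (seen.indexOf(ref.source) === -1) {
        seen.push(ref.source);
        ref.source.onChange(() => this.refresh());
      }
    }
    this.refresh();
  }

  // Drop one field from the Lens without touching the underlying source.
  revoke(sourceName: string, field: string): void {
    this.refs = this.refs.filter(
      (r) => !(r.source.name === sourceName && r.field === field)
    );
    this.refresh();
  }

  private refresh(): void {
    const next: Fields = {};
    for (const ref of this.refs) {
      const value = ref.source.get(ref.field);
      if (value !== undefined) next[ref.field] = value;
    }
    this.view = next;
  }
}

const contact = new DataSource("Contact Info", { email: "me@example.com", phone: "555-0100" });
const favorites = new DataSource("Favorites", { iceCream: "pistachio", beer: "stout" });

const friendsLens = new Lens([
  { source: contact, field: "email" },
  { source: favorites, field: "iceCream" },
]);

favorites.set("iceCream", "mint chip");
console.log(friendsLens.view); // { email: 'me@example.com', iceCream: 'mint chip' }

friendsLens.revoke("Contact Info", "email");
console.log(friendsLens.view); // { iceCream: 'mint chip' }
```

The friends never see `phone` or `beer`, and after the revoke they lose `email` while the "Contact Info" source itself is untouched.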
Drawbacks of Current Implementation
One of the problems with the implementation above is that we now must create a dat URL for every bit of data that might be requested of us. At first this might not seem like a problem, but our digital selves are only growing, not shrinking. More and more of our lives are being digitized for convenience, accountability, accessibility and much more. I browsed through my password manager today and realized that I have over 200 accounts on various websites. Granted, I’ve created these accounts over the course of a lifetime, but that number is not going to shrink. So our users could easily have hundreds, if not thousands, of Lenses. This becomes difficult to manage from a technical standpoint because there are only so many ports and file handles available on a system, so eventually we would run out. On top of that, current implementations of the Dat protocol use quite a bit of memory per share. For our early product, however, this limitation is not going to be a problem.
The developers at Lens are brainstorming ways to overcome the limitations of having thousands of dats shared from a single Lumen. One of the alternative methods we have been toying with is hierarchical encryption of all the data sources on the Lumen, giving out keys to the parties that are interested. With a system like that, we would no longer have to maintain the data-binding paradigm introduced in the previous sections, but would instead have to keep track of an access control list for each of the data values inside each schema. We’ve implemented a few experiments based on a great post by Substack.
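To give a feel for what hierarchical keys could look like, here is a minimal sketch using HMAC-SHA256 from Node's built-in `crypto` module. It is a generic key-derivation pattern, not the experiment mentioned above and not necessarily the scheme from the Substack post: each child key is the HMAC of its parent key and a path segment. The function names `deriveChild` and `deriveKey` and the schema and field names are assumptions for illustration, and the actual encryption of the values is left out.

```typescript
import { createHmac, randomBytes } from "node:crypto";

// Derive a child key from a parent key and one path segment.
// HMAC is one-way, so a child key does not reveal its parent or its siblings.
function deriveChild(parentKey: Buffer, segment: string): Buffer {
  return createHmac("sha256", parentKey).update(segment).digest();
}

// Walk a path such as ["Contact Info", "email"] down from a starting key.
function deriveKey(startKey: Buffer, path: string[]): Buffer {
  return path.reduce((key, segment) => deriveChild(key, segment), startKey);
}

// The master key never leaves the owner's Lumen (an assumption of this sketch).
const masterKey = randomBytes(32);

// Key for a whole schema, and for a single field inside it.
const contactKey = deriveKey(masterKey, ["Contact Info"]);
const emailKey = deriveKey(masterKey, ["Contact Info", "email"]);

// A party holding contactKey can derive every field key beneath it...
const derivedByParty = deriveKey(contactKey, ["email"]);
console.log(derivedByParty.equals(emailKey)); // true

// ...but a party holding only emailKey cannot reach a sibling such as "phone".
const phoneKey = deriveKey(masterKey, ["Contact Info", "phone"]);
console.log(deriveKey(emailKey, ["phone"]).equals(phoneKey)); // false
```

With a layout like this, sharing a whole schema or a single field comes down to which key you hand out, which is where the per-value access control list would come in.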
Wrapping It Up
I’ve used a lot of words to describe how we are iterating on the single source of truth, but the more technically inclined may be interested in a simple implementation. I’ve written up a little test script using PouchDB, basic password encryption, and some mock functions for creating Dat shares (the logic for that can get quite complex, so I didn’t include it). Please take a look HERE and let me know what you think!