Decentralized Network Graphs

I’ve been thinking about how we access content our friends share online, and how we could do it without requiring centralized repositories of data. Decentralization is not so much a feature as a way to achieve a higher level of personalization. Instead of a central authority managing the entire graph, each node could be the authority over its own data and immediate connections, with a cache of what is known of the graph one or more steps out. Each node could distribute data (push) or query for it (pull) any number of steps out.
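To make the idea concrete, here is a minimal sketch of a node that owns its data and direct connections and pulls from the graph a limited number of steps out. The `Node` class, its fields, and the `pull` method are names made up for illustration. The sketch also assumes nodes hold direct references to each other, where a real system would make network requests.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One person in the graph: the authority for its own data and direct connections."""
    name: str
    data: dict = field(default_factory=dict)         # records this node is the authority for
    connections: dict = field(default_factory=dict)  # name -> Node, immediate connections only
    cache: dict = field(default_factory=dict)        # name -> (steps, data) learned from further out

    def connect(self, other: "Node") -> None:
        # Connections are mutual in this sketch (an assumption, not a requirement).
        self.connections[other.name] = other
        other.connections[self.name] = self

    def pull(self, max_steps: int) -> dict:
        """Breadth-first query up to max_steps out, caching a copy of what is found."""
        seen = {self.name}
        queue = deque([(self, 0)])
        while queue:
            node, steps = queue.popleft()
            if steps == max_steps:
                continue  # do not look past the requested distance
            for neighbor in node.connections.values():
                if neighbor.name not in seen:
                    seen.add(neighbor.name)
                    self.cache[neighbor.name] = (steps + 1, dict(neighbor.data))
                    queue.append((neighbor, steps + 1))
        return self.cache


alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))
bob.data["favorite_show"] = "Mad Men"
carol.data["city"] = "Portland"
alice.connect(bob)
bob.connect(carol)
carol.connect(dave)

# Alice sees Bob (1 step) and Carol (2 steps), but not Dave (3 steps).
result = alice.pull(max_steps=2)
```

A push would be the mirror image: the node that owns the data walks its connections and hands each one a copy, with the same limit on steps.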

This raises some interesting challenges and possibilities.

As far as I know, there aren’t any database servers or methodologies designed to distribute data in this way. Keeping data accurate and up to date across partitioned systems is tricky, and would likely require business logic middleware for authentication and filtering. I’ve experimented with distributing public keys and using signatures to verify that data was provided by the entity with authority over it, even if it didn’t arrive from that entity directly. Outbound filtration would involve sharing friends’ data only with third parties you trust and honoring friends’ privacy preferences. Inbound filtration would prevent unnecessary data from being persisted and verify authority over the data being added, updated, or removed.
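Here is a rough sketch of what those two filters might look like as middleware. Everything in it is an assumption made for illustration: the `Record` shape, the function names, and especially `toy_verify`, which is a placeholder and not cryptography. A real system would plug in an actual public-key signature check in its place.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    authority: str         # who has authority over this record
    payload: str
    signature: str         # in a real system, produced with the authority's private key
    private: bool = False  # the authority's privacy preference


# verify(public_key, payload, signature) -> bool
Verifier = Callable[[str, str, str], bool]


def toy_verify(public_key: str, payload: str, signature: str) -> bool:
    """Placeholder only, NOT cryptography: stands in for a real signature check."""
    return signature == f"{public_key}:{payload}"


def accept_inbound(record: Record, known_keys: dict, verify: Verifier) -> bool:
    """Persist a record only if its claimed authority is known and the signature checks out."""
    key = known_keys.get(record.authority)
    if key is None:
        return False
    return verify(key, record.payload, record.signature)


def share_outbound(records: list, recipient: str, trusted: set) -> list:
    """Share friends' records only with trusted recipients, and never the private ones."""
    if recipient not in trusted:
        return []
    return [r for r in records if not r.private]


keys = {"bob": "bob-key"}
good = Record("bob", "rated Cafe X 5 stars", "bob-key:rated Cafe X 5 stars")
forged = Record("bob", "rated Cafe X 1 star", "mallory-key:rated Cafe X 1 star")
secret = Record("bob", "home address", "bob-key:home address", private=True)

inbound_good = accept_inbound(good, keys, toy_verify)      # relayed by anyone, still verifiable
inbound_forged = accept_inbound(forged, keys, toy_verify)  # wrong key, rejected
to_carol = share_outbound([good, secret], "carol", trusted={"carol"})
to_mallory = share_outbound([good, secret], "mallory", trusted={"carol"})
```

The point of the inbound check is that it doesn't matter who relayed the record. Only the signature ties it back to the entity with authority over it.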

Once those problems are handled at the framework level, a very exciting world opens up.

When you land in a new city, imagine having a list of the top-rated places to visit in categories you care about, based on ratings, reviews, tips, and activity not only from your friends but from your extended network. Ideally, each node would aggregate and normalize the corresponding person’s activity across rating services (Yelp, Foursquare, Google+/Maps/Places, etc.).
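As a sketch of the aggregate-and-normalize step, the snippet below maps ratings from different services onto a common 0 to 1 scale and ranks places by a weighted average. The service names, their scale maximums, and the choice to weight closer connections more heavily are all assumptions for the example, not how any real rating service works.

```python
from collections import defaultdict

# Hypothetical services and their maximum scores (assumed values for this sketch).
SCALE_MAX = {"service_a": 5, "service_b": 10}


def normalize(service: str, score: float) -> float:
    """Map a service-specific score onto a common 0..1 scale."""
    return score / SCALE_MAX[service]


def top_places(ratings, max_steps: int = 2) -> list:
    """Rank places from (place, service, score, steps_from_you) tuples."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for place, service, score, steps in ratings:
        if steps > max_steps:
            continue  # outside the part of the network we are asking about
        weight = 1 / (1 + steps)  # assumption: closer connections count for more
        totals[place] += weight * normalize(service, score)
        weights[place] += weight
    scores = {place: totals[place] / weights[place] for place in totals}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ratings = [
    ("Taco Stand", "service_a", 5, 1),  # a friend
    ("Taco Stand", "service_b", 8, 2),  # a friend of a friend
    ("Museum", "service_b", 6, 1),
    ("Diner", "service_a", 5, 3),       # too far out, ignored
]
ranked = top_places(ratings)
```

Filtering by category would just be one more field on each tuple and one more condition in the loop.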

Machine learning requires a lot of training data before yielding meaningful results. What if training data could be distributed? A friend of a friend could say that Don Draper is an entity related to another entity, Mad Men, and the next time a friend tweets about Don Draper, your system would already know to add the tweet to your Mad Men stream, or remove it if you’ve blocked TV-related posts.
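A toy version of the Don Draper example might look like the following. It uses plain substring matching rather than any actual machine learning, and the function name, the `relations` and `topic_of` dictionaries, and the blocked-topic set are all invented for the sketch. The interesting part is only that the relation arrives from someone else in the network.

```python
def route_post(text: str, relations: dict, topic_of: dict, blocked_topics: set) -> list:
    """Return the streams a post belongs in, or [] if it touches a blocked topic."""
    streams = []
    lowered = text.lower()
    for entity, related in relations.items():
        if entity.lower() in lowered:
            if topic_of.get(related) in blocked_topics:
                return []  # the post relates to something the user has blocked
            streams.append(related)
    return streams


# Relations shared by a friend of a friend, not learned locally.
relations = {"Don Draper": "Mad Men"}
topic_of = {"Mad Men": "tv"}

tweet = "Don Draper would hate this ad"
streams = route_post(tweet, relations, topic_of, blocked_topics=set())
streams_when_blocked = route_post(tweet, relations, topic_of, blocked_topics={"tv"})
unrelated = route_post("Nice weather today", relations, topic_of, blocked_topics=set())
```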

Just about any factual data could be fragmented and distributed in this way. While I would love for everyone to have their own local, queryable copy of Wikipedia, OpenStreetMap, and any other open dataset, it’s more practical to store and index only the portions that are relevant to the individual. It could be articles about entities you’ve tweeted about, street data for your local city, venue information (address, open hours, whether or not wifi is provided), topic and entity relations (like the Mad Men example above), hiking trails and biking paths, and who knows what else. Each of these only gets more interesting as you expand it beyond your own data to that of your friends and then to your extended network.