The future of the decentralized web
The inventor of the World Wide Web has a new vision to rescue it.
In conversations over the course of 2018 and 2019, Sir Tim Berners-Lee has articulated a new approach to manage many of the problems that have emerged in the decades since the invention of the web, and the proliferation of smart devices, apps, and cloud services that spread data around.
That private and personal data shared with third parties can be hacked, surveilled, or misused is an understood part of our reality as web users.
Now Sir Tim proposes a new protocol — Solid — whose goal is to reinvent the technological ecosystem to return information to the hands of users.
From Ethan Zuckerman’s write-up of a June 5th, 2019 conversation between Jonathan Zittrain and Berners-Lee:
Sir Tim pushes up the blackboard featuring the web as a meteor crashing back to earth. On the board below it, he starts drawing a set of cylinders. Solid is based around the idea of pods, personal data stores that could live in the cloud or which you could control directly. “Solid is web technology reapplied,” Sir Tim explains. “You use apps and web apps, but they don’t store your data at all….”
Sir Tim imagines uploading photos taken from a digital camera. The camera asks where you want to store the data. “You have a Solid pod at home, and one at work — you decide where to put them based on what context you want to use them in. Solid is a protocol, like the web. Pods are Solid-compatible personal clouds. Apps can talk to your pod.” So sharing photos is no longer about making LinkedIn and Flickr talk to each other — it’s simply about both of them talking to your pod, which you control.
To tell the story of how to save the web, first you have to understand how the web works. In a conversation held at Harvard Law School in the fall of 2018 as part of the “Re-decentralization of the Web Study Group,” Jonathan Zittrain sketched out an understanding of the layers of infrastructure and content that have gone into making the web we know today.
(This discussion has been edited and condensed for clarity.)
Jonathan Zittrain
I’m going to try to use the next 20 minutes or fewer to put some framing ideas on the table and a little bit of history to think about a decentralized web.
The first is “owned versus unowned.” The second is “layers,” and the third is “centralized versus decentralized.” And they’re related to one another but distinct enough to be worth talking about in turn.
Owned vs. unowned
So the first, “owned versus unowned.” By that I mean that a given technology or system can be owned, not so much in the sense of “legal ownership” but rather in the sense of some distinct authority, set of people, institution, or group that is in a position to control it, affect it, influence it.
And as you might guess this is a spectrum rather than a binary. But many of the technologies we experience are that way. That’s the way they come about. There’s a particular group that is in a special privileged position, that’s accredited, that wears a badge with respect to that technology to decide how it’s going to be.
Most of the consumer products we use are that way. I’m using a lavalier mic right now. There’s a company that makes the mic, and if there’s some problem with it we know who to call and they’ll make a follow-up mic.
But there’s also the sense of “unowned” technologies. Often these can be expressed as protocols in the sense of, somebody simply declares “wouldn’t it be nice if…” or “here’s a thought, if everybody were to behave this way or if a lot of people would behave this way it would be a better world.”
You can see an example of this in the early days of sidewalk usage. Figuring out whether a walker should pass on the right or the left. Perhaps something on which we might have divergent views is a protocol to start with. A sidewalk is an unowned technology. That’s how it is on the sidewalk. Now if you don’t obey the protocol, if you walk on the sidewalk the wrong way in Britain you know pretty quickly that you’re not doing it right. Even though there’s not a law that says you must pass on the left of people when you pass on the sidewalk. And there might not even have been anybody who invented “let’s pass on the left,” it just sort of coalesced. And that becomes an unowned technique and therefore a technology that people subscribe to, and something that is changeable through the behavior of enough people that it could evolve over time.
So “owned versus unowned” to me is a really important distinction because at the dawn of the Internet revolution, the mainstreaming of the Internet circa late 1990s, that was a huge movement from what looked like was going to be an “owned” set of technologies. We’d be using CompuServe, Delphi, AOL. And each of these would be services run by companies. You’d log in, you’d pay your fee to that company. They would produce a main menu, it would have information or services you’d use. That looked like the way that information technology in the consumer space was going.
Layers
And yet the Internet supplanted it. And when I say “the Internet,” now I’m going to be really precise about the difference between “the Internet” and “the Web.” And that’s also a nice way of introducing the second concept of layers. There’s an idea of technology in the information space as some number of layers that build upon one another, and the number will vary at times depending on who’s doing the chart. At the bottom you have the physical layer: what kind of wire or non-wire is this thing going to operate on? Then towards the middle, you end up with the logical layer. The logical layer is: once you’ve got signals going over the wire, do they pass on the left or pass on the right? How do you start to make sense out of that? And then towards the top you have so-called apps, the application layer, and then some would say on top of that you might have content.
And one artifact of doing layers this way is that somebody could develop a ton of expertise in one layer and not need to know anything about what’s at the other layers. If you want to do great content in the information space, the ideal would be you don’t have to start by mastering the intricacies of Ethernet. I can just be a content person, or I can write an app, and at some point when I’m writing the app I say “and this app sends data from here to here. And luckily that’s somebody else’s problem. I’m just the person, my app is called Napster, and it moves data around but I don’t need to know anything about how the Internet works to build Napster because I’m just dealing with files.” And then at some point I magically say “and then you move the file from here to there.”
And so you have apps at this layer, content perhaps on top benefiting from the app. You have a logical layer which is thought of as the home of Internet Protocol proper. And then underneath that the physical layer.
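As an illustration of the layering described above (not part of the talk), here is a minimal Python sketch. All function names and the toy “TO:” header are hypothetical; real network stacks are far more involved. The point is only that the app calls the layer below it and never touches the wire.

```python
from typing import Callable

# Hypothetical stand-ins for the layers named in the talk.
# Each layer only knows how to hand bytes to the layer directly below it.

def physical_layer(frame: bytes) -> bytes:
    """Stand-in for the wire (or non-wire): delivers the bytes unchanged."""
    return frame

def logical_layer(payload: bytes, destination: str,
                  send: Callable[[bytes], bytes] = physical_layer) -> bytes:
    """Stand-in for the protocol layer: add an address header, send, strip it on arrival."""
    header = f"TO:{destination}|".encode()
    delivered = send(header + payload)
    return delivered[len(header):]

def file_sharing_app(file_contents: bytes, destination: str) -> bytes:
    """The app only says 'move this file from here to there'."""
    return logical_layer(file_contents, destination)

received = file_sharing_app(b"song contents", "peer.example")
print(received)
```

Swapping `physical_layer` for a different function (say, one modeling fiber instead of copper) would not require any change to `file_sharing_app`, which is the property the layers are meant to provide.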
Now I think as history starts to unfold you end up with an app called “the World Wide Web.” The World Wide Web, I think it’s fair to describe it as an unowned technology in the sense that someone, Tim, invents it. Here’s a server. Here’s a client. You know if you all were supposed to, or if you all would start using servers and clients you could start rendering pages that would come together and these pages could on the fly draw from here and there and everywhere to build the page. And you could even incorporate things by reference. You could just say this photo here, put it in, and if that photo changes when you build the page the next time it will be changed.
You end up with the World Wide Web as an app that runs on top of the Internet. Internet here is basically unowned. It’s a set of protocols and people can jump into it by simply following those protocols. And there’s an organization or two or three that purport, in fact, DO Internet protocols. The Internet Engineering Task Force is one of them. But when the IETF proclaims a new standard it’s not like everybody at Cisco is like “well, IETF has a new standard, guess we got to program the routers.” Nobody is especially privileged to snap their fingers and just implement it.
And then the World Wide Web comes about and becomes exactly the thing that makes the Internet make sense to the general public, because part of the openness of the Internet was that there’s no main menu. It doesn’t come bundled with an app. Somebody is like “congratulations you’ve got Internet access!”
So the World Wide Web becomes the app: when people say “oh, I’m on the Internet” they’re on a browser. They’re seeing content they can relate to. They’re clicking on it and then if they’re feeling motivated and a little technically skilled they can start building through a markup language, and later through an app that can make markup language out of a WYSIWYG (“what you see is what you get”) environment; think Dreamweaver. And the World Wide Web becomes people’s window through which the Internet is experienced.
“Centralized” vs “Decentralized”
So far the story I’m telling could still basically be one in which the actual architecture, logical or physical, of the technology happened to depend on some particular centralized artifice. Whether it’s a particular server or a particular set of approvals that have to happen only when this signal happens. You can design identity on the Internet in a centralized way. You could say there’s a database of names and you authenticate your name to it and then that’s how you identify yourself on the Internet. There is no such centralized identification at the Internet level. Internet architects are like “we already have a database of Internet users. It’s called Internet users and they know their names. They should just say who they are and we’re done.” And that worked very well until it turned out people might have an interest in spoofing one another.
So the Web ends up extremely well oriented around decentralization because a URL, a link, can be placed into a web page and it can refer to anything anywhere. And it’s as easy to put in this server as that server. It lends itself to web pages representing massively decentralized stuff. And when you key in a name you don’t know where you’re visiting in the world and you need not know. Again thank you layers. It doesn’t matter where you’re physically visiting.
This was really driven home to me at a domain name conference years ago when I met the Information Minister of Tuvalu. Tuvalu has the “dot tv” top level domain which is a very lucrative TLD that auctions off names. Anyone who wants a URL with “TV” at the end comes to him. And at the time I said “Can I email you? I’d love to stay in touch.” He says “we don’t have email on Tuvalu.” I said “Well then how do you run dot.tuvalu? Where do you host?” and he’s like “oh, it’s hosted in San Francisco. What are you talking about?” Ah right.
There are layers and things are separated from one another which is a great thing. It’s a magic box that just renders pages from everywhere.
So now let me start to bring it to the present. One example that was suggested was to think about blogging. It starts, and the Berkman Klein Center was actually a big part of it at the time, it starts in the early aughts, 2002, 2003, with the idea that people could basically set up their own websites. And through a kind of content management system either hand coded through a web server or a light system like a server side Dreamweaver, log in to their own website and post their thoughts and have blogging thoughts and that would be distributed because you’d be visiting people wherever you found them and there’d be blog rolls as people said “here are the other people I like to follow.”
It was like a very awkward way to do Twitter follows before Twitter. You would literally list the people you like and it would be like a one-way friendship unless they happen to put you back on their blog roll just like followers. And then researchers would actually look to see what blogs link to other blogs and use the blog rolls as a way of doing network analysis. Before there were friends and followers through the later generation of apps.
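To make the blogroll idea concrete, here is a small Python sketch (an illustration, not something from the talk) of the kind of network analysis a researcher could run over blogrolls. The blog names and the `blogrolls` data are invented; a real study would first have to crawl the blogs to collect these lists.

```python
# Hypothetical blogrolls: each blog lists the blogs it links to.
blogrolls = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": ["dave"],
    "dave": [],
}

def inbound_counts(rolls):
    """Count how many blogrolls each blog appears on (its 'followers')."""
    counts = {blog: 0 for blog in rolls}
    for targets in rolls.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

def mutual_links(rolls):
    """Find pairs of blogs that list each other, i.e. two-way 'friendships'."""
    pairs = set()
    for source, targets in rolls.items():
        for target in targets:
            if source in rolls.get(target, []):
                pairs.add(tuple(sorted((source, target))))
    return sorted(pairs)

print(inbound_counts(blogrolls))
print(mutual_links(blogrolls))
```

Here only “alice” and “bob” list each other; every other link is the one-way kind of friendship described above.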
That was distributed. So you see the World Wide Web supports blogging as almost an app on top of an app that is now so much of an app that it itself is a platform. And blogging in turn supports content in a distributed way because anybody can do one. Forget how you find these blogs! That’s a search function!
Over time you end up with a couple apps for blogging that say “you know what we’re not just going to be an app you run on your server. We’re going to be the host of your blog and you don’t have to worry about security. You don’t have to worry about anything. You just show up at our site.” You blog at WordPress dot com or blogger dot com and boom there you go. Which starts to mean that for this blogging community that was by its nature in hosting and in ecumenicism of types of blogging software quite distributed, starts to get centralized. Which also starts to mean if you’re on Tumblr somebody can kick you off. Tumblr itself is an owned technology and if Tumblr gets rid of you and maybe even decides to delete all your archives and you didn’t think to back them up, they’re gone.
That starts to have implications for the blogosphere. And then you fast forward to today and you end up with the likes of Twitter or Facebook. For microblogging, you’re free to use Identi.ca if you want, but people might not listen to you; you can’t build the audience you could if you had a Twitter account. Which then starts to make it so that if Twitter decides to keep you or dismiss you it’s a hugely fraught issue fought on all sides, and we live in an era now of re-centralization.
Now apps have become platforms because they are so wildly popular and singular. You end up then with all of your data in the hands of those platforms, and with all the activities quite monitorable and controllable in ways that basically have you end up with an owned facility that’s no longer so much in layers because it’s vertically integrated. Where the stuff is stored, what protocols are used to store it, and the app you’re using: all of that is now in one stovepipe.
That starts to be the story of 2018 and even for the Web itself.
We just did a study here on the reducing entropy of the web and of the domain name system. More and more you think you’re going here, you’re going there, you’re clicking on this link, you’re visiting so many different sites. They’re actually all on Amazon Web Services! You’re just sitting on Amazon all day long looking at stuff. We might as well call it CompuServe. It happens to be using Web protocol but it’s all hosted by the same place. That is an instance of centralization that looks the same to the person surfing.
We had to do a study to find out just how entropic it is. That has massive implications for data stewardship.
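One way to put a number on “how entropic” a set of sites is would be Shannon entropy over where the sites are hosted. The Python sketch below is only an illustration under that assumption; the talk does not say how the study measured entropy, and the host names are invented.

```python
import math
from collections import Counter

def hosting_entropy(hosts):
    """Shannon entropy (in bits) of the distribution of sites across hosts."""
    counts = Counter(hosts)
    total = len(hosts)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical data: one entry per visited site, naming the provider that hosts it.
spread_out = ["host-a", "host-b", "host-c", "host-d"]
concentrated = ["big-cloud", "big-cloud", "big-cloud", "host-d"]

print(hosting_entropy(spread_out))
print(hosting_entropy(concentrated))
```

The more the visited sites collapse onto a single provider, the lower this number gets; if every site sits on the same host, it reaches zero.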
Look at each layer, look at centralized versus decentralized. Recognize, too, how much the app environment on mobile especially has transformed things.
=====
Picking up where Jonathan Zittrain left off, Sir Tim Berners-Lee describes the emergence of the web as a decentralized space built on top of the Internet, and how it could be returned to those roots.
Tim Berners-Lee
The reason the layers of the Internet are really really excellent is that people on the infrastructure side of this could go ahead and just make the Internet faster and faster and more reliable and more connected, and more controlled. And what else can you think of that was invented and is like that? Like trains? No. They were invented, they went to 30 miles an hour. And they do not now go 30 million miles an hour. Nothing else does that.
So the layering was pretty useful. It meant that the people who were developing faster protocols and fiber and stuff didn’t worry about how is it going to be used. The people who developed all the cool content and 3-D videos didn’t worry about how it was going to be delivered faster and faster because that was being done by somebody else.
Whereas if they’d all worked for a cable company, and somebody at the cable company said “let’s do 3-D videos,” then they would have said “well, we’ll put together a team because you can design 3-D videos, but meanwhile we have to have a massive relabeling of all of our wires so that they would go fast enough for the 3-D.” And it wouldn’t happen. So the layering is really, really important.
I’ll pick up with the blogosphere. What was really interesting about that time is that you have somebody writing a blog, producing in the web, and trying to make it the best blog they could, because they were motivated by what they used to have — where now we have Google Analytics, then you had a counter. You have a hit counter and every time they’d get up in the morning and go to their web page “oooh! Somebody has been reading it!” That is really exciting. So they’re motivated by people reading them.
They realize that one of the reasons they want to make the blog is so other people who are writing other blogs out there will link to it. As more blogs out there link to it then they will get more and more people following their links. How do they get more people to link to it? Well, partly it’s making the content good, but partly it’s also thinking about not just the text content, it’s also the linked content. So they also know if they really carefully pick beautiful things out there and link to those, sometimes people link you back. They curate.
So this thing is both a piece of content and it’s a curated set of links. And that is the blog of which the blogosphere was made. And in fact some people just made lists. So you got famous for being “The List” for stuff, say, about frogs. And so we ended up with two different types of webpage. And this is all before you have to worry about Google coming up. Google only worked because of all the things that were already set up for them. Google was devastatingly good because it looked at the structure, did some really cool math on it, and said “you’re about frogs? This is the guy. Because look at their community. In fact when you look at the first eigenvector of the matrix when you take all these links, it’s him. This is the community which is best about frogs and he is in fact the greatest authority, or he’s the best content, and this guy is the best list.”
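The “first eigenvector” remark can be illustrated with a simplified, PageRank-style power iteration in Python. This is a textbook sketch, not Google’s actual implementation; the damping factor, the iteration count, and the frog-themed link graph are all assumptions made for the example.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Approximate the dominant eigenvector of the link matrix by power iteration."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, then shares passed along links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link structure: a curated list, two blogs, and one authority.
frog_links = {
    "frog-list": ["frog-expert", "pond-blog", "tadpole-diary"],
    "pond-blog": ["frog-expert"],
    "tadpole-diary": ["frog-expert"],
    "frog-expert": ["frog-list"],
}

ranks = rank_pages(frog_links)
best = max(ranks, key=ranks.get)
print(best, ranks)
```

In this toy graph the page that everyone links to comes out on top, and the curated list that it links back to comes second, which loosely mirrors the “best content” and “best list” distinction in the talk.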
The blogosphere was an emergent wonderful thing. And what emerged in those days was this feeling that not only was it really enjoyable to write stuff. It was really enjoyable to surf around. And as you surfed around, you felt that the world was just getting better and better. The quality of the Web was going up and up. So maybe 10 years into the web, partway along the arc from where it started back in 1989 to where we are now, you would have found that the sense of the web was pretty utopian. People wrote about it, people tried to analyze it, tried to figure it out. They tried to explain to people who weren’t on the web (you can find those articles) why it’s so cool. And they found it difficult to do so.
One of the things that we’ve lost is the long tail. You know what’s happened since then. On the surface you see everything that’s on Facebook. And when you’re on Facebook you may think “I’m doing the same things, I’m following links, I’m writing things” — but I’m not motivated in the same way to make my page wonderful because the system doesn’t care. Whether somebody reads it is determined by Facebook’s algorithms. Like when most people were on AOL, if AOL had just become the only provider of information it would have been very boring because you would never get enough innovation in it. So we’ve lost the long tail.
We’ve lost this world of people buying web software, putting it on their PC, running the server and plugging it into the wall and then suddenly becoming part of the blogosphere. So how can we get the long tail back?
Solid, Inrupt, and “Pods”
So in fact, we have a project, we have a movement. Down the road to MIT we’ve just spun a startup called Inrupt where I’m the CTO.
One of the concerns we have now is, say, when I’m on Facebook that I’ve got the silo problem in that I’ve got my photos on Flickr and I’ve got my colleagues on LinkedIn and I’ve got my friends on Facebook. And so all I want to do is to share the photos on Flickr with friends on Facebook and my colleagues on LinkedIn and I can’t. Because these are closed systems. It’s just not a function you can do. I can’t drag a LinkedIn group into Flickr, I can’t drag a Flickr photo into Facebook and use them, bringing the two groups together.
How would we make it so that you could? We have to dramatically change the architecture of all these apps. So now we’re splitting things up. Where before we had silos, we had some social network which you’re a member of, and you got all your friends in there and you’ve got your photos in another one and you’ve got your colleagues in this one. Instead, we say “actually that’s the wrong architecture.” We’re going to break the assumption that if you like the Facebook application, you like the way you could use it for saying who your friends are, building your social graph. Well I say “OK, we have a social network app here but that doesn’t necessarily mean that the data has to be stored on the social network servers.”
At the moment if you build a social network, as a developer you have to do two pieces. You have to build the UI, the front end, and you have to build the back end. You have to build up a bunch of PHP code if you’re into that sort of thing or node code. And those two things connect together.
But now we are breaking them apart. So that whenever I run an app, before it runs, it says “Hey, Tim. Where are you going to store this data?” We call these things pods. So I’ll say “There’s the photographs I took at the talk; I think I’ll put them on my MIT pod.” I take the picture with the photo app and then I’m going to put these photos on my work pod at MIT. But I’ve also got a pod sitting under the TV at home which I’ve had for a long time now, really cool stuff on it, so I could put them there.
Storing data, saving data, isn’t rocket science. We can build a Solid compatible storage device and the apps can write any data they want onto it.
So when I’m looking at my world I’m looking at photographs which have been shared, I’m looking at photographs which are personal. And also there are huge archives of public stuff as well. So any app which is out there can go and talk to any server. And so we break open the silo, we give the user choice. The first choice is a choice of where to store their data. The next choice is which app to use: because we use standards to store this data, you can use more than one app. So in fact you might be working using one app to look at the photos on your laptop and simultaneously using a completely different one on your mobile, and your family may be using a different one. They may be looking at the same slideshow at the same time going over the same vacation, but they are using different apps. Also the user gets to become a developer if they like. So there’s a lot of open source apps out there. It may be that the photo app that you’re using is fine but if it’s an open source one, you follow the link, you go to GitHub, you fork the project, you change the app, and you tell everybody about your app and your app becomes ridiculously popular because it was just better.
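The split between apps and storage can be sketched in a few lines of Python. This is a toy model for illustration only: the `Pod` class, its methods, and the three app classes are hypothetical and are not the real Solid API, which works over web protocols and includes authentication and access control that are omitted here.

```python
from dataclasses import dataclass, field

@dataclass
class Pod:
    """Hypothetical stand-in for a personal data store the user controls."""
    name: str
    resources: dict = field(default_factory=dict)

    def write(self, path: str, data: str) -> None:
        self.resources[path] = data

    def read(self, path: str) -> str:
        return self.resources[path]

    def list_paths(self, prefix: str) -> list:
        return sorted(p for p in self.resources if p.startswith(prefix))

class CameraApp:
    """Takes a photo and stores it wherever the user says; keeps nothing itself."""
    def save_photo(self, pod: Pod, filename: str, data: str) -> None:
        pod.write(f"photos/{filename}", data)

class SlideshowApp:
    """One app reading the photos from the pod."""
    def show(self, pod: Pod) -> list:
        return [pod.read(path) for path in pod.list_paths("photos/")]

class GalleryApp:
    """A completely different app reading the same data in the same layout."""
    def titles(self, pod: Pod) -> list:
        return [path.split("/", 1)[1] for path in pod.list_paths("photos/")]

work_pod = Pod("work")
home_pod = Pod("home")

# The user, not the app, picks the pod.
CameraApp().save_photo(work_pod, "talk.jpg", "<photo data>")

print(SlideshowApp().show(work_pod))
print(GalleryApp().titles(work_pod))
print(home_pod.resources)
```

Because both viewer apps agree on where photos live in the pod (the `photos/` prefix here stands in for a shared standard), either can be swapped out or forked without moving the data.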
And when you do that, you take the original photo app, you clone it so you make your own one, and you put it out there on GitHub. Interestingly, when you run it, it’s unlike trying to make a new Flickr, where you would have had to first go and get a whole bunch of PHP together and a whole bunch of storage, and you’d have to build a back end. But in a Solid world you can just put it out there. As a developer you don’t have to worry about scaling the back end.
Users will get ahold of more and more data as the users use more and more apps. They’ll be using the same stores for calendars, for all kinds of things, all the things they do with computers, all things you do on the Web. They will in future hopefully be doing things in a Solid-compatible way, so that we’ll end up with a world of tremendous enablement of user choice: where you store the data, a market for apps for the data, which will become a commodity market, but also a choice of high quality storage or local storage depending on what it is you’re storing, and who you are, and what you can afford, and what your priorities are.
It’s just like with the layers of the Internet, you have a choice, you could run the same app over faster, cheaper connections, here you can run the same app over a more or less secure, more or less available, more or less backed up stores and so on. We could end up with a really interesting world in which the privacy question is turned upside down. So there you go! This is Solid. That’s the idea.