Developing the EigenTrust Algorithm and Determining Authenticity Online

The “mad dash” that created a new algorithm.

Mario Schlosser
Oscar Tech
10 min read · Jun 3, 2019


Last month, I got to do a really fun thing: I received an award from the International World Wide Web Conference Committee, called the “Seoul Test of Time Award.” That committee runs WWW, aka the International World Wide Web Conference, an annual conference that ranks among the top ten most important computer science research conferences (the top two are now, not surprisingly, machine learning conferences). The “Seoul Test of Time Award” is given to computer science papers presented at previous WWWs that the Committee thinks have some sort of historical importance for computer science. The first award went to Larry Page and Sergey Brin’s paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, which they presented at WWW 1998 and which for the first time described PageRank, the algorithm that underpins all of Google. One of the other three papers that have received this award so far is the paper that described “collaborative filtering”, the algorithm that recommends books on Amazon and movies on Netflix.

This year, my co-authors and I received the award for a paper we presented at WWW 2003 called “The EigenTrust Algorithm for Reputation Management in P2P Networks”. I thought it would be fun to tell the story behind it because it’s a nice story of discovery and life’s randomness.

Starting with a Random Question

In 2002, I was at the Stanford computer science department as a visiting researcher. I had basically cold-called a German researcher there and convinced him that I should work in his group for a few months and prove myself (even though I wasn’t an expert in the field he was studying). A few weeks in, I actually had a good idea that turned into a decent computer science research paper. Then I met a German girl. Awesome. Then she broke up with me, and for a few months I was just spinning aimlessly and not doing a whole lot of work. I needed to figure out what I would do when my visiting stint ended, and because that moment was fast approaching, not doing any work made me even more down, so I did even less work.

Six weeks before I was going to go back to Germany, I got kicked out of my Stanford housing (well, the lease was up), so I slept on the floor of a house in Palo Alto rented by a bunch of my grad student friends. One day, my future co-author Sep Kamvar, at that point just a broke grad student whose decrepit house I was crashing in, and I were complaining to each other about how hard it was now to download good songs from the at-the-time relatively novel peer-to-peer networks. These peer-to-peer networks (the first of which was Napster, even though it wasn’t truly P2P) let people anonymously share files on their computers with the rest of the internet, so of course it was used mostly for sharing MP3 files.

From a computer science point of view, these networks were really interesting because they were built on completely random connections between random nodes (computers), any of which could disappear at any second. That makes information discovery really difficult: you had to basically blast out a request for what you’re searching for to the computers that were randomly around you, and from there the request would bounce aimlessly around the random network.

At the time, the only way the music industry had found to combat file-sharing was to upload tons of fake files and run lots of computers that would share those fake files with you. So you’d download, say, what you thought was the latest Nickelback song (you think this is funny), and it would turn out to be a bunch of white noise. That was unacceptable for us two broke and inventive computer science students. We wanted to establish a notion of trust in a peer-to-peer network, so that before I download something from computer x, I know what others think of their interactions with that computer. That’s easy to do in any centralized network architecture: YouTube can just store all user recommendations on its own servers and figure out who is the most trusted. But it’s difficult in a totally random, transient peer-to-peer network, because there is no centralized server, otherwise it wouldn’t be peer-to-peer!

A Mad Dash

That’s when we had the idea for Eigentrust, which in a completely mad dash went from idea to fully coded and written paper within essentially 5 weeks. I want to briefly explain the algorithm, because there are two lucky epiphanies in the story that I still remind myself of when I find myself mentally stuck in something today.

Here is the basic idea. The simplest notion of trust is just a log of your own interactions with others that you yourself can keep. So whenever Alice wants to download a file from Bob, and the file turns out to be the Nickelback song you were looking for, Alice keeps a log of “Bob +1” (starting with 0 for Bob if she’s never interacted with him). If the file is white noise, Alice logs “Bob -1”. Alice does the same with her interactions with Charlie and David. The problem is, of course, that this is just Alice’s own opinion, and in a huge peer-to-peer network it would take a long time for Alice to build strong opinions on enough others.
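In code, those local trust logs are about as simple as it sounds. Here’s a toy sketch (Python, purely illustrative; the class and names are mine, not something from the paper):

```python
# Each peer keeps its own private tally of interactions with others:
# +1 for a good file, -1 for white noise, starting from 0 for strangers.

class Peer:
    def __init__(self, name):
        self.name = name
        self.local_trust = {}  # peer name -> running score from my own downloads

    def record(self, other, good):
        # Update my own log after a download from `other`
        self.local_trust[other] = self.local_trust.get(other, 0) + (1 if good else -1)

alice = Peer("Alice")
alice.record("Bob", good=True)       # the real song
alice.record("Bob", good=True)       # another good one
alice.record("Charlie", good=False)  # white noise
print(alice.local_trust)             # {'Bob': 2, 'Charlie': -1}
```

The key property is that everything here is Alice’s own data; no one else is consulted yet.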

So instead, why doesn’t she also ask those peers that she already trusts, for their opinion on their friends? So if Alice now wants to download from David, who she’s never interacted with before, she checks with Bob (who she’s logged as +5) and with Charlie (who she’s logged as +15) for their opinions of David, and Alice weights Charlie’s opinion of David higher than Bob’s, because she trusts him more than Bob based on her own interactions.
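Weighting your friends’ opinions by how much you trust them is just a sum of products. A hypothetical sketch (the numbers match the example above; this is not our actual implementation):

```python
def weighted_opinion(my_trust, opinions_of_target):
    # my_trust: how much I trust each friend, from my own log
    # opinions_of_target: each friend's score for the peer I'm curious about
    # Friends I trust more count for more.
    return sum(my_trust.get(friend, 0) * score
               for friend, score in opinions_of_target.items())

my_trust = {"Bob": 5, "Charlie": 15}
opinions_of_david = {"Bob": 2, "Charlie": -1}  # made-up answers from Bob and Charlie
print(weighted_opinion(my_trust, opinions_of_david))  # 5*2 + 15*(-1) = -5
```

So even though Bob likes David, Charlie’s more-trusted thumbs-down wins out.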

Still pretty straightforward, and still not enough. Ideally Alice also asks her friends’ friends for their opinions, and she’d rank their opinions by what her friends think of their friends. And so on. Picture in your mind how the more you do that, the more this “spiders out” in the network, and the more you incorporate everyone’s opinion. Of course, you’d have to eventually actually ask everyone in the network, and that’s just impractical: too many people and too much cheating (peers lying about their own trustworthiness) in the process.

So to make this simpler, zoom out to bird’s eye view, and assume for a second that there is indeed a centralized organization that is all-knowing. What you want is just one list of values of trust that the entire network places in each peer in the network. So each peer should in the end just have one trust value, and it somehow incorporates all interactions with anyone. Let’s just start everyone out equal: initially, the network has the same trust in everyone, so that initial list of trust values is the same number for each peer (let’s just say it’s 1). Now you want to calculate a better trust value for each peer (let’s say Alice): you do that by taking what each other peer thinks about Alice (say, Bob has her as a -5, and Charlie has her as a +10), and you multiply Bob’s rating of Alice with Bob’s own global trust rating. You do this because you want to weigh Bob’s personal opinion of Alice with the network’s global opinion of him. (Remember that when you start out, everyone’s global trust rating is 1.) Here is the equation:

Alice’s new global trust value = What Bob thinks of Alice x Bob’s global trust value + What Charlie thinks of Alice x Charlie’s global trust value + […the same for all other peers of Alice].

This will create a different number than Alice’s previous global trust rating of 1, so now Alice has a new global trust rating.

So now that we have a new global trust value for Alice, the network has essentially collectively updated its global opinion of Alice, by weighting each evaluator’s opinion of her with the network’s global opinion of that evaluator. (What hasn’t changed is what Alice thinks of Bob, or what Bob thinks of Alice; that’s still just based on their own interactions.) Now do that for each peer in the network (a new global trust value for Bob, Charlie and so on). Then you’ll have a new global trust value for each peer in the network, and most of them will be different from the initial rating of 1 that you gave each peer. So the input to Alice’s global trust rating has now changed, look at the formula above! Bob’s rating is now different, so is Charlie’s. Why not just run the calculation again? And again? And again, because you’ll get a new number each time.
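One round of that update, written out as a toy sketch (in the actual paper the local values are normalized so the math behaves; this simplified version skips that):

```python
def update_global_trust(local_trust, global_trust):
    # local_trust[rater][target]: what `rater` thinks of `target` (their own log)
    # global_trust[peer]: the network's current trust in `peer`
    new = {}
    for target in global_trust:
        # Each rater's personal opinion, weighted by the network's opinion of them
        new[target] = sum(global_trust[rater] * local_trust.get(rater, {}).get(target, 0)
                          for rater in global_trust)
    return new

local = {"Bob": {"Alice": -5}, "Charlie": {"Alice": 10}}
g = {"Alice": 1, "Bob": 1, "Charlie": 1}   # start everyone out equal
g = update_global_trust(local, g)
print(g["Alice"])  # (-5)*1 + 10*1 = 5
```

Run `update_global_trust` again on the new values and the numbers change again, which is exactly the “again, and again” in the text.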

The First Epiphany

That is where the first epiphany came in. Sep, my co-author, had earlier that summer developed an algorithm to speed up PageRank, the algorithm underpinning all of Google. So he recognized this simple algorithm from above: it’s exactly what PageRank does. In PageRank’s case, you’re calculating the internet’s collective view of how important a particular website is; in our case, you’re calculating a peer-to-peer network’s collective view of how trustworthy a particular peer is. So Sep knew the math behind this well, because the Google founders had studied it (and he had improved it in an earlier paper). It turns out that this is a power iteration to find the eigenvector of an adjacency matrix, and that you can define pretty well how to make this “converge”: meaning, after you run the calculation a few times, the entire network’s list of trust values will stop changing, and you’ve got a stable view of how trustworthy everyone is.
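You can watch the convergence yourself with a toy power iteration. This assumes, as EigenTrust does, that each peer’s local trust values are normalized into non-negative rows summing to 1; the matrix below is entirely made up:

```python
import numpy as np

# C[i, j] is peer i's normalized local trust in peer j.
# Rows are non-negative and sum to 1, as EigenTrust requires.
C = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])

t = np.ones(3) / 3          # start everyone out equal
for _ in range(50):
    t = C.T @ t             # update everyone's global trust at once

# t has stopped changing: it is the principal eigenvector of the trust matrix
print(np.round(t, 3))
```

After a few dozen iterations `t` barely moves anymore: applying the update once more gives back (almost exactly) the same vector, which is what “converged” means here.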

And yet, for us that was still pointless. The whole idea of a peer-to-peer network is that there isn’t any all-knowing central authority. So how could we do this in an entirely “distributed” manner? Meaning, how could we let each peer do the calculation work, just with the data it holds locally? We found a really neat way to do that: you can have each peer calculate its own updated global trust value. Alice could do it for herself, for example. Look at the equation again:

Alice’s new global trust value = What Bob thinks of Alice x Bob’s global trust value + What Charlie thinks of Alice x Charlie’s global trust value + […the same for all other peers of Alice].

Alice could just ask Bob for his opinion of her, and Charlie for his opinion of her. That would give her two of the values she needs for the calculation above (What Bob thinks of Alice and What Charlie thinks of Alice). Remember, Alice knows all those people, because they downloaded from her, so she can just ask them “what do you think of me”. And the beauty here is that this is an entirely “local” calculation: you don’t need any centralized authority, it’s just friends asking friends.

But, how does Alice get Bob’s global trust value, the other value in the equation? Notice how she calculated the network’s global trust value for her by herself — so Bob must have done the same for himself! So she can just ask Bob for the network’s updated trust value for him as well, because he calculated it for himself. All totally local! The key is now to repeat this calculation over and over. Alice doesn’t just do this once, she does it all the time. New global trust values will zoom across the network all the time, but because of certain other characteristics we put into the algorithm, you can show that they will always converge.
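From Alice’s point of view, one round of the distributed update is just two questions per rater. Here’s a toy sketch (the class and the direct attribute reads are stand-ins for real network messages):

```python
class Peer:
    def __init__(self, name, global_trust=1.0):
        self.name = name
        self.global_trust = global_trust  # the value this peer computed for itself
        self.local_trust = {}             # name -> this peer's own opinion log

alice, bob, charlie = Peer("Alice"), Peer("Bob"), Peer("Charlie")
bob.local_trust["Alice"] = -5
charlie.local_trust["Alice"] = 10

def my_new_global_trust(me, raters):
    # Alice asks each rater two questions, both answerable from local data:
    # "what do you think of me?" and "what is your current global trust value?"
    return sum(p.local_trust.get(me.name, 0) * p.global_trust for p in raters)

alice.global_trust = my_new_global_trust(alice, [bob, charlie])
print(alice.global_trust)  # (-5)*1.0 + 10*1.0 = 5.0
```

Nothing here touches a central server: every number used lives on one of the peers involved.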

The Second Epiphany

Here came the second epiphany: the above algorithm is still pretty useless. We want an algorithm that calculates trust, and we’re asking Bob to calculate and report back how much the network trusts him, all by himself? Obviously when Alice asks Bob for his global trust rating (which, remember, he is responsible for calculating himself), he could just tell her the mathematical equivalent of: “I’m most trusted, thanks for asking.”

Remember that paper I had written earlier that year? (Here it is.) It dealt with precisely that issue: how to store information in a chaotic network and find it again. It imagines the entire network as a giant multidimensional cube, and it gives each peer in the network a location in that cube (using a so-called cryptographic hash function). If you combine the two ideas, instead of letting Bob calculate his own global trust value, we find one or more other peers to do the calculation for Bob, and because they are chosen through a hash function, Bob himself won’t be able to cheat on it.
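The hash idea looks roughly like this. (A naive sketch: it just hashes a peer’s ID to deterministically pick its “score managers.” The real system maps peers into a DHT’s coordinate space; `sha256` and the wrap-around selection are my illustrative choices here.)

```python
import hashlib

def score_managers(peer_id, all_peers, k=2):
    # Hash the peer's ID to a starting position, then take the next k peers.
    # Deterministic, so everyone agrees on who scores whom, but Bob has no
    # say in the choice and can't make himself his own scorekeeper by design.
    digest = hashlib.sha256(peer_id.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(all_peers)
    return [all_peers[(start + i) % len(all_peers)] for i in range(k)]

peers = ["Alice", "Bob", "Charlie", "David", "Eve"]
print(score_managers("Bob", peers))
```

Because the hash output is effectively random but reproducible, Alice can recompute on her own which peers are responsible for Bob’s score and ask them directly, with no directory service in the middle.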

By the way, if some of this sounds familiar, then it’s because a part of the basic approach here later came back in the form of cryptocurrencies. Those also rely on distributed storage of values that are important for a chaotically connected network to function, and they distribute the work with the use of cryptographic hash functions. (I think one of the more popular cryptocurrencies actually directly references Eigentrust.) That’s obviously not to say that we invented crypto: the idea of hashing stuff in a chaotic network was in the air at the time (at least three different papers pretty much simultaneously at the time had the idea of addressing peer-to-peer networks through these functions). It does show how ideas sometimes come in waves, and how the ebb and flow spans many years.

So that was it

A random idea for a very practical problem (downloading error-free songs) used a simple approach that turned into unexpectedly elegant math, and an unexpectedly elegant, almost visualizable way to run that math across a network. Add the randomness that the two of us had worked on pretty much precisely the two components of the solution separately in the months before, in very different areas. Hector Garcia-Molina, our co-author in whose computer science group we both were, was incredibly helpful in writing the paper.

The lessons I draw from this are very simple: This idea bubbled up from a random conversation, so have more of those, but with people who really care. Sometimes you have an unexpectedly good idea quickly, but really it will only work (and you probably only had it to begin with) because you had previously immersed yourself so deeply in some other thing. Heartbreak is good for writing pop songs and sometimes even for stumbling upon weird algorithms.

Want to talk more tech? Send our CEO, Mario, a tweet @mariots
