Challenges of online voting and reputation systems
If you’re new to the DeFacto blog, don’t forget to read the previous articles about a proposal for a peer-to-peer information verification system
Websites and social media frequently employ voting schemes to rank user-generated content. When implemented properly, such systems act as safety nets that prevent the display of low-quality content.
For an information verification protocol, voting is crucial. After a round of assessment by the selected nodes, the rest of the network should be able to determine which assessment is good enough to be added to the database. Being selected could also influence the reputation of authors.
Unfortunately, online voting is not without issues. It is, for example, trivial for malicious users to collude and influence the voting on an intentionally misleading piece of content. This is often the case on websites such as Reddit and YouTube where the content submitted is directly visible to everyone. There’s also an overarching threat in online voting known as the Sybil attack.
The Sybil attack
The Sybil attack is when a malicious user spawns several virtual identities instead of one. The term is a reference to a book about Sybil, a woman affected by dissociative personality disorder. Say Bob creates a Reddit post and wishes to promote it to the top. Instead of going through the trouble of colluding with multiple other real users, he can write a program that creates an arbitrary number of email addresses and corresponding Reddit accounts and make the vote on the post. Voilà.
Douceur argued that it is practically impossible to completely prevent a Sybil attack without a trusted authority verifying the one-on-one correspondence between virtual and real world identities. As argued in our previous blog post, this isn’t a desirable solution: the central authority could decide to blacklist users at will or be coerced or hacked to do so. Such authority would also have to be trusted by every user in the network; finding such an entity is a challenge in itself. Fortunately, it’s still possible to considerably mitigate the ability of a malicious user to corrupt a system via a Sybil attack. It should be good enough to make it so that the Sybil identities are overwhelmed by the legitimate ones.
Stake weight based voting: not such a good idea?
A common cop-out solution for online voting to prevent Sybil is resource testing.
In this setup, a user must prove that they possess a certain amount of some resource, and the amount is taken into account when calculating the weight of their vote.
A popular implementation of this is Proof of Stake wherein members of a network hold tokens that are associated with real-world monetary value. Simply put, we can say that a person holding 4% of the total amount of tokens being “staked” for voting also holds 4% of the voting power.
In this scenario, the adversary can still create Sybil identities, but gains no advantage in doing so. No matter how many identities they create, they still hold only 4% of the tokens, and so their voting weight is the same. But is this solution acceptable?
It might be for a network where the wealth distribution is fair, but fair wealth distribution is hard to achieve, even more so considering the rampant speculation in everything related to virtual tokens. Furthermore, in open economies, wealth seems to naturally tend to concentration in the hands of a few, following a sort of Pareto curve.
The question we ask is this: in the context of a distributed network for information assessment, do we want the rich to decide what is or isn’t trustworthy?
Social networks as filters
What does a social network such as Facebook look like? On the network, each user has a list of “friends.” When two participants are “friends,” it generally means that they know each other or are acquaintances: we can say that there is a “trust” relationship between the two users.
These types of relationships, it turns out, can be fairly represented through graphs. Imagine that each user is a dot — or node on a figure: we represent “friendship” between two users by linking their respective node with a line. The nodes are then the vertices of the graph, and the lines are the edges.
This representation, alongside with some mathematical tools, can be used to mitigate Sybil attacks.
To understand how, let’s imagine that Alice, a malicious node on the network, wishes to influence a Facebook community by creating several Sybil identities. Once the fake profiles are in place, she will friend them to make it look like they are real persons. She might even go a step further and make it so that the fake accounts are all friends with each other.
If we stop here, we can already see in a very visual way that something is off. While the Sybil accounts are densely linked together, none of them is directly linked with the rest of the network. They are only linked to Alice who in turn is connected with the rest of the graph. The edge linking Alice to the cluster of Sybil nodes she created, as well as any connection between the Sybil nodes and legitimate ones is called the attack edge. Now that we know Alice created a set of fake accounts, we can just cut her out of the network and all the Sybil accounts with her.
In reality, it is not as simple as just cutting someone out of the network, but the idea of using this representation was explored by a number of researchers. In particular, a mathematical concept called graph mixing time was used to devise an algorithm that, when it is time to collect votes from a sample of users, generates a pseudorandom route through the graph in such a way that it limits the influence of Sybil nodes. SumUp is an example of such an algorithm. Directly quoting Tran et al.’s fantastic paper, SumUp guarantees that with high probability, the number of bogus votes collected is bounded by the number of attack edges while the number of honest votes collected is high.
However, what if we are not using a social network? More specifically, what if we are talking about a pseudonymous network of peers where “trust” doesn’t bear the same meaning compared to being friends with one another on Facebook? The problem gets a bit harder than it was.
Collusion, reputation systems, and Sybil… again
Not every peer-to-peer system can be easily modeled as a social network. In this section we do not directly consider Sybil attacks. Rather, we restate the problem saying that there’s a fraction of malicious peers in the network and that the goal is for a given request (be it for a vote or to look up a file) to be successful i.e. avoid malicious peers that could collude and redirect requests to one another. The goal is to prevent such collusion.
Collusion is similar to Sybil attacks to an extent. When Alice creates many virtual identities, the ultimate goal is effectively to have them collude to pollute the requests. If a protocol is robust against collusion, it should already limit the ability of a Sybil adversary to use the false identities efficiently.
One of the options would be to use a verifiable random number generator.
Suppose there is a public source of randomness that everyone can agree on. Each time a request is made, the random number could be used to determine the peers that would, for instance, take part in voting. That would prevent collusion because the owners of the malicious nodes would have no control over the random number generation. While this solution seems good for simple queries like checking if a node is alive or sharing bandwidth, it doesn’t help when there is a need to access specific information like a specific file or, in the context of an information assessment system, a group of peers with expertise on a specific topic. Since the source of randomness is public, that would also allow malicious peers to take control over the selected honest peers before they receive the request.
Another way to implement randomness would be to make available a public source of randomness, but one that would not publicly point to the selected users. Rather, users would use this source privately to verify whether they have been selected, which would remove the danger of being corrupted before getting a request, because only them would know they have been selected until they broadcast their answer. This system has been implemented by Algorand for example. In the context of our information verification concept however, it is difficult to implement a similar system that would select peers based on some relevant criteria such as expertise or reputation.
There also exists a number of solutions that organize the network in a relevant and collusion resilient way, one of which is STORM from Ravoaja and Anceaume. STORM is a reputation system robust against collusion. The idea is to let peers rate service providers based on their recent interaction with them. The service providers could be resellers on a website like eBay, or assessors in an information assessment system. STORM uses a Distributed Hash Table inspired architecture: users are organized in rings around the service providers they interacted with. The algorithms used then guarantee that if the fraction of malicious peers in the network is below a certain threshold (dependant on the size of the network), then a request is successful with a high probability.
While this is a good result, the problem of Sybil attacks is not addressed. If an adversary can create identities as they please, what is to stop them from going above that threshold? With this example we see that a robust reputation system is intricately linked to a robust underlying Sybil resistance system.
One commonality of the systems we explored so far is that they are not anonymous. That is, the real world identity of participants in the network as well as their votes are either publicly known, or uniquely linked to a pseudonym (“pseudonymity”). This poses a challenge because users could be coerced to vote a certain way or face retribution. The problem of anonymous and anonymity in general is itself very complex and deserves its own blog post. We simply note here that disappointingly, it has been proved that a graph-based, anonymous and sybil resistant reputation system is mathematically impossible to obtain.
Universal identity system
There is another set of solutions to Sybil attacks taking a completely different path: what if each human had a universal, unique identifier that they would need to prove the ownership of in order to vote ? This would basically be a cryptographic ID card. Again, our problem is to make it so that this identity is not issued centrally, as well as make sure that it is not possible to censor a given participant. The decentralized issuance of a unique identity has been briefly attempted on Ethereum by Humanity DAO that used a system of application for new participants after the registry had been bootstrapped by a trusted set of participants. Another project that comes to mind is Idena which requires that participants take part in regular validation sessions where they are required to solve AI-hard problems in a limited time. The identities are considered “Human” after a threshold of number of validations and success rate is reached. Neither of these solutions is perfect, but they could be a way to help online voting. Combined with secure multi-party computation protocols, it could be possible to know the result of a vote without knowing who each individual participant voted for.
The DeFacto concept and online voting
DeFacto is a concept of peer-to-peer information verification, which implies that there will be some kind of voting at some point in the flow of the system. For example, a user might submit a piece of content to assess to the system and other peers would be prompted to vote on whether said content is trustworthy. This is only a basic view that can be enriched with other ideas, but there will be voting no matter what. We see from our discussion that in order to be resistant to attacks by malicious actors, tremendous technical and game theoretical challenges need to be addressed. At a time when disinformation could be doing more damage than ever, this seems worth attempting.