Provably Anonymous Data Aggregation on the Blockchain

Published in Consensus AI, Sep 24, 2020

After many months of painstaking work, the Consensus team is proud to announce the next upgrade to the Sentient Blockchain. This version of Sentient uses the Discrete Logarithm Problem (DLP) to provide provably anonymous data aggregation, as well as digital identity management. For months we have been testing it on a new test network dubbed “Appnet”, which has already been providing aggregation services for the Consensus App. We believe it is now time to merge it onto the “Mainnet” and make it open source so that it can be vetted by the community. It’s been a long journey, and we’re very excited about the update and about finally opening up the source code for the project.

Upgrade Phases:

1. Users upgrade their Sentient Hub.
2. Consensus submits the DLP Version Proposal.
3. Sentient Blockchain upgrades after a sufficient proportion of new blocks support the version proposal.
4. Consensus App migrates from the Appnet test network to the Mainnet SEN blockchain.

We’ll go into the mathy details further down in the article, but first here are some reasons why this update is so important.

We can pay out users directly on the blockchain for participating

When creating topics for which users submit “datums” (of which votes are currently the primary use case), there are some special new parameters, including for example “datumReward”, which automatically triggers a transaction rewarding SEN to a user’s wallet when they vote. This provides our users with an incentive to participate, and allows those who wish to collect sentiment data to choose how much they want to reward users for voting. For voters, this is as simple as creating a wallet (or importing from a pre-existing seed) on the Consensus mobile app. For topic creators, this can be done through the Sentient Hub so long as they have enough SEN funds.

We can now make Consensus polls publicly auditable

This claim deserves an entire article of its own, and we will provide a detailed explanation once the version proposal is accepted by the majority of miners. In simple terms, when proof of the aggregation is mined onto the blockchain, any user can locally check that the datums aggregate to the same value provided by the aggregation service. Note that this does not allow anyone to deduce any other user’s vote, but it makes undetected tampering with the results mathematically highly improbable.

Each user has a digital identity on the blockchain

Not only can someone have a wallet containing funds, but now a hash identity can also be associated with a user. Through the app, this identity can be associated with groups. For now these groups are geographical locations, such as country, state/province, or city, but hypothetically this can be extended to any form of group of arbitrary size. This opens up the potential for decentralized autonomous organizations (DAOs) to vote or collect data on their own terms, as well as enforce restrictions on polls based on membership in a given group or set of groups.

The Algorithm

To develop the secure aggregation technique, we worked with a small team of expert researchers, who adapted it from the system that we successfully piloted in South Burlington, Vermont last year. However, as the technique is still new, we invite readers of all academic backgrounds to examine and critique it. If you believe you have found specific flaws, or have further questions, feel free to leave a comment or email me at victor@consensus.ai.

Here are some prerequisites:

Bitwise encoding

Modular arithmetic

Discrete logarithm problem (DLP)

Encryption relies on the computational difficulty of computing discrete logarithms for suitably-chosen generators (g) and moduli (n). Votes are encoded in the exponent and aggregated by multiplying encrypted values.
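As a minimal sketch of why multiplying encrypted values adds the hidden exponents, the snippet below uses toy parameters that are assumptions for illustration only (a 31-bit prime modulus with generator 7, far too small to be secure). The function name `encrypt` is hypothetical and not part of the Sentient code base.

```python
# Toy parameters, chosen only for illustration; real deployments need a
# much larger, carefully chosen modulus and generator.
n = 2**31 - 1   # prime modulus
g = 7           # a primitive root modulo n

def encrypt(alpha: int) -> int:
    """Hide alpha in the exponent: v = g^alpha mod n."""
    return pow(g, alpha, n)

a1, a2 = 123_456, 654_321

# Multiplying two encrypted values yields the encryption of the sum,
# because g^a1 * g^a2 = g^(a1 + a2) (mod n).
product = (encrypt(a1) * encrypt(a2)) % n
assert product == encrypt(a1 + a2)
```

Recovering `a1` from `encrypt(a1)` alone would require solving a discrete logarithm, which is what the scheme relies on being hard at realistic parameter sizes.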

More formally, to start data collection, a user proposes a topic T, which consists of:

- The modulus n and generator g as described above; G denotes the group generated by g modulo n, and |G| its order.

- A = 2^b, the target aggregation size (using b bits)

- An integer M, the encoding maximum, such that MA < |G|

- H = {c1 . . . cj : ci = A^(i−1)}, a set of values representing the j possible choices.

The modulus base for the vote is μ = A^j, so that each of the j choices occupies its own b-bit slot and every ci is smaller than μ. The parameter M should be near |G|/A. For this to work properly, we require M ≫ Aμ; in practice this will generally be satisfied.
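The topic parameters can be sketched as a small data structure. This is an illustration under stated assumptions, not the Sentient implementation: the class name `Topic` and its field names are hypothetical, the numbers are toy values, and μ is taken to be A^j so that each of the j choices gets its own b-bit slot.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Topic:
    n: int            # modulus
    g: int            # generator
    group_order: int  # |G|, the order of g modulo n
    b: int            # bits per choice counter
    j: int            # number of choices

    @property
    def A(self) -> int:
        """Target aggregation size A = 2^b."""
        return 2 ** self.b

    @property
    def mu(self) -> int:
        """Modulus base for the vote (assumed here to be A^j)."""
        return self.A ** self.j

    @property
    def M(self) -> int:
        """Largest integer M with M * A < |G| (so M is near |G|/A)."""
        return (self.group_order - 1) // self.A

    @property
    def H(self) -> list:
        """Choice encodings c_i = A^(i-1) for i = 1..j."""
        return [self.A ** i for i in range(self.j)]

# Toy topic: 7 is a primitive root modulo the prime 2^31 - 1,
# so the group generated by g has order n - 1.
topic = Topic(n=2**31 - 1, g=7, group_order=2**31 - 2, b=3, j=3)
print(topic.A, topic.mu, topic.M, topic.H)
```

With b = 3 and j = 3 this gives A = 8, μ = 512 and H = [1, 8, 64], and M is roughly 65,000 times larger than Aμ, which is the kind of margin the M ≫ Aμ requirement asks for.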

Network participants Ui are interested in submitting their data. They start by creating private data:

Di = (hi, αi)

where:

- hi ∈ H is their vote

- αi is analogous to their private key, a randomly chosen element of the set

{α ∈ ℕ : α ≡ hi (mod μ), α < M, α ≠ hi}

Note that αi can be chosen by first choosing a random integer N with 0 < N < (M − hi)/μ, and then computing αi = Nμ + hi. The set of possible N is very large because M is much larger than Aμ. The requirement that α be smaller than M ensures that summing fewer than A different αi’s cannot produce a value larger than AM < |G|; i.e., the bits will not wrap around.
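Choosing αi can be sketched in a few lines. The helper name `sample_alpha` is hypothetical, and the numeric values are toy assumptions (μ = 512 and M = 268435455, matching A = 8 with three choices over a 31-bit modulus). The sketch draws N from 1 upward so that α ≠ h, and caps N so that α stays below M.

```python
import secrets

def sample_alpha(h: int, mu: int, M: int) -> int:
    """Pick a private value alpha = N*mu + h with alpha < M and alpha != h."""
    if not 0 < h < mu:
        raise ValueError("the encoded choice h must satisfy 0 < h < mu")
    max_N = (M - 1 - h) // mu          # largest N with N*mu + h < M
    if max_N < 1:
        raise ValueError("M is too small relative to mu")
    N = 1 + secrets.randbelow(max_N)   # uniform in [1, max_N]; N >= 1 gives alpha != h
    return N * mu + h

# Toy values (assumptions): choice h = 8 is the second of three choices.
mu, M = 512, 268_435_455
alpha = sample_alpha(8, mu, M)
assert alpha % mu == 8 and alpha < M and alpha != 8
```

The `secrets` module is used rather than `random` because α plays the role of a private key and should come from a cryptographically secure source.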

Each participant then publishes (submits via the blockchain API) their encrypted public data:

Pi = (vi, si)

where:

- vi = g^(αi) mod n

- si = Sig_Ui(vi) is an optional signature of vi by participant Ui

Note that the difficulty of computing discrete logarithms prevents αi from being determined from vi here.
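A minimal sketch of building the public datum follows. The names `PublicDatum` and `publish` are hypothetical, the parameters are toy assumptions, and the optional signature is left empty because the article does not specify a signature scheme.

```python
from typing import NamedTuple, Optional

class PublicDatum(NamedTuple):
    v: int                      # v = g^alpha mod n
    s: Optional[bytes] = None   # optional signature over v (not modelled here)

def publish(alpha: int, g: int, n: int) -> PublicDatum:
    """Build the public datum from the private value alpha."""
    return PublicDatum(v=pow(g, alpha, n))

# Toy parameters (assumptions): prime modulus, generator 7, mu = 512.
n, g, mu = 2**31 - 1, 7, 512

# Two participants make the same choice (h = 8) but use different N,
# so their private values, and therefore their public values, differ.
alice = publish(1000 * mu + 8, g, n)
bob = publish(2000 * mu + 8, g, n)
assert alice.v != bob.v
```

The two public values differ because 7 has order n − 1 modulo this n and both exponents are distinct and smaller than that order, so equal votes are not visible as equal ciphertexts.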

The vote submission and aggregation algorithm performed by the API and blockchain can be broken down into the following steps:

1. Encrypt

- Receive an end user choice for a topic and their encryption of it as described above (all operations over a secure channel)

- Check that it has been encrypted appropriately (based on the topic numbers described above)

- Return an encrypted datum candidate to the user

2. Audit

- Take an encrypted datum candidate from the user

- Return a decryption proof demonstrating that the encrypted value matches the user’s choice

- Note that steps 1 and 2 can be repeated an arbitrary number of times by a user if they desire to check the integrity of the encryption service (see Helios Project for inspiration)

3. Seal

- Take an encrypted datum candidate from the user

- Mark that datum as “sealed”: it can no longer be audited, only aggregated

4. Aggregate

- Take a list of sealed, encrypted datums from multiple users

- Return the sum of the aggregated choices and a proof that it is correct: the exponent from the product of the encrypted values

5. Record

- Mine the aggregated proof onto the blockchain so that it can be publicly verified
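The Aggregate and Record steps can be sketched as follows. This is an illustration under assumptions, not the production service: the function names are hypothetical, the parameters are toy values, μ is taken to be A^j, and the trusted aggregation service is assumed to know each sealed datum's private value α so it can publish the summed exponent as the proof.

```python
def aggregate(alphas):
    """Trusted side: the proof is the exponent S, the sum of all alphas."""
    return sum(alphas)

def verify(public_values, S, g, n):
    """Anyone can check that the product of the v_i equals g^S mod n."""
    product = 1
    for v in public_values:
        product = (product * v) % n
    return product == pow(g, S, n)

def decode_tally(S, A, j):
    """Read the j lowest base-A digits of S: one counter per choice."""
    tally = []
    for _ in range(j):
        tally.append(S % A)
        S //= A
    return tally

# Toy parameters (assumptions): A = 2^3, j = 3 choices, mu = A^j.
n, g, A, j = 2**31 - 1, 7, 8, 3
mu = A ** j

# Six voters (fewer than A) pick choice indices 0, 1, 1, 2, 0, 1.
choices = [0, 1, 1, 2, 0, 1]
alphas = [(1001 + 37 * k) * mu + A ** c for k, c in enumerate(choices)]
public_values = [pow(g, a, n) for a in alphas]

S = aggregate(alphas)
assert verify(public_values, S, g, n)
print(decode_tally(S, A, j))   # per-choice counts
```

Because every α is a multiple of μ plus one choice encoding, the low j·b bits of S hold only the vote counters, while the random multiples of μ sit above them and mask the individual values.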

Limitations

The main limitation of this approach is that we still need a trusted entity to verify identities and create the aggregation proofs. In our case this is Consensus, which will manage the aggregation API as well as verify that a user has a unique phone number and is in a certain geographic location. For more sensitive polls this verification can be stricter, but we still rely on at least one trusted party to verify identities. On the voting side, it is also important to note that, because of the way our aggregation algorithm works, letting a user select multiple choices is inherently insecure, so we only allow a single choice per user. This can be worked around by adding choices that stand for combinations of other choices (for example, a choice D meaning “A and B”).

Finally, we have determined that this method should not (yet) be used as a proof-of-work to secure the blockchain, as we had originally described in a white paper and piloted. Protecting against dishonest miners would require something like RSA, and we are still working on refining that aspect. If there are any other flaws or limitations, this article is an invitation for those specializing in cryptography to submit their thoughts and suggestions.

Conclusion and Future Plans

Coming up, we will post an article on how exactly auditing topics works, as well as detailed instructions for creating topics. With this in place we can look towards more sophisticated data aggregation use cases, as well as more functionality for digital identity management.