Thanks for tuning in! This post is part of a broader series on blockchain technology. In prior posts, I’ve provided an overview of blockchain’s history and have explained how protocols, DApps and ICOs work.
Recently, I’ve turned my attention to the issue of blockchain privacy. My last article laid out an argument for why we need privacy coins and discussed how they may evolve in the future. My next few posts will get a bit more technical and break down how the top privacy protocols work, starting with Monero. Also, if you’re enjoying these posts and want future posts sent to your email, please subscribe to my distribution list! Alright, enough of the overview, let’s dive in.
Monero is currently the king of the privacy world. Why? Because its underlying technology has one of the longest track records among privacy protocols and because the coin boasts the largest market cap among privacy coins.
Before diving into the protocol’s mechanics, I’m going to start with a brief history as I think it provides some helpful context. However, if you feel like you have a good handle on the backstory (or just find history boring), feel free to skip ahead!
A Brief History:
Monero’s story begins in 2012 with the development of the CryptoNote protocol. CryptoNote was one of the first privacy-focused blockchain protocols and laid the foundation for a number of future cryptocurrencies. The protocol was established in December 2012 when Nicolas Van Saberhagen published the white paper “CryptoNote 1.0”. Similar to Bitcoin’s notorious founder, Satoshi Nakamoto, Nicolas Van Saberhagen is a pseudonym and, while there is plenty of speculation around his / her / their true identity, nobody has claimed ownership of the paper to date.
CryptoNote 1.0 was a revolutionary white paper for a few reasons. First, it detailed a method for obscuring the identity of both the sender of a transaction (using Ring Signatures) and the recipient of a transaction (using Stealth Addresses). In addition to that, the protocol detailed a new proof-of-work mechanism called “CryptoNight”, which is resistant to the use of Application Specific Integrated Circuits (“ASICs”). If you haven’t heard that term before, an ASIC is an expensive piece of equipment that can be used to mine Bitcoin faster than a traditional CPU or GPU. The use of these on the Bitcoin network has led to the concentration of Bitcoin mining in the hands of a few well-funded operations, which has somewhat centralized the decentralized network. By implementing CryptoNight, the CryptoNote protocol attempted to avoid concentrated mining operations and create a system that is more in line with the one-CPU-one-vote ideal originally proposed by Satoshi Nakamoto in the original Bitcoin whitepaper. Ten months later, in October 2013, Nicolas Van Saberhagen published an updated version of the white paper called CryptoNote v 2.0; however, the updated paper did not propose any noteworthy changes.
What’s important to note here is that, while the CryptoNote white paper detailed a groundbreaking new privacy-oriented protocol, it was simply a written document. There was never a CryptoNote network or a CryptoNote coin. That brings us to the next stage of Monero’s history: Bytecoin.
Bytecoin was the first cryptocurrency to actually implement the CryptoNote protocol. Bytecoin went live in November 2013 when the core Bytecoin team pushed their initial code to Github. A few months later, in March 2014, they pushed the remainder of the code and the protocol was up and running. Seems like a pretty straightforward story, right? Not exactly. Unfortunately, like many things in crypto, the development of Bytecoin is shrouded in a bit of mystery and more than a few red flags. Let’s walk through a few of the core concerns:
- All of the original CryptoNote and Bytecoin researchers were unknown entities. None of them had published any prior research or online commentary, indicating that they all were either operating under pseudonyms or had limited prior experience in the crypto space. Both of those possibilities were red flags.
- The original CryptoNote white paper, which was said to have come out in 2012, made a reference to a post that wasn’t published until 2013. In addition to that, it was later discovered that the white paper’s electronic signature was created in a way that could have been manipulated, drawing into question the true timing of the paper.
- The Bytecoin network didn’t take off until it was “randomly discovered” by two people using the same story on different online forums at the same time, leading some to believe that the “random discovery” was actually just an online post by members of the Bytecoin team.
- Lastly, the most important issue was that over 80% of Bytecoin was pre-mined, meaning that a substantial majority of the cryptocurrency was held by the coin’s developers. As a result, the coin was highly centralized and the lead developers were poised to reap the majority of the profits if it appreciated in value.
These issues have led some people to conclude that a single group of developers created both the CryptoNote protocol and the Bytecoin cryptocurrency simultaneously. They believe that this group of developers deceived the public into believing that the CryptoNote white papers were created in 2012 when, in reality, they were created in 2014. In doing so, they “validated” the underlying CryptoNote concepts, lending legitimacy to the Bytecoin project and enabling them to profit immensely from the 80% pre-mined Bytecoin.
Regardless of whether that theory was accurate, it was clear that, while the underlying CryptoNote technology was sound, the substantial pre-mining was going to create some long-term problems for the currency. As a result, several competing groups decided that they wanted to fork, or relaunch, the coin. The most relevant of these groups (at least for our story) wanted to launch a new coin called “bitMonero”. The group was hoping that bitMonero would fix a number of issues with the underlying Bytecoin protocol, including issues related to block rewards, block times, and emissions. Unfortunately, the group’s leader, a person operating under the pseudonym “thankful_for_today”, was in such a hurry to launch bitMonero that he completely ignored these key issues. He forked the Bytecoin codebase on 4/18/14 and subsequently disappeared. After thankful_for_today disappeared, the bitMonero community members decided it would be best to take over the project. Therefore, just five days after bitMonero’s initial launch, the group forked bitMonero’s code base, dropped the “bit” from the name and launched the new coin as “Monero”. Since then, the Monero team has continued to propose and implement important upgrades to the system. Notably, they incorporated Ring Confidential Transactions in 2016 to obscure transaction sizes and recently implemented bulletproofs to significantly reduce transaction fees. Additionally, Monero is in the process of developing Kovri, which is a technology that will enable users to hide their geographic location and IP address.
Since its founding, Monero has done exceptionally well, growing to a market cap of over $7.6 billion at its peak. The crypto bear market has subsequently knocked its market cap down to approximately $716 million as of 12/4/18; however, it still ranks as the 12th largest cryptocurrency according to CoinMarketCap.
Now that we have a good understanding of Monero’s history, let’s discuss how the protocol actually works. While there are a number of features that differentiate the Monero protocol, including variable block sizes, a built-in inflation mechanism, and the ASIC-resistant CryptoNight consensus mechanism, we’re going to focus specifically on its privacy-related features. The Monero protocol provides users with privacy through the implementation of three core cryptographic techniques: (i) stealth addresses, (ii) ring signatures, and (iii) ring confidential transactions. These techniques obscure the transaction’s sender, recipient, and amount, respectively. Let’s break each of them down further.
Stealth Addresses
As we mentioned above, stealth addresses enable the Monero protocol to obscure a transaction recipient’s identity. They do this by requiring the sender to create a random one time public address for every transaction. That transaction can then be accessed by the recipient but cannot be linked to the recipient’s true public address, making it impossible for an outside entity to determine the recipient’s identity. So how exactly do these stealth addresses work? This is about to get a bit technical so, again, feel free to skip ahead if you don’t want the additional detail.
To understand how stealth addresses work, you need to understand a little bit of elliptic curve cryptography (“ECC”). As the name implies, elliptic curve cryptography is a form of public key cryptography that is based on the underlying algebra of elliptic curves. This cryptographic method is considered to be more efficient than traditional methods because it can generate the same level of security with a much smaller key size, effectively reducing a protocol’s storage and transmission requirements.
Monero uses a very specific elliptic curve called the Edwards25519 curve. The points on the curve can be added to or subtracted from other points on the curve. When you add or subtract two points on the same curve, the result is a third point on the curve. Additionally, each point on the curve can be “scaled”, meaning that it can be added to itself x number of times (where “x” is the scalar value). To use an example, if you want to scale point A by a factor of 3, you simply need to calculate A + A + A.
Bear with me, we’re almost there. We just need to quickly cover three properties of elliptic curves that make them particularly useful in the creation of stealth addresses.
- First, there is a point on the curve called “G”, which is known as the “base point”. You can think of G as the 12 on a clock. It is generally the starting point for calculations and is also the point where the curve loops over.
- Second, there is a unique scalar value called “L”, which represents the number that G must be scaled by before the curve loops over. Said another way, if you add G to itself L times, you would make it the whole way around the loop and start back at G. This is similar to a clock going through a full loop and making it back to 12.
- Third, each Monero user has four keys, all of which are built on elliptic curve cryptography. They have a public spend key, a private spend key, a public view key, and a private view key. The private keys represent scalar values and the public keys represent points on the curve. What’s interesting is that the public keys are actually calculated from the private keys. Public keys are simply equal to the private key (which is a scalar value) multiplied by the base point (G). For example, if your private spend key is 3, then your public spend key would simply be 3G or G+G+G. It’s important to note that, while it is relatively easy to take your private key and derive the public key, it is computationally infeasible to take the public key and derive the private key, particularly for large scalar values.
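If it helps to see those three properties in action, here’s a minimal Python sketch. It assumes a tiny textbook curve (y^2 = x^3 + 2x + 2 over the integers mod 17) instead of Edwards25519, purely so the numbers stay readable, and the helper names `add` and `scale` are my own. One detail the clock analogy glosses over: on this toy curve, scaling G by L lands on a neutral “zero” point, and one more addition brings you back to G.

```python
# Toy curve y^2 = x^3 + 2x + 2 over the integers mod 17 (requires Python 3.8+).
# Assumption for illustration only: Monero uses Edwards25519, not this curve.
P_MOD, A = 17, 2
G = (5, 1)  # base point
L = 19      # scaling G by L completes the loop


def add(p, q):
    """Add two points; None is the neutral point (adding it changes nothing)."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # a point plus its mirror image cancels out
    if p == q:
        slope = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    x3 = (slope * slope - x1 - x2) % P_MOD
    return (x3, (slope * (x1 - x3) - y1) % P_MOD)


def scale(k, point):
    """Scale a point by k using repeated addition, as described in the text."""
    result = None
    for _ in range(k):
        result = add(result, point)
    return result


private_spend_key = 3
public_spend_key = scale(private_spend_key, G)

print(public_spend_key)                       # (10, 6)
print(public_spend_key == add(add(G, G), G))  # True: 3G is G + G + G
print(scale(L, G))                            # None: the loop has closed
print(scale(L + 1, G))                        # (5, 1): back at G
```

With only 19 possible scalars, anybody could recover the private key here by trial and error. The “computationally infeasible” part only kicks in because the real L is a number roughly 77 digits long.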
Congrats! You now have the minimum understanding of ECC to understand stealth addresses. Like we mentioned above, stealth addresses are one time public keys that are created by the sender and are unlinkable to the recipient’s true public key. They are created using an Elliptic Curve Diffie Hellman exchange, which is basically just a method for exchanging a shared secret over an insecure network. In my opinion, the easiest way to understand the stealth address creation process is to walk through an example. So let’s imagine that John wants to send Monero to Sue. In order to do so, John must go through the following steps:
- First, John needs to pick a random scalar value “r”, which is somewhere between 0 and L (the maximum scalar before the curve loops back to G). John can use that random scalar value to generate the associated public key: R = rG.
- Second, John multiplies r by Sue’s public key S to get rS. It’s important to note here that John and Sue are the only ones that can calculate this value. This is a result of the third property discussed above. Because public keys are simply private keys multiplied by the base point G, we know that rS = rsG = rGs = Rs. Therefore, Sue can derive the value by multiplying her secret key s by the randomly generated public key R and John can derive the value by multiplying the randomly generated secret key r by Sue’s public key S. Nobody else can calculate this value because they cannot access either secret key.
- Third, John then takes the hash of rS to create a brand new scalar value. This is the part of the process that makes the transaction unlinkable.
- Fourth, after deriving this new scalar value, John multiplies it by the base point G to generate a new point on the curve called F. Using mathematical notation, F = H(rS)*G.
- Fifth, John can finally create the stealth address (let’s call it “P”) by adding F to Sue’s public key. Again, if we use mathematical notation this looks like: P = F + S, where S represents Sue’s public key.
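John’s five steps can be sketched in a few lines of Python. To keep the example short and runnable, I’m assuming a stand-in group rather than Edwards25519: “points” are integers modulo a prime, “adding” points is modular multiplication, and “scaling” is modular exponentiation. The algebra (rS = Rs, P = H(rS)*G + S) carries over unchanged, but the helper names and the SHA-256 hash are illustrative choices, not Monero’s actual code.

```python
import hashlib
import secrets

# Stand-in group (an assumption, NOT Edwards25519): points are integers mod a prime.
P_MOD = 2**255 - 19
ORDER = P_MOD - 1  # plays the role of L
G = 2              # plays the role of the base point


def add(a, b):
    """'Point addition' in the stand-in group."""
    return a * b % P_MOD


def scale(k, point):
    """'Scalar multiplication' in the stand-in group."""
    return pow(point, k, P_MOD)


def hash_to_scalar(*parts):
    """H(...): hash arbitrary values down to a scalar."""
    data = "|".join(str(part) for part in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def make_stealth_address(S):
    """What John does with Sue's public key S."""
    r = secrets.randbelow(ORDER - 1) + 1  # step 1: random scalar r
    R = scale(r, G)                       #         and its public key R = rG
    shared = scale(r, S)                  # step 2: shared secret rS
    h = hash_to_scalar(shared)            # step 3: H(rS)
    F = scale(h, G)                       # step 4: F = H(rS)*G
    P = add(F, S)                         # step 5: P = F + S
    return R, P


# Sue's long-term key pair (hypothetical values generated on the spot).
s = secrets.randbelow(ORDER - 1) + 1
S = scale(s, G)

R, P = make_stealth_address(S)
```

John publishes R and P with the transaction. Calling `make_stealth_address(S)` again picks a fresh r, so a second payment to Sue gets an unrelated-looking address.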
By creating the public key in this manner, Sue is able to retain her anonymity. In addition to that, she’s still able to identify the transactions that were sent to her. How does she do that? She constantly scans the network and essentially re-engineers John’s process for each outstanding transaction. In practice, that means that Sue’s wallet will scan the network, find an outstanding transaction, take that transaction’s public key R, multiply it by her private key s, hash it, multiply the hash output by the base point G, add the resulting point to her public key S, and check to see if the resulting stealth address matches the transaction’s. If it does, she knows that the transaction was meant for her. If it doesn’t, her wallet moves on and checks the next transaction.
Okay great, now we know how to create a one time public address and how Sue can identify the transactions that were sent to her. But how can Sue actually spend those transaction outputs? To spend the transaction outputs, Sue needs to calculate the transaction’s one time private key. What’s cool is that she can actually derive the one time private key using both the transaction’s public key, R, and her own private key, s. To derive the one time private key (let’s call it “x”), she simply has to add her private key to the hash of Rs (the output of step three above). Using mathematical notation, this looks like: x = s + H(Rs). This works because x*G = s*G + H(Rs)*G = S + F = P, which is exactly the stealth address John created. She can then use that private key to sign a new transaction with a ring signature, which we’ll cover in greater detail in our next section.
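Here’s a rough Python sketch of Sue’s side: the scanning check and the one time private key. As a simplifying assumption it uses a stand-in group (integers modulo a prime, where “adding” points is multiplication and “scaling” is exponentiation) instead of Edwards25519, and the function names are hypothetical.

```python
import hashlib
import secrets

# Stand-in group (an assumption, NOT Edwards25519): points are integers mod a prime.
P_MOD = 2**255 - 19
ORDER = P_MOD - 1
G = 2


def add(a, b):
    return a * b % P_MOD


def scale(k, point):
    return pow(point, k, P_MOD)


def hash_to_scalar(*parts):
    data = "|".join(str(part) for part in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def random_scalar():
    return secrets.randbelow(ORDER - 1) + 1


def is_mine(s, S, R, P):
    """Redo John's steps with Rs in place of rS and compare against P."""
    return add(scale(hash_to_scalar(scale(s, R)), G), S) == P


def one_time_private_key(s, R):
    """x = s + H(Rs)"""
    return (s + hash_to_scalar(scale(s, R))) % ORDER


# Sue's keys, plus a transaction from John (R, P) built as in the five steps above.
s = random_scalar()
S = scale(s, G)
r = random_scalar()
R = scale(r, G)
P = add(scale(hash_to_scalar(scale(r, S)), G), S)

# A bystander scanning the same transaction with their own keys.
other_s = random_scalar()
other_S = scale(other_s, G)

print(is_mine(s, S, R, P))              # True
print(is_mine(other_s, other_S, R, P))  # False
x = one_time_private_key(s, R)
print(scale(x, G) == P)                 # True: x is the private key behind P
```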
Hopefully that clarifies how stealth addresses work! If you’re still confused by ECC or stealth addresses, I highly recommend reading this article. That’s what really made it click for me. Alright, let’s move on and discuss our next cryptographic technique: ring signatures.
Ring Signatures
Stealth addresses are great for obscuring the recipient’s identity but they only get us part of the way there. We need to be able to obscure the sender’s identity as well, which is where ring signatures come in. Ring signatures obscure the sender’s identity by randomly selecting transaction outputs from the network, taking the public keys from those transactions, and mixing them in with the sender’s one time public key. In doing so, ring signatures make it impossible for an outsider to determine which of the public keys actually sent the transaction. In the eyes of a third party, each public key has an equiprobable chance of being the true sender.
This presents us with an interesting challenge. If we completely obscure the sender’s identity, how can we be sure that the Monero hasn’t already been spent? Said another way, how do you solve the double spend problem if you don’t know which public key initiated the transaction? Fortunately, Monero has solved this potential issue by using something known as a key image. Essentially, there is a key image associated with each transaction. The key image is calculated by multiplying the one time private key, x, by the hash of the one time public key. In mathematical notation, this looks like: I = x * H(P). By calculating a key image in this manner, it is infeasible for a third party to determine which private or public key was used to create the key image, maintaining the sender’s anonymity. At the same time, it is only possible to generate one key image for each one time private / public key combination, eliminating the potential of a double spend.
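To make the double spend check concrete, here’s a small Python sketch. It assumes a stand-in group (integers modulo a prime) rather than Edwards25519, and it fakes the “hash of the public key” by hashing to a scalar and scaling G. A real hash-to-point function must not work that way, so treat `hash_to_point` as a placeholder. The `spent_images` set is a hypothetical stand-in for the list of key images that the network keeps.

```python
import hashlib

# Stand-in group (an assumption, NOT Edwards25519): points are integers mod a prime.
P_MOD = 2**255 - 19
ORDER = P_MOD - 1
G = 2


def scale(k, point):
    return pow(point, k, P_MOD)


def hash_to_point(point):
    # Placeholder only: a real hash-to-point must not expose a known scalar.
    digest = hashlib.sha256(str(point).encode()).digest()
    return scale(int.from_bytes(digest, "big") % ORDER, G)


def key_image(x, P):
    """I = x * H(P)"""
    return scale(x, hash_to_point(P))


spent_images = set()  # hypothetical record of every key image seen so far


def try_spend(image):
    """Accept a key image the first time it appears, reject it afterwards."""
    if image in spent_images:
        return False
    spent_images.add(image)
    return True


x = 123456789    # one time private key (made-up value)
P = scale(x, G)  # matching one time public key

first_attempt = try_spend(key_image(x, P))
second_attempt = try_spend(key_image(x, P))
print(first_attempt, second_attempt)  # True False
```

Because the key image depends only on x and P, Sue gets the same image no matter which decoys she picks, which is what lets the network catch the second attempt without learning who she is.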
Okay, so now we have a basic understanding of what a ring signature is and how it mitigates the double spend issue through the use of key images. But how do we actually create one? How does Sue use the one time private key to spend the output that John sent her? Ring signatures are created through something known as a “non-interactive zero knowledge proof” or “NIZKP” for short. That term probably sounds a bit opaque / complicated so please join me on a quick zero knowledge proof tangent.
A zero knowledge proof (or “ZKP”) is basically a way of proving that you know something without actually revealing what you know. To use a popular illustrative example, imagine that you have two balls that are the exact same shape and size but are two different colors. Now, imagine that you need to prove to a blind friend of yours that the balls are different even though they feel exactly the same. How would you do it? One way to do so would be to have him put the two balls behind his back, mix them up and present one to you. If you can correctly tell him which ball is which then you demonstrate that there is likely a difference between the two. If you can repeat this process many times without a mistake, you can prove beyond a reasonable doubt that they are two different colors. In fact, you can prove that they are two different colors without ever revealing what the two colors are. This is an example of an interactive proof. With an interactive proof, there is a prover and a verifier. The verifier asks the prover a bunch of questions (called “challenges”) to prove beyond a reasonable doubt that they know what they say they know.
In cryptography, you can use ZKPs to prove that somebody knows a private key without actually revealing the private key. These proofs typically follow this pattern. First, the prover generates a random private key. Second, the prover uses that random private key to generate an associated public key. Third, the verifier issues a challenge in the form of a random value. Fourth, the prover calculates a new private key using the original private key and the challenge value as inputs. Fifth, the verifier can validate that the associated public key is accurate and could only be calculated if the prover has the private key.
That description is a little bit abstract so let’s go through a basic example. Keep in mind that this is not the ZKP that Monero uses. Alright, imagine that Sue has a one time private key x and an associated one time public key P. She wants to prove to a verifier, Bob, that she knows x without revealing the true value. To do so, she creates a random private key, let’s call it q. She multiplies q by G to get the associated public key Q, which she then sends to Bob. Bob sends back a challenge, which is another random scalar value called c. Sue can then do the following calculation to derive a new private key: s = x*c + q. In other words, she transforms x using c and q, which allows her to prove that she knows x without revealing x. (Note that s and S in this example are fresh values, not Sue’s long-term keys from the stealth address section.) She then sends the new private key s back to Bob. Bob can then verify that Sue knows the private key x because he can check that the new public key S (which he generates using s) is equal to P*c + Q. Said another way, even though he does not know x or q, he knows that if S = P*c + Q, then Sue must know x because of the following identity:
P*c + Q = x*G*c + q*G = (x*c)*G + q*G = (x*c + q)*G = s*G = S
By issuing multiple challenges, he can prove beyond a reasonable doubt that Sue knows x. This is an interactive zero knowledge proof because Sue and Bob sent multiple challenges / responses back and forth. But what if Bob isn’t around and Sue still wants to prove that she knows x? In this case, she can use a non-interactive zero knowledge proof. An NIZKP is the same as an interactive proof, except that it replaces Bob with a hash function. So rather than Bob randomly choosing the challenge, Sue can take the hash of Q and use that as the challenge c. Also, if Sue wants to include some message in the proof, she can concatenate that with Q before taking the hash. This allows Sue to prove the exact same thing without requiring a live verifier. She can also send that proof to anybody that asks.
Okay great, now we know how ZKPs work but what the heck do they have to do with a ring signature? Surprisingly, a lot, because ring signatures use a form of NIZKP to create and validate the ring. Basically, if Sue wants to send the Monero she received from John, she needs to go through the following steps:
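Both flavors of the proof fit in a short Python sketch. As before, I’m assuming a stand-in group (integers modulo a prime, with multiplication playing the role of point addition) rather than Edwards25519, and the `prove` / `verify` names and the SHA-256 challenge are illustrative choices of mine.

```python
import hashlib
import secrets

# Stand-in group (an assumption, NOT Edwards25519): points are integers mod a prime.
P_MOD = 2**255 - 19
ORDER = P_MOD - 1
G = 2


def add(a, b):
    return a * b % P_MOD


def scale(k, point):
    return pow(point, k, P_MOD)


def hash_to_scalar(*parts):
    data = "|".join(str(part) for part in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def random_scalar():
    return secrets.randbelow(ORDER - 1) + 1


x = random_scalar()  # Sue's secret
P = scale(x, G)      # her public key

# Interactive round: Sue and Bob trade three messages.
q = random_scalar()
Q = scale(q, G)          # Sue -> Bob
c = random_scalar()      # Bob -> Sue (the challenge)
s = (x * c + q) % ORDER  # Sue -> Bob
interactive_ok = scale(s, G) == add(scale(c, P), Q)  # Bob checks S = P*c + Q


# Non-interactive version: a hash of Q (plus a message) replaces Bob.
def prove(x, message):
    q = random_scalar()
    Q = scale(q, G)
    c = hash_to_scalar(Q, message)
    return Q, (x * c + q) % ORDER


def verify(P, message, Q, s):
    c = hash_to_scalar(Q, message)
    return scale(s, G) == add(scale(c, P), Q)


Q2, s2 = prove(x, "pay Bob 1 XMR")
print(interactive_ok)                      # True
print(verify(P, "pay Bob 1 XMR", Q2, s2))  # True
print(verify(P, "pay Eve 1 XMR", Q2, s2))  # False: the message is baked into c
```

Since the challenge is derived from the message, the proof doubles as a signature: changing the message changes c and the check fails.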
- First, she needs to collect a bunch of inputs that will be used in the NIZKP. These include the key image, the one time private key x, the one time public key P, and a number of public keys associated with random transaction outputs on the network. As we mentioned before, these random public keys will serve as decoys. By including these as inputs in the transaction, each public key has an equiprobable chance of being the true public key, making it impossible for a third party to determine who is actually initiating the transaction.
- Second, she chooses two random values (q and w) for each public key in the ring. She then takes those random values and uses them to transform the public keys by running them through a couple of relatively complicated equations. You can check out those equations in the white paper if you’re interested! After transforming each of those public keys, she’s left with a new string of values.
- Third, she takes that string of values and hashes it to create the non-interactive challenge c.
- Fourth, similar to our initial example, she takes that challenge and uses it to calculate two new strings (c and r). She then combines those two strings and includes the key image to create the ring signature. The ring signature output is in the following form: σ = (I, c1, . . . , cn, r1, . . . , rn). Again, feel free to check the equations used to calculate this string in the initial white paper.
- Finally, after calculating this ring signature, a verifier can use the c and r values from step 4 to confirm the output generated in step 2. If the output matches up, then the ring signature is valid. If not, the verifier can reject the ring signature.
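For the curious, here’s my attempt at sketching those steps in Python, following the equations in the CryptoNote white paper as I understand them. Two loud assumptions: it runs on a stand-in group (integers modulo a prime) instead of Edwards25519, and `hash_to_point` is a placeholder that would not be safe in a real system. Read it as an illustration of the structure, not as Monero’s implementation.

```python
import hashlib
import secrets

# Stand-in group (an assumption, NOT Edwards25519): points are integers mod a prime.
P_MOD = 2**255 - 19
ORDER = P_MOD - 1
G = 2


def add(a, b):
    return a * b % P_MOD


def scale(k, point):
    return pow(point, k, P_MOD)


def hash_to_scalar(*parts):
    data = "|".join(str(part) for part in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def hash_to_point(point):
    # Placeholder only: a real hash-to-point must not expose a known scalar.
    return scale(hash_to_scalar("point", point), G)


def random_scalar():
    return secrets.randbelow(ORDER - 1) + 1


def sign(message, ring, signer_index, x):
    image = scale(x, hash_to_point(ring[signer_index]))  # step 1: key image I
    q = [random_scalar() for _ in ring]                  # step 2: random q and w
    w = [random_scalar() for _ in ring]
    left, right = [], []
    for i, pub in enumerate(ring):
        if i == signer_index:  # the real key is transformed with q only
            left.append(scale(q[i], G))
            right.append(scale(q[i], hash_to_point(pub)))
        else:                  # decoys are transformed with both q and w
            left.append(add(scale(q[i], G), scale(w[i], pub)))
            right.append(add(scale(q[i], hash_to_point(pub)), scale(w[i], image)))
    challenge = hash_to_scalar(message, *left, *right)   # step 3
    c, r = list(w), list(q)                              # step 4
    decoy_sum = sum(w[i] for i in range(len(ring)) if i != signer_index)
    c[signer_index] = (challenge - decoy_sum) % ORDER
    r[signer_index] = (q[signer_index] - c[signer_index] * x) % ORDER
    return image, c, r


def verify(message, ring, signature):
    image, c, r = signature
    left = [add(scale(r[i], G), scale(c[i], pub)) for i, pub in enumerate(ring)]
    right = [add(scale(r[i], hash_to_point(pub)), scale(c[i], image))
             for i, pub in enumerate(ring)]
    # step 5: rebuild the step 2 values and check the challenge adds up
    return sum(c) % ORDER == hash_to_scalar(message, *left, *right)


# Sue hides her one time key among three decoys (all keys are made up).
x = random_scalar()
ring = [scale(random_scalar(), G) for _ in range(4)]
ring[2] = scale(x, G)

signature = sign("send 1 XMR", ring, 2, x)
print(verify("send 1 XMR", ring, signature))  # True
print(verify("send 9 XMR", ring, signature))  # False
```

Notice that the signature (I, c1 … cn, r1 … rn) never includes the signer’s position in the ring. The verifier treats all four keys identically.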
This may have been way more detail than you wanted but the main takeaways are that (i) ring signatures obscure the sender’s identity by lumping his / her public key in with a bunch of random keys taken from the blockchain, (ii) they do this through the implementation of a non-interactive zero knowledge proof, and (iii) they generate unique key images for each transaction, which prevent double spending. With that, let’s move on to the third and final privacy technique used in the Monero protocol: ring confidential transactions.
Ring Confidential Transactions
Ring confidential transactions (or “Ring CT” for short) are the final piece of the privacy puzzle for Monero. Ring CT enables the Monero protocol to obscure transaction amounts. It builds on confidential transactions, a technique originally proposed by Bitcoin Core developer Gregory Maxwell, which was then adapted to work with Monero’s ring signatures.
So why do we need to obscure transaction amounts if we already obscure the transaction participants? It’s important to obscure transaction amounts because a malicious third party can actually use transaction amounts to piece together a full transaction history. For instance, in our original example, if John sent Sue 4.32 Monero and Sue then used those 4.32 Monero to pay Bob, somebody could follow that distinctive amount from one transaction to the next and link Sue to her previous transactions. This is particularly true when users send or receive uncommon transaction amounts. Prior to the implementation of Ring CT, Monero solved this issue by breaking transactions up into common chunks. For instance, it would break the 4.32 Monero up into 4.00, 0.30, and 0.02 XMR. It would then use prior transactions with the exact same amounts as mix-ins for the ring signature. That worked pretty well but there were a few key issues with the approach. Notably, it created a lot of “dust” on the blockchain. By dust, I’m referring to small transaction amounts like the 0.02 in our example above, which take up more space on the blockchain than they are actually worth. Additionally, by requiring the sender to mix these inputs in with inputs of the exact same size, it significantly reduced the pool of potential mix-ins, making it challenging to construct a ring of an appropriate size in some cases. Luckily, Ring CT solved these issues by obscuring transaction amounts. But how does it actually do that?
It does so through the use of Pedersen commitments. Pedersen commitments essentially encrypt a transaction so that only the sender and the recipient know the true transaction amount. At the same time, other network participants are able to prove that the sum of the inputs is equal to the sum of the outputs, ensuring that coins were not created out of thin air. Let’s illustrate through an overly simplified example. Imagine that you have three inputs for a transaction: 1, 3, and 5. You also have three outputs from a transaction: 1, 2, and 6. You know that the sum of the inputs is equal to the sum of the outputs because 1+3+5 = 9 = 1+2+6. In order to shield the input and output amounts while still allowing a verifier to confirm that the outputs equal the inputs, you can multiply both sides of the equation by some value (let’s call it A).
This would look like: 1A + 3A + 5A = 1A + 2A + 6A.
Which is the same as saying: A(1+3+5) = A(1+2+6)
Monero’s implementation is a bit more complicated but this gives you an idea of how it works. One interesting point here is that, by this definition, somebody could potentially use a negative output to create Monero out of thin air. For instance, you could say that 1+3+5=-100+109. To eliminate that possibility, Ring CT uses range proofs to make sure that users can’t game the system by using negative inputs or outputs.
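Here’s a slightly less simplified Python sketch of the idea. Each amount gets hidden as commitment = blind*G + amount*H, where the blinding factor is a random scalar and H is a second base point. The assumptions: a stand-in group (integers modulo a prime) instead of Edwards25519, arbitrary values for G and H, and helper names of my own choosing. It also reproduces the negative output trick, which is exactly what range proofs are there to block.

```python
import secrets

# Stand-in group (an assumption, NOT Edwards25519): points are integers mod a prime.
P_MOD = 2**255 - 19
ORDER = P_MOD - 1
G, H = 2, 3  # two base "points", picked arbitrarily for this sketch


def add(a, b):
    return a * b % P_MOD


def scale(k, point):
    return pow(point, k, P_MOD)


def commit(amount, blind):
    """Pedersen-style commitment: blind*G + amount*H."""
    return add(scale(blind, G), scale(amount % ORDER, H))


def total(commitments):
    result = 1  # neutral element of the stand-in group
    for commitment in commitments:
        result = add(result, commitment)
    return result


def build_commitments(inputs, outputs):
    """Commit to every amount, picking blinds so that both sides add up."""
    in_blinds = [secrets.randbelow(ORDER) for _ in inputs]
    out_blinds = [secrets.randbelow(ORDER) for _ in outputs[:-1]]
    out_blinds.append((sum(in_blinds) - sum(out_blinds)) % ORDER)
    ins = [commit(a, b) for a, b in zip(inputs, in_blinds)]
    outs = [commit(a, b) for a, b in zip(outputs, out_blinds)]
    return ins, outs


def balances(ins, outs):
    """What a verifier checks, without ever seeing an amount."""
    return total(ins) == total(outs)


honest = balances(*build_commitments([1, 3, 5], [1, 2, 6]))
inflated = balances(*build_commitments([1, 3, 5], [1, 2, 7]))
negative_trick = balances(*build_commitments([1, 3, 5], [-100, 109]))

print(honest)          # True: 9 in, 9 out
print(inflated)        # False: 9 in, 10 out
print(negative_trick)  # True: the sums match, so only a range proof catches this
```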
Pros and Cons
Clearly, Monero has implemented some innovative techniques for obscuring transaction identities. So how does it stack up to other privacy-oriented cryptocurrencies? Let’s quickly walk through a couple of its pros and cons.
The top benefit of Monero is that it is private by default. This means that if you want to send a transaction on the Monero blockchain, you have to do so anonymously. This differs from other privacy coins like ZCash, which give users the option to send a transaction with or without the privacy features. Why does that matter? Surprisingly, if you don’t require privacy for all users, then somebody could use advanced blockchain analysis techniques to compromise the anonymity of the users who have opted into the privacy features. Additionally, it has been shown that a very small percentage of users actually opt into the privacy features when offered the option. This limits the size of the shielded pool, further reducing the network’s privacy benefits. If you want to read more, there was a paper written by researchers at University College London on the issue, which you can find here. They conclude that:
“… our study has shown that most users are not taking advantage of the main privacy feature of Zcash at all. Furthermore, the participants who do engage with the shielded pool do so in a way that is identifiable, which has the effect of significantly eroding the anonymity of other users by shrinking the overall anonymity set.”
Another pro is that Monero is more decentralized than other privacy protocols on several different fronts. First, Monero does not rely on trusted parties like coin mixing services or trusted nodes (as Dash does); rather, it incorporates privacy techniques directly into its protocol. Second, there is no central entity making decisions on behalf of the network. While the core team is responsible for maintaining the code base and guiding the protocol in the right direction, it does not have the final say over decision-making. This contrasts directly with ZCash, where the ZCash company has significant control over the development of the protocol. Third, Monero’s CryptoNight proof-of-work algorithm has enabled it to remain resistant to concentrated mining operations. Cryptocurrencies that have not updated their mining algorithms, such as ZCash, may run into mining concentration issues.
So are there any issues with Monero? Yep. The biggest issue confronting Monero is scalability. While this is an issue for all cryptocurrencies, it is particularly problematic for Monero. This is because all of the decoys that are used in Monero’s ring signatures need to be included in the transaction, which meaningfully increases the average transaction size. Luckily, Monero recently implemented bulletproofs, which have significantly reduced transaction sizes; however, the average Monero transaction still far exceeds the average Bitcoin transaction. Additionally, it’s likely that Monero will implement new privacy features in the future (such as increased ring sizes) that will further bloat transaction sizes. This is an issue that the Monero community is acutely aware of so I’d be surprised if it didn’t ultimately get remedied; however, there are not any near-term solutions on the horizon.
That wraps up my overview of Monero, I hope you found it interesting! Stay tuned for my next post, which will break down ZCash and Dash. If you enjoyed this post and would like future posts sent directly to your email, please subscribe to my distribution list or reach out to me at email@example.com.
If you have an interest in venture capital and want to read more VC-related content, please follow my publication “All Things Venture Capital” on twitter. Also, let me know if you’re interested in adding to the publication! My goal is to continue to add high quality content (articles, podcasts, videos, etc.) from aspiring and current venture capitalists that want to share their perspective. Thanks for reading!