Monero Part 1: Key Concepts
This article is an overview on key concepts in Monero. It assumes basic knowledge of blockchain properties and some other nerd stuff. Here’s an article I wrote on Bitcoin that might help. Also, here is a part 2 which is all about how Monero works; you can read that and use this one as a reference if you’d like.
Elliptic Curve Cryptography (ECC) Primer
Privacy is Monero’s “thing,” and to understand Monero it’s very important to understand how it achieves this privacy. To that end, here’s a short primer on ECC so you can be ready for the other concepts to come.
In short, ECC is an approach to public key cryptography. It’s main benefit is that keys are very small compared to, say, RSA.
ECC uses elliptic curves over finite fields. “Finite field” just means that values are, well, finite. Think modular arithmetic; basically values on the curve are modulo’d after calculation so that they loop back over the same finite set of values rather than approaching + or - infinity.
If you need a little primer on modular arithmetic, use clocks as a helpful analogy. What do you get if you’re at 23:00 hours and you add 1 hour? Right, 0. Rather than continually increasing, the value “loops back” when you add past 23. Therefore you can say that hours are defined over the “finite range” of integers 0 to 23. In the same way, Monero’s elliptic curves are defined over a finite field of values.
Ok, here’s an example of an elliptic curve:
I know this doesn’t resemble an ellipse. That’s not important. What is important about this elliptic curve are these 2 properties of its points:
- They can be added/subtracted with each other. That is, if you add/subtract 2 points on the curve, the sum/difference will be a valid point on the curve.
- They cannot be multiplied/divided with each other.
Keep in mind that this is all referring to finite field arithmetic, which is different from the regular stuff they teach in grade school.
As a corollary to point 1, points can also be added to themselves, which allows for scalar multiplication (meaning, unsurprisingly, that you can multiply points by scalars to get valid points on the curve).
Ok. Understanding that, here are more definitions to know:
- Base Point (G): This is the point from which all the other points are generated. By “scalar multiplying” G over and over again, all the other points can be generated. G is also known as the “generator point”
- Order (l): The order of the base point is the maximum number of points we can use in the curve. That is, f you “scalar multiplied” G l - 1 times, you’d wind up back at G.
Note: that’s probably all the ECC you need to understand so that you don’t get completely lost as you continue to read about Monero. I’m going to keep going, though.
Now let’s go deeper. If you look closely at the image above, you’ll notice that our graph is symmetrical on the horizontal line y=20. This horizontal symmetry is a property of all (cryptographic) elliptic curves.
And if you look even, even closer, you still definitely won’t notice the next important property of this curve (and all curves like it): that any non-vertical line intersects at most 3 points on the curve.
Knowing these two things, we can define an operation “dot” for this curve where, for a non-vertical line that goes through 2 points A and B, A dot B is equal to the reflection of the third point the line intersects. Here’s a gif to help:
We’d say here that A dot B = C. Note that A and B do not have to be the first 2 points intersected. That’s what enabled this next step, C dot A = D
You can keep doing this over and over to achieve the “one-way-ness” necessary of many cryptographic algorithms (these one-way functions are called “trap door functions”). That is, if you just keep dotting A with these new points over and over, observers will not be able to deduce A from whatever end result value you have.
And here is dotting illustrated over our finite curve:
Pretty gnarly, huh?
So to summarize, ECC involves:
- Choosing a nice elliptic curve with base point G. This can all be public.
- Choosing a private key S––just a scalar value––and generating the public key P by dotting the G S times.
And that’s ECC. The operation in step (2) above is a trapdoor function: easy to compute the public key from the private, but very difficult to compute the private from the public.
Diffie-Hellman-Merkle Key Exchange
The next cryptographic algorithm I’m going to discuss before we get to fancier Monero stuff is the widely used Diffie-Hellman-Merkle (DHM) key exchange protocol, which enables 2 users to create a shared secret key (for symmetric cryptography) over a public channel. It is what enables stealth addresses, which Monero uses to hide the receivers of transactions.
Here is a picture of DHM:
Essentially, they can combine private data (the red-orange and light-sea-green paint) with public data (the yellow paint) to create color combinations. For Alice this results in a creamy orangish paint somewhere between peach and butterscotch, and for Bob this results in a nice baby blue. By sharing these mixed paints publicly, and then mixing them with their secret paints, they can create the same private paint: a wonderfully gross brown.
Monero actually uses a new-age ECC version of this called Elliptic Curve Diffie-Hellman. All this does is change the way that the numbers are chosen, so you don’t have to worry about it.
Stealth Addressing for Unlinkability
Ok. Now that those are out of the way, let’s get to Monero.
To achieve unlinkability––which means to anonymize the receivers of transactions––Monero makes use of stealth addresses. A stealth address will generate a unique address every time it is used so that the actual address of the receiver is hidden. This is achieved by having every account use 2 key pairs rather than one.
See, with only one key pair (x, P), funds belonging to public key P can be sent by using private key x to sign the transaction. This digital signature proves that the sender knows private key x and therefore owns the funds associated with P, but it is not very private. Everybody knows that the funds are associated with P.
Now let’s look at stealth addresses. Take 2 key pairs (a, A) and (b, B), where public keys A and B are derived from the private keys a and b using A = aG and B = bG (where G is the ECC base point). The pair of public keys (A, B) will be our stealth address. Let’s call the owner of this stealth address Todd.
If Jared wants to send funds to Todd, a one-time public receipt address P can be generated with the equation P = H(r*A) * G + B where “H” is a one-way hash function and “r” is a secret random number generated by Jared. This transaction will be broadcasted along with the value R, where R = rG.
Then, in order for Todd to retrieve the money associated with one-time public key P, he first uses R (along with his 2 private keys) to generate the associated one-time private key x using the equation x = H(a * R) + b. He then attempts to use this private key to derive the public key P:
P = x * G (if this is true, then the funds at P belong to Todd)
Let’s look at the math under the hook here. Substituting x:
P = (H(a * R) + b) * G
P = H(a * R)G + bG
Substituting rG for R and B for bG:
P = H(a * rG)G + B
An finally, substituting A for aG:
P = H(Ar)G + B
Which, if A, B, and r match up, should be equal to P. Therefore the coins are Todd’s.
You may recognize this math if you’re familiar with the DHM key exchange. This is because the DHM key exchange is exactly what’s happening here: the sender computes a shared secret Ar using the first half of the stealth address (i.e. “A”) and his own private value r, and then combines that shared secret with the second half of the address (i.e. “B”) to create a one-time public address P. If a prospective receiver can use combine his private key a and the public value R to produce the shared secret Ar, the funds are his.
I have edited the DHM diagram from above to illustrate this:
Ring Signatures for Untraceability
Monero uses ring signatures to achieve untraceability: anonymizing transaction senders. A ring signature is a type of digital signature that is performed by a member of a group of users (or rather, a group of keys). By grouping keys together to sign, the identity of the sender of a transaction is hidden; to any onlookers the true signer could equiprobably be any member of the group (aka the “ring”).
In other words, the ring signature is checked against a set of public keys rather than a single public key. Until the true owner creates a second signature with the same key pair, outside observers cannot determine which public key the signature belongs to. Here is a neat diagram to help:
Another concept in ring signatures is the ambiguity degree or the mixin level; this is just how many foreign keys are used to sign. For example, if the ambiguity degree n is 1, that means one other signature is used, and therefore the probability of an outside observer correctly determining the sender is 50%. For n = 99, this probability becomes 1%.
Additionally, I should mention traceable ring signatures, which have the added property of not allowing the owner of a coin to sign 2 different ring signatures with the same key. This prevents double spending and is very handy on blockchains.
Monero ring signatures are actually a little more complicated than what’s described in the previous section, because they also allow the transaction value to be masked. Before we get into that, though, we need to make sure we understand commitment schemes.
Here is Wikipedia’s definition, which is actually nice and clear:
A commitment scheme is a cryptographic primitive that allows one to commit to a chosen value (or chosen statement) while keeping it hidden to others, with the ability to reveal the committed value later.
The only bit of jargon in that definition is “cryptographic primitive”, which just means a low-level cryptographic function used as a basic building block for cryptographic systems. Like a hash function or something.
Other than that, commitment schemes are (conceptually) just as easy as that; they allow people to “commit” values which remain private (and do not change in value) until the committer decides to “reveal” it. They have many useful applications, including zero-knowledge proofs and secure voting systems––which is why you’ll find use cases for them in every cryptocurrency––but for our purposes we’re just going to look at one: hiding transaction amounts.
To be more specific, Monero uses a commitment scheme known as a Pedersen Commitment. In addition to hiding transaction input and output values (i.e. “committing” but not “revealing”), using them has the added property of allowing observers to verify that the sum of the input commitments is equal to the sum of the output commitments, without revealing the value of these sums. In this way, Monero users can still be sure that no Monero is being created out of thin air.
Multilayered Linkable Spontaneous Anonymous Group Signatures
Ring signatures on Monero are actually a little more complicated than I let on earlier. Instead of the basic ring signature I described in the previous section, Monero uses an improved version called multilayered linkable spontaneous anonymous group signatures, conveniently abbreviated as MLSAGs. They were introduced along with the Ring CT protocol, which I will get to later.
The key difference between MLSAGs and the other ring signatures is in the “M”: multilayered. Rather than a ring signature of n keys, MLSAGs use a ring signature on a set of n key-vectors. Therefore it’s got 2 “layers” or dimensions, so we call it multilayered.
The term linkable may lead to some confusion––after all, I did state previously that one of the key properties of Monero is that it is unlinkable. Different concept. In the context of MLSAGs, “linkable” takes on a different meaning; it no longer has anything to do with transaction receivers, and instead is about the sender. A linkable ring signature just means that observers can determine if 2 signatures signed by the same ring were signed by the same member of that ring (without being able to determine the identity of the signer).
To round out our understanding of what the letters in MLSAG mean, I should also explain spontaneous. Spontaneity in ring signatures refers to “spontaneous mixing” where the members of a given ring can be obtained in an at will and ad hoc way.
MLSAGs aren’t really that important to understand in depth; as long as you have a good idea of what ring signatures are, you can understand Monero. With that said, it’s still good to know what they are: an improved version of ring signatures that enables transaction amount hiding in Ring CT.
Ring CT: Hiding Transaction Amounts
In addition to masking transaction senders and receivers, the transaction amounts are also hidden in Monero through the use of “Ring CT,” which stands for “Ring Confidential Transactions.” This is important for a few reasons:
- Without hidden transaction amounts, the blockchain can be analyzed based on amounts sent to reveal more information than we’d like; in some cases attacker might be able to keep track of which keys go together
- Forming a ring signature for a transaction with input amount A would require the sender to find other public keys with that same amount A. For uncommon amounts this may be difficult, and the potential anonymity may be smaller than desired.
Both of these issues are resolved through hiding transaction amounts; reason 1 because, well, amounts are hidden and reason 2 because amounts can be manipulated prior to signing (after all, that’s how you hide them).
I’m not going to get much into how it works at a low level because, well, that’s too involved for this article. At a high level though, it’s commitments galore; commitments are used for transaction inputs and transaction outputs, and to make sure no Monero is created out of thin air make sure that ∑ input commitments - ∑ output commitments is equal to a commitment to zero.
RingCT has 3 transaction types:
Null transactions are the transactions that give block rewards to miners (you may know them by their more commonly used name, “coinbase transactions”). They’re called “null” transactions because they have no inputs.
Simple transactions are used when a transaction has multiple inputs; each input is signed with an LSAG (an MLSAG without the “M”)
Full transactions are used when a transactions have a single input. An MLSAG is produced.
Project Kovri is yet another privacy feature offered by Monero (well, to be offered; it’s still in alpha). It will make it so Monero’s users will not be able to be identified as Monero users by looking at their internet traffic.
Here is a description of Project Kovri from the official website:
Currently based on I2P’s open specifications, Kovri uses both garlic encryption and garlic routing to create a private, protected overlay-network across the internet. This overlay-network provides users with the ability to effectively hide their geographical location and internet IP address. Essentially, Kovri covers an application’s internet traffic to make it anonymous within the network.
There are a few things to define here:
- I2P (short for the Invisible Internet Project) is an open source network-layer protocol that allows for end-to-end encrypted, peer to peer, censorship-resistant communication. An “invisible internet,” so to speak. In very much the same way as Tor, traffic goes through a network of volunteers.
- Onion routing is where messages are encrypted with several layers of encryption (like layers of an onion). Routers known as onion routers each peel off a layer to uncover the data’s next destination. Thus, each node only knows the identities of the nodes immediately preceding/following it & the IP addresses of peers are kept hidden from one another.
- Garlic routing is a variant of onion routing that encrypts multiple messages together for extra security. It’s name comes from the messages being analogous to cloves on a bulb of garlic. It is used in I2P.
Basically, what Kovri will do is make every Monero node a “Kovri router” that can be used to access the I2P network. Monero traffic & activity will be able to move through that network rather than the regular ol’ internet.
All these privacy features are fine and dandy, but they don’t handle one of the fundamental issues faced by cryptocurrencies: how to prevent double-spending. That’s where key images come in.
Key images prove that the one-time private key x used to sign and send funds at address P has not been used before, without revealing P (or x, obviously). Key image I is calculated with the following one-way function, so that nobody can recover the public key and identify the signer:
I = x * H(P)
Recall also that P = xG; therefore I is a function of x (and, in turn, every unique secret key x has the same unique key image I). By checking a given key image against a set of already-used key images, Monero users can verify that a double-spend is not being attempted.
CryptoNote is the application layer protocol that powers Monero, along with a number of other privacy-oriented cryptocurrencies. While Monero was based on the CryptoNote protocol, it has diverged in a number of ways over the past few years (for example, Ring CT). It’s good to know what CryptoNote is, though, and to not confuse it with CryptoNight, which is the proof-of-work algorithm used by Monero and CryptoNote.
That’s it for key concepts in Monero. Hopefully that’s enough for you to join a conversation or read my part 2.