How to prevent key collisions in Hyperledger Fabric chaincode

Recently I have a talk about Hyperledger Fabric (HF) at IBM Think 2018 conference and one of the most asked question in later discussions was “what is key collision and MVCC”.

In this short article I will try to explain, with no technical terms, what MVCC is, why is important, what problems it solves and what problems arrive from it, and will give you couple of designs to prevent this collisions.

So, what is key collision? It is very simple to understand — two or more processes try to update same key (or value) at the same time. This problem is not specific for HF, this is general computer science problem and for that reason there are many ways to “fix it”. But here I will explain it in context of HF. Before that lets revisit the life cycle of a transaction:

  1. SDK send transaction proposal to one or more peers
  2. Peer simulate the transaction using the chaincode, current state of the ledger and provided parameters from the request
  3. Peer return the result of the simulation to SDK. Result is set of keys/values which have been updated during the simulation.
  4. SDK collect this sets (can be more than one) , sign them with the private key and send them to ordering service.
  5. Ordering service validate signature and sets and put them in queue.
  6. When queue is full, or predefined time pass, order take this sets, pack them in block, sign the block and send the block to peers
  7. Peers validate signatures and most importantly check that the state of the ledger is same as the state when simulation was executed.
  8. If validation pass this transaction update the ledger
  9. If validation fails, transaction is stored in blockchain, but ledger is not updated.

In this explanation a lot of details are missing, but they are not so important for now. You have to understand that time between SDK send first request and the time when this transaction is actually stored in the blockchain is non deterministic, can take couple of seconds and that most of this steps are executed in parallel. So what is the problem here? Lets take simple example, you have chaincode that transfer money from Bob to Alice, chaincode must verify that Bob have enough funds before the transfer, if not, transaction must be canceled. At the moment Bob have 10 tokens and put request to transfer 10 of them to Alice. Chaincode will verify that this amount is correct (during the simulation) and this transaction will be send to orderer for commit. But imagine what will happen if , at the same time, Bob send another transaction to Eve, again for 10 tokens? Chaincode will verify that he has necessary amount, because transfer to Alice is verified (in simulation) but not committed. Result will be that Bob is sending 20 tokens while he is having only 10. This is classical “double spending problem”. For that reason HF implement MCVV — a well known and proven mechanism to prevent this type of situations. I will not go in details how this is achieved on technical level, there a lot of information out there, if you are interested, but you have to understand that MVCC prevent double spending problem, even if you write the worst, buggy, almost non working chaincode. In general HF will not allow you create this type of situation — conflicting transactions will be rejected and SDK will be informed about the rejection.

One side note — if you are using CouchDB you can create double-spending. This is problem of CuchDB, not in HF.

So what is the problem with this architecture, it seems reasonable to have this type of protection? It is very simple, if you try to update same key at same time all transaction, except one, will be rejected. Imagine that you have a IoT sensor that sends data 10 times per second and the average time of creating new block (to commit the data) is 1 second. 9 of 10 transactions will be rejected, because they will conflict. OK, IoT sensor that sends data 10 times per second is not so common, but imagine same scenario with money transfer. It will be very bad user experience if 9 of 10 transaction fails. And lets not forget that HF can operate with more than 210 000 transactions per minute. When you design your architecture, data structures and flows, you must take into account MVCC capabilities of HF, and techniques that can be used are determined by the data types and flow. There is no “golden rule” or “one fit them all”.

Some of you may be wondering — why this is happening, databases are using same technique and in general case the user must not consider that DB is using MVCC, actually most of the developers do not know what MVCC is. Answer is very simple, databases are central, they execute most of the operations in memory, they can use very low level process locking and synchronization and most of the operations take milliseconds. HF is decentralized with many participants that can be offline, or to have huge network lag. There is no technical way to synchronize process in machines that are located in opposite side of the world. Also some of the machine can run Linux, other can be on Windows, BSD…

For that reason architecture, structures and flow must be adapted in such a way that no conflicting transactions happens, or at least are very few. Here are some of the common ones that can be adapted for your particular case.

No duplicated keys

Because collision happens only if we try to update same key at same time, if we never use same key, collision will never occurs. This is very simple to implement in chaincode, but will cause a lot of other issues later. Instead having one key for the account you will have hundreds or thousands keys, and to know the actual amount in this account chaincode must take all this keys, iterate over them and define what is the current value. You must understand that in this case you “bypass” the MVCC and you are open for double spending problem, because the objective value that MVCC is checking is not one key, it is a result of another process — iterating all the keys and do some math over values. This approach is useful when you want to store huge amount of fast data stored immutably in blockchain, but this data will not be part of the business process. For example high speed sensor.

Pros:

  • Extremely easy for implementation
  • No key conflicts
  • Maximum throughput is achieved

Cons:

  • Bypass MCVV, open for double spending problem
  • Complex key management if you have more than one sensor.
  • Cannot create business logic in chaincode using this values, because you never know which one is the latest value

Put requests in queue

This approach require synchronisation to be done on application level, not on the level of the chaincode, and is very simple. You put request in queue and send new request only when previous one finishes (block is committed). In this naive approach you are limited for around one transaction per second, depending on HF configuration. Better way of doing it is when application layer know which keys this transaction will update. If this is possible then you create different queue for every key and send transaction to SDK only when there is no process that currently is updating this key. This can be very challenging, and definitely is not good architecture, because application layer must know what chaincode is doing, but is easy for implementation and in some scenarios may be efficient enough.

Pros:

  • Simple implementation
  • No key collisions
  • Deterministic throughput of the system
  • Ledger always have latest objective value, business logic can be implemented on top of it.

Cons:

  • Application layer must know what chaincode is doing
  • Very low throughput of the system, not scalable
  • If chaincode have a lot of keys, queues management can become tricky.

Running total

This approach is similar to first one, store values (or there deltas) in different keys, but have a different process that is constantly running and updating some other key, that will hold the result of this process. Or to say it in other words, running total process is constantly checking all the keys and is calculating the latest objective value, and is putting this value to some other key, that is always the same. The advantage of this approach is that there is some value that is the actual (or very close to the actual value), and this value can be used for business logic. So if your process can handle some delay in the actual value (like some IoT sensor that must trigger some process) this approach is very effective. But if you are transferring money, or some assets where overspending must not happen, this approach is not good, unless you have clear procedure what to do when overspending happens. In this situation when actual value become negative the running total process can trigger “overspend” chaincode (or process) to fix the situation. I’m not recommending this approach for financial services, but in some specific cases can be very effective.

Pros:

  • Very easy for implementation
  • Have (almost) objective real value that can be used for business logick inside chaincode
  • Application layer may not know what chaincode is doing
  • Scalable
  • No key collison

Cons:

  • Not suitable for processes where overspend cannot be handled
  • Overspend process can become very complicated
  • Objective value can have delay

Batching

This technique relay on the fact that chaincode can update same key multiple times in same execution. So application level collect some transactions and send them as one (array or list) to the peer/s. Chaincode must know how to handle list of transactions and execute all of them at the same time. Result of the simulation will be the latest value from all transactions, and this value will be objective. When block is committed, application will take new batch from the queue and will send it to the peers. So while block is commited application just collect new transactions and putting them in the queue. By fine tuning the size of the batch a very high throughput can be achieved. In my test performance degradation is around 5%. This approach is applicable for processes with high amount of transactions, else queue will not have enough transactions to fill the batch.

Pros:

  • No keys collisions
  • Very high throughput
  • Application layer may not know what chaincode is doing
  • Very easy for implementation
  • Ledger always have latest objective value, business logic can be implemented on top of it.

Cons:

  • Chaincode must be written in such a way that is able to handle multiple transactions in one execution. Most importantly to track intermediate values of the keys while is executing transactions.
  • If the amount of requests is not in narrow range an algorithm for automatic choosing of proper batch size must be implemented. Else when request count is low system will be slow waiting for queue to fill. If request count is much higher queue will be filled to fast, and it will continue to grow.
  • More complex event messaging must implemented so application to know which transaction from the queue was successful.

This is a short list of different techniques, there are more advanced and specialized, but I hope that I was able to explain you why this is very important topic when you are doing HF architecture. I strongly recommend to read the official documentation of HF, especially the parts where transaction flow is explained. When you know the tools that you are using, you can be as effective as possible with them.