Django and Isolation (Part 1): A look at performance improvement strategies leveraging MVCC

While MVCC ( multi-version concurrency control ) has been around for a while and is one of the most popular techniques adopted by SQL based RDBMS’ to offer transaction isolation guarantees, it is often not being leveraged by code written at the app-server layer ( Django, Flask etc., )

Aditya N

Published in

The Startup

7 min readMay 4, 2020

An ancient traditional game played in Southern India. Image courtesy: Sirisha

Background

So about a couple of months ago, I was being interviewed for the position of a backend engineer at a product based startup in Bengaluru and was asked this question:

What is MVCC and what are the different isolation levels offered by PostgreSQL?

I had an idea about how isolation in general worked in databases but I had no clue about what MVCC was and that various RDBMS’ offer different isolation levels that conform to this protocol which could be used for different use-cases, to optimise performance.

This was the primary motivation for me to learn and experiment more on these concepts, all of which culminated into writing this blog.

What’s all the fuss about MVCC ?

Before trying to understand MVCC, let’s take a look at the two broad categories of concurrency control techniques that have been adopted by RDBMS’ over time.

PCC (Pessimistic Concurrency Control): This approach makes use of locks to prevent conflicting operations from executing. It’s termed pessimistic because when one transaction obtains a lock, all other concurrent transactions trying to operate on the same row/table should wait until the lock is released.
OCC (Optimistic Concurrency Control): This approach generally tries to minimise usage of locks and attempts to execute transactions concurrently, checking at the commit of each transaction, whether there has been a conflict with another transaction due to its operations. If there was a conflict, it is handled according to the isolation strategy in place.

MVCC ( multi-version concurrency control ) is an implementation of OCC wherein the underlying concept is to maintain multiple versions of each row / object in the database to handle concurrency issues.

When a transaction in an MVCC database tries to update a record, a snapshot of the record is taken at the instant this transaction begins. Now the update operation is performed within this snapshot. When the transaction tries to commit, it checks whether a newer version of the record has been committed by a concurrent transaction. If yes, then depending on the isolation level set for the transaction, it could either perform the update on the latest version of the record (Read Committed) or could get aborted due to a serialisation failure in which case it will have to be retried (Repeatable Read / Serialisation). If no, then it proceeds to commit normally.

To achieve the above behaviour consistently, MVCC uses timestamps and incrementing transaction IDs.

Let’s get real

For the following examples, code snippets will be in Python/Django, ( the web server ) with PostgreSQL ( implements MVCC to guarantee isolation ) as the database.

But the concepts from these examples can be extended to any web server framework and any MVCC compliant database.

All of the below scenarios have been tested with the following spec.

4-core CPU ( 2 physical, 2 virtual ), 8 GB of RAM.
Gunicorn to serve the Django app with 2 workers and 2 threads (gthread) per worker, to get maximum parallelism.

Scenario 1: writes, writes everywhere

Assume that we’re building a real-time banking application in which an API should be exposed for performing a bank transaction i.e debit from one account and credit to another.

Let’s say the table initially looks like this:

Account table:id  | name  | balance 
----+-------+---------
  1 | Bob   | 1000   |    
  2 | Alice | 2000   |

The code snippet for the API could be as follows:

# For simplicity, we're assuming that there are only two records in
# the accounts table. One is the debit acc and the other is the
# credit accdef put(self, request):
    with transaction.atomic():
        qs = Account.objects.filter(id__in = [1, 2]).order_by('id')

        acc_to_debit = qs[0]
        acc_to_credit = qs[1]

        amount_to_debit = request.data["amount_to_debit"]

        acc_to_debit.balance -= amount_to_debit
        acc_to_credit.balance += amount_to_debit
        
        acc_to_debit.save()
        acc_to_credit.save()

    return Response(data = {})----------------------------------------------
Test 1:No of parallel requests: 1
amount_to_debit: 100Result:id  | name  | balance 
----+-------+---------
  1 | Bob   | 900    |    
  2 | Alice | 2100   |----------------------------------------------
Test 2:No of concurrent requests: 4
amount_to_debit: 100Result:id  | name  | balance 
----+-------+---------
  1 | Bob   | 800   |    
  2 | Alice | 2200  |

On observing the result from Test 2, we find that though there were 4 requests made, it has effected only one debit/credit operation in the database.

Reason for anomaly:
Since the server was capable of handling 4 requests at a time, 4 concurrent transactions were opened at nearly the same time and the values that each of them read from the database, for the credit and debit accounts, were 900, 2100 respectively. And so, for all 4 transactions, at the point of commit, the computed values for the debit/credit accounts were 800, 2200 respectively.

There are two ways of solving the above concurrency problem:

Using locks
Leveraging the isolation guarantee provided by MVCC

Solution 1

# Using locks# from the above code snippet, the lines where respective account
# records are retrieved from the database have to be modified as 
# follows...
qs = Account.objects.select_for_update().filter(id__in = [1,2])
...Test 3:No of concurrent requests: 4
amount_to_debit: 100Result: ( Assuming DB state continues from test 2 )id  | name  | balance 
----+-------+---------
  1 | Bob   | 400   |    
  2 | Alice | 2600  |

Reason for correctness

The Django ORM translates select_for_update() into the
SELECT...FOR UPDATE statement in PostgreSQL which obtains a
lock (exclusive) on the corresponding result set. Only one transaction, at any given time, can hold the lock on this result set.
So when the above code runs, one among the four transactions’ acquires the lock, performs its operations and releases it subsequently when this transaction either commits or rolls-back. Therefore, when one among the three pending transactions’ acquires the lock, changes made by the previous committed transaction are taken into account. This process repeats until all transactions’ either commit or roll-back.

Note: The lock used by SELECT...FOR UPDATE does not block reads because the underlying principle of MVCC insists that writers should not block readers and vice-versa.

Solution 2

Before moving to the solution, note that there are three distinct isolation levels offered by PostgreSQL that implement the MVCC protocol.

Read Committed ( default in PostgreSQL )
Repeatable Read
Serialisable

Let’s first try and understand the behaviour of the Read Committed (default) isolation level.

When a new transaction begins, a point-in-time snapshot of the database is taken and all operations that follow are scoped to this snapshot.
Any reads within this transaction will have the following semantics:
- All reads are served from the snapshot.
- If there are writes (inserts/updates/deletes) in the transaction, they’re scoped to this snapshot. So a subsequent read will include those changes, though the transaction hasn’t been committed.
- Also any updates made by concurrent committed transactions are visible to this transaction’s snapshot and so a subsequent read will include those changes as well.
Any writes within this transaction will have the following semantics:
- All writes happen within the scope of the snapshot.
- Subsequent writes will include changes made by previous writes within this snapshot.
- If there’s a write(update/delete) operation, and a concurrent transaction is already trying to write(update/delete) to the same record(s), the former waits while the latter either commits or rolls-back. If it commits, then the former takes into account the changes made and performs its operation on top of that. If it rolls-back, any changes made are negated and the former continues to proceed with its operation.

Consider the following modifications to the original API

# Using isolation guarantee provided by DB.def put(self, request):
    with transaction.atomic():        amount_to_debit = request.data["amount_to_debit"]        Account.objects.filter(id = 1).update(
            balance = F('balance') - amount_to_debit
        )        Account.objects.filter(id = 2).update(
            balance = F('balance') + amount_to_debit
        )    return Response(data = {})----------------------------------------------
Test 4:No of parallel requests: 4
amount_to_debit: 100Result: ( Assuming DB state continues from test 3 )id  | name  | balance 
----+-------+---------
  1 | Bob   | 0     |     
  2 | Alice | 3000  |

To understand why the above code worked, let’s breakdown its execution.

As opposed to the previous select_for_update() approach, the use of .update() on a QuerySet in Django does not pull the data into memory, rather it translates to a SQL-level update query and F() expressions allow accessing record level attribute values in that query.
When one among the 4 transactions execute the .update() statement, the other concurrent transactions which are trying to update the same row, wait for this transaction to either commit or roll back.
Once the transaction commits, one among the pending transactions receives control, fetches the latest version of the row (as a consequence of the isolation level) and performs an update on top of this and commits. This step is repeated until all pending transactions either commit or roll-back.

Performance comparison

Since the above two solutions yield the same results, let’s check if there are any differences in their performance.

Solution 1 (Locks):
- No of concurrent requests: 5000
- No of iterations: 3
- Average time taken: 37.59 secsSolution 2 (Using isolation guarantee):
- No of concurrent requests: 5000
- No of iterations: 3
- Average time taken: 33.46 secs

Though the working principle of both the above solutions are very similar, let’s see why using the isolation guarantee has an edge over using locks.

In Solution 1, the records to be updated had to be queried and pulled into memory, resulting in one extra query to the database.
The time at which locks are acquired and released for each concurrent transaction is handled completely by the DB in Solution 2.

Also, we see that using Solution 2, would make a difference only when the application’s workload is write-heavy.

Wrapping up

Hope you guys have gotten a good idea about MVCC and the way code at the app server layer could be restructured to take advantage of the isolation guarantee offered by the database.

Though the default isolation level seems perfect for many use-cases, there are a few scenarios where it can cause anomalies. I’ll be explaining how these scenarios could arise when building real-life applications and ways of solving them using a different isolation level, in a follow up blog.

Also, please comment for questions, suggestions and feedback.