Lords of ACID

Couchbase
Couchbase
Published in
5 min readNov 19, 2019

by Andrew C. Oliver

The first NoSQL databases dropped ACID and transactions (mostly). The vendors even told you that you didn’t need transactions. Maybe you believed them.

The rationale goes like this, at least for document databases. Document databases are the most common NoSQL database in use today, and there is a good reason for that. Documents map closely to an “entity” as modeled in modern applications. Unlike an RDBMS, you don’t need multiple Tables (or documents in this case) to fully represent an entity. So in “Person” you can have multiple “phone numbers” without having a different thing in the database. Updates inside of one Document are atomic. So you don’t need transactions…as badly.

However, then you wrote a real application. The world is more complicated than being able to stick everything in one big entity.

Here are some reasons you can’t achieve everything without transactions:

  • You even succeeded at one big entity in release one. In release four or five, you ended up with a use case that would have required to change your whole model and combine documents that used to be separate.
  • Bad things tend to happen when your document gets too unwieldy.
  • You had to make compromises in your model design for memory or cache management reasons.

As it turns out, you need to make changes to more than one document and have those changes take effect at the same time or not at all. Guess what my friends, you need…an ACID transaction that spans more than one entity/document/row/whatever.

So let’s review what ACID is and ask the burning question that developers and everyone in the 90s club scene used to ask, “when should I use ACID?”

ACID vs Transaction:

A transaction is a unit of work. A demarcation of actions your application has taken: generally speaking state changes. It isn’t — by itself — a guarantee. Your database will provide different levels of isolation. The transaction is the demarcation of what should be ACIfD.

ACID is:

  • Atomic — all or nothing.
  • Consistent — it is the same each “time” within a span of “time.”
  • Isolated — “The Others” can’t see what I’m doing until I’m done.
  • Durable — Once it has been committed, it stays even if you unplug the thing. Which it turns out can even happen to the cloud.

ACID is not:

  • A complete guarantee in all cases — usually.

Why?

There are different levels of isolation. There are also different means of providing isolation. The classic way is via a lock, meaning everyone has to wait in line. Back in the day those often were on “pages” which you had no idea which data was on a “page” providing a kind of “spice of life.”

Some databases do this lock by “row.” Others do something called “Snapshot isolation” where you work on a copy until it is committed. What isn’t stated by ACID is the behavior when one transaction reads, another reads and one commits first. Unless you know what you are doing, Oracle by default commits both in sequence. Meaning the last transaction to commit won sometimes resulting in data loss.

Some databases also use client-side transactions where your writes are all kept on the client until it is time to read. This doesn’t provide full isolation, meaning if I set state=’GA’ on a document and then run a select “where state=’GA’” it won’t be returned. However, the writes in the transaction happen all at once and are isolated from the view of others.

ACID also doesn’t typically mean your row counts are the same on a table, collection or bucket. On some databases it can, but generally that is done via the equivalent of a table lock. That means ONLY one client at a time in that entire table. That is pretty slow. There are compromises (range locks), but let’s just say this isn’t going to scale to the modern internet and is kind of an edge case you hit only a few times in a career.

You don’t need ACID for:

Just writing to an entity or even two different entities. You only need it where the sum is more than the parts. Assume you have Orders and Line Items and for some reason, those can’t be stored in one entity (read document, table, column-row). If you write the line items before the Order, it hardly matters what happens (you might need to purge orphaned line items). Moreover, a datestamp field takes care of most of the edge cases. If you added a line item after the order shipped, the system can tell. There are some “yeah buts” on this but it is usually “good enough” and the risk of a stray or inconsistent read or write are not high.

In the real world you probably don’t just create an order with some line items, you probably are going to change inventory and create a shipment. You don’t necessarily want some of these things to happen if other things fail.

So you want to try ACID?

Couchbase is one of the document databases that offers transaction support. If you want to try it, we have a tutorial using the Java transaction API. There is also a video tutorial of how to use transactions.

Takeaways:

  • You don’t ALWAYS need transactions
  • If you are writing to two entities (Documents) and want to ensure both changes happen or neither does, then you need an ACID transaction.
  • Different levels of isolation affects what the transaction actually means.

Andrew C. Oliver learned to code when he was 8. He founded the Apache POI project and served on the board of the Open Source Initiative. He writes a column for Infoworld and is the Director of Product Marketing and Evangelism for Couchbase.

--

--

Couchbase
Couchbase

The world’s best open source database for building scalable, high performance web, mobile & IoT applications. www.couchbase.com