Published in Gusto Engineering

How to prevent non-atomic actions in Rails transactions

In Rails applications, we typically wrap database changes that must succeed or fail as a single atomic action in an ActiveRecord transaction (ActiveRecord::Base.transaction). However, transactions sometimes include actions that are not database calls. When these actions are network calls, notifications, or async processes, they can produce results that are difficult to roll back or, worse, introduce bugs that are tricky to identify. In this post, we’ll look at the problems these non-atomic actions pose and walk through a few examples of how to fix them. We’ll also cover isolator, a gem that helps us detect these non-atomic violations.

A tale of a transaction

Let’s take a look at the classic example of Bob sending money to Alice.

Suppose a method handles this by creating a Transfer record and then updating the sender’s and receiver’s accounts. It makes sense to wrap these operations in a transaction: if we fail to deposit money into Alice’s account, we should neither withdraw money from Bob’s account nor leave behind an unprocessed Transfer record. But let’s dive into what’s going on in the withdraw and deposit methods.

Oh no, these methods don’t simply update ActiveRecord models! They also queue up background jobs and send emails. If we fail to update Alice’s balance, all db changes in the transaction are rolled back as expected. However, our BankWithdrawJob is already queued up, and the email to Bob cannot be unsent. In this example, the job will fail because it can’t find the transfer, which is still a better outcome than actually withdrawing money from Bob’s account. But it’s not a pleasant experience for Bob, who received a notification that doesn’t match his account records.
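To make the failure mode concrete, here is a minimal plain-Ruby toy. It is not ActiveRecord and not Gusto’s code: `Ledger` is a hypothetical in-memory stand-in whose `transaction` restores balances and transfers on error, `send_money` is a hypothetical name for the transfer method, and the job queue and outbox are plain arrays standing in for a job backend and a mail server that a database rollback cannot reach.

```ruby
# Hypothetical in-memory stand-in for ActiveRecord: balances and transfers are
# "database" state; the job queue and outbox model external systems.
class Ledger
  attr_reader :balances, :transfers, :job_queue, :outbox

  def initialize(balances)
    @balances = balances
    @transfers = {} # transfer id => attributes
    @job_queue = [] # stands in for a background job backend
    @outbox = []    # stands in for emails already delivered
  end

  def transaction
    snapshot = [@balances.dup, @transfers.dup]
    yield
  rescue StandardError
    @balances, @transfers = snapshot # only "database" state is restored
    raise
  end

  # Hypothetical withdraw: updates the balance AND triggers side effects.
  def withdraw(transfer_id, account, amount)
    @balances[account] -= amount
    @job_queue << [:bank_withdraw_job, transfer_id] # like BankWithdrawJob.perform_later
    @outbox << "#{account}: #{amount} withdrawn"    # like a mailer's deliver_now
  end

  def deposit(account, amount, simulate_failure: false)
    raise "deposit failed" if simulate_failure
    @balances[account] += amount
  end
end

# Hypothetical send_money: Transfer + withdraw + deposit as one atomic unit.
def send_money(ledger, from, to, amount, simulate_failure: false)
  ledger.transaction do
    id = ledger.transfers.size + 1
    ledger.transfers[id] = { from: from, to: to, amount: amount }
    ledger.withdraw(id, from, amount)
    ledger.deposit(to, amount, simulate_failure: simulate_failure)
  end
end

ledger = Ledger.new({ "bob" => 100, "alice" => 0 })
begin
  send_money(ledger, "bob", "alice", 30, simulate_failure: true)
rescue RuntimeError
  # The "database" changes are rolled back...
end

ledger.balances["bob"] # => 100, and ledger.transfers is empty again
ledger.job_queue       # => [[:bank_withdraw_job, 1]], pointing at a transfer that no longer exists
ledger.outbox.size     # => 1, the email to Bob is already out
```

The rollback only covers state the transaction owns; everything pushed to the outside world during the block stays pushed.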

These non-atomic actions can also interfere with the db calls themselves and increase the chances of failure. If an operation, such as a network call or the generation of a huge file, takes a long time to complete, it keeps the db transaction open, along with any row locks it holds, for that entire duration. And if a non-atomic action raises an error, that error rolls back perfectly valid db changes.

But the troubles don’t end there. We observed that background jobs queued inside transactions can sometimes run before the transaction commits: the job is pushed to the queue immediately, and a worker that picks it up quickly queries the database through its own connection, where the uncommitted changes are not yet visible. This can lead to job failures or undesirable results because the records aren’t in the state we expect them to be. If you find background jobs in your system that sometimes fail and then succeed on retry, with no other transient behavior at play, the culprit may be this race condition introduced by jobs enqueued before the transaction committed.
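The race can be modeled deterministically in plain Ruby. In this toy, which assumes nothing about any real database or job backend, the hypothetical `ToyDb` hides rows written inside an open transaction from other connections, and the hypothetical job runs the instant it is enqueued, as an idle worker might:

```ruby
# Toy model of the race: rows written inside an open transaction are
# invisible to other connections until COMMIT.
class ToyDb
  def initialize
    @committed = {}
    @pending = nil
  end

  def transaction
    @pending = {}
    yield
    @committed.merge!(@pending) # COMMIT: other connections can now see the rows
  ensure
    @pending = nil # an exception skips the merge, discarding pending rows
  end

  def insert(id, row)
    (@pending || @committed)[id] = row
  end

  # What a worker sees through its own database connection.
  def find_from_worker(id)
    @committed[id]
  end
end

# Hypothetical job body: look up the transfer it was enqueued for.
def perform_transfer_job(db, transfer_id)
  db.find_from_worker(transfer_id) ? :processed : :record_not_found
end

db = ToyDb.new
attempts = []

db.transaction do
  db.insert(1, { amount: 30 })
  # Enqueued inside the transaction and picked up by a worker right away.
  attempts << perform_transfer_job(db, 1)
end

# The retry happens after the commit, so it succeeds.
attempts << perform_transfer_job(db, 1)
```

`attempts` ends up as `[:record_not_found, :processed]`: exactly the "fails, then succeeds on retry" pattern described above.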

Fixing non-atomic violations

The most obvious fix is to move the non-atomic actions out of the transaction, so they only run once it has committed.
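As a minimal sketch, using a hypothetical in-memory `Ledger` as a stand-in for ActiveRecord and a hypothetical `send_money` method, the side effects move below the transaction block so they only run after it returns without raising:

```ruby
# Hypothetical in-memory stand-in for ActiveRecord: only balances are
# transactional; the job queue and outbox model external systems.
class Ledger
  attr_reader :balances, :job_queue, :outbox

  def initialize(balances)
    @balances = balances
    @job_queue = []
    @outbox = []
  end

  def transaction
    snapshot = @balances.dup
    yield
  rescue StandardError
    @balances = snapshot
    raise
  end
end

def send_money(ledger, from, to, amount, simulate_failure: false)
  ledger.transaction do
    ledger.balances[from] -= amount
    raise "deposit failed" if simulate_failure
    ledger.balances[to] += amount
  end

  # Only reached if the transaction block finished without raising.
  ledger.job_queue << [:bank_withdraw_job, from, amount] # like BankWithdrawJob.perform_later
  ledger.outbox << "#{from}: #{amount} withdrawn"        # like a mailer's deliver_now
end

ledger = Ledger.new({ "bob" => 100, "alice" => 0 })
begin
  send_money(ledger, "bob", "alice", 30, simulate_failure: true)
rescue RuntimeError
  # Rolled back before any job or email was produced.
end
send_money(ledger, "bob", "alice", 30)
ledger.job_queue.size # => 1, from the successful transfer only
```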

However, it isn’t always desirable to structure your code this way. Additionally, your method may be nested in another parent transaction. Suppose the transfer is just one step in a larger order fulfillment transaction: the side effects we moved out of the transfer’s own transaction block would still run inside the outer one.
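Here is a toy illustration of why moving code out of one transaction block is not enough. The hypothetical `Ledger` lets nested `transaction` calls join the outer transaction, which we understand to be Rails’ default for nested blocks without `requires_new: true`; `send_money` and `fulfill_order` are hypothetical names:

```ruby
class Ledger
  attr_reader :balances, :outbox

  def initialize(balances)
    @balances = balances
    @outbox = []
    @depth = 0
  end

  def transaction
    # Nested calls simply join the outer transaction.
    return yield if @depth.positive?

    @depth += 1
    snapshot = @balances.dup
    begin
      yield
    rescue StandardError
      @balances = snapshot
      raise
    ensure
      @depth -= 1
    end
  end
end

# The "moved out" fix: the email is sent after send_money's own block...
def send_money(ledger, from, to, amount)
  ledger.transaction do
    ledger.balances[from] -= amount
    ledger.balances[to] += amount
  end
  ledger.outbox << "#{from} sent #{amount} to #{to}"
end

# ...but a caller wraps the transfer in its own transaction.
def fulfill_order(ledger, simulate_failure: false)
  ledger.transaction do
    send_money(ledger, "bob", "alice", 30)
    raise "shipping failed" if simulate_failure # a later step fails
  end
end

ledger = Ledger.new({ "bob" => 100, "alice" => 0 })
begin
  fulfill_order(ledger, simulate_failure: true)
rescue RuntimeError
  # The whole order, including the transfer, was rolled back...
end
ledger.balances["bob"] # => 100
ledger.outbox.size     # => 1, yet the email went out anyway
```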

It could be transactions all the way up! Our problem would be solved if we could leave the non-atomic code inside the transaction but run it only after the outermost transaction commits. We know that ActiveRecord models have after_commit hooks that run after the transaction completes. Wouldn’t it be nice if the same behavior existed for explicit transactions? That’s exactly what the after_commit_everywhere gem provides: it lets us safely co-locate non-atomic code inside transactions by wrapping it in after_commit blocks.
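With the gem itself, a class does `include AfterCommitEverywhere` and calls `after_commit { ... }` inside an `ActiveRecord::Base.transaction` block. Because that needs a Rails app to run, the sketch below is a plain-Ruby toy of the semantics we rely on, not the gem’s implementation: callbacks registered inside a transaction are deferred until the outermost transaction commits, dropped on rollback, and (as we understand the gem’s default) run immediately when no transaction is open. `Ledger`, `send_money`, and `fulfill_order` are hypothetical.

```ruby
class Ledger
  attr_reader :balances, :outbox

  def initialize(balances)
    @balances = balances
    @outbox = []
    @depth = 0
    @callbacks = []
  end

  def transaction
    return yield if @depth.positive? # nested blocks join the outer transaction

    @depth += 1
    snapshot = @balances.dup
    begin
      yield
    rescue StandardError
      @balances = snapshot
      @callbacks.clear # ROLLBACK: deferred side effects are dropped
      raise
    ensure
      @depth -= 1
    end
    pending, @callbacks = @callbacks, []
    pending.each(&:call) # COMMIT: now run the deferred side effects
    nil
  end

  def after_commit(&block)
    if @depth.positive?
      @callbacks << block # defer until the outermost transaction commits
    else
      block.call # no open transaction: run right away
    end
  end
end

def send_money(ledger, from, to, amount)
  ledger.transaction do
    ledger.balances[from] -= amount
    ledger.balances[to] += amount
    ledger.after_commit { ledger.outbox << "#{from} sent #{amount} to #{to}" }
  end
end

def fulfill_order(ledger, simulate_failure: false)
  ledger.transaction do
    send_money(ledger, "bob", "alice", 30)
    raise "shipping failed" if simulate_failure
  end
end

ledger = Ledger.new({ "bob" => 100, "alice" => 0 })
begin
  fulfill_order(ledger, simulate_failure: true)
rescue RuntimeError
  # Rolled back, and no email was sent.
end
fulfill_order(ledger)
ledger.outbox # => ["bob sent 30 to alice"]
```

The non-atomic code stays next to the logic that triggers it, but its timing is tied to the commit of whichever transaction is outermost.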

So far, we’ve only dealt with explicit transactions. What about the implicit transactions that ActiveRecord opens around calls such as Transfer.create! and account.update!? Suppose the Transfer model has after_create or after_save callbacks that perform non-atomic actions and shouldn’t run as part of the transaction.

We know that after_create and after_save callbacks run inside the transaction, before it commits, so the fix is to move them to after_commit hooks instead.
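In a Rails model, this typically means changing `after_create :enqueue_bank_job` to `after_create_commit :enqueue_bank_job` (or `after_commit :enqueue_bank_job, on: :create`), where `enqueue_bank_job` is a hypothetical callback name. The plain-Ruby toy below, not Rails internals, shows why the timing matters: the hypothetical `toy_save!` runs the INSERT and the after_create hooks in one implicit transaction and runs after_commit hooks only once it succeeds.

```ruby
# Toy model of an implicit transaction around save!.
def toy_save!(table, row, after_create: [], after_commit: [])
  snapshot = table.dup
  begin
    table << row              # the INSERT
    after_create.each(&:call) # still inside the implicit transaction
  rescue StandardError
    table.replace(snapshot)   # ROLLBACK: the row disappears
    raise
  end
  after_commit.each(&:call)   # COMMIT succeeded
  row
end

transfers = []
queued_jobs = []
enqueue_bank_job = -> { queued_jobs << :bank_withdraw_job }  # hypothetical side effect
failing_hook = -> { raise "a later after_create hook failed" } # forces a rollback

# Before: the job is enqueued from after_create.
begin
  toy_save!(transfers, { amount: 30 }, after_create: [enqueue_bank_job, failing_hook])
rescue RuntimeError
end
# transfers is empty, but queued_jobs == [:bank_withdraw_job]

# After: the same job is enqueued from after_commit.
queued_jobs.clear
begin
  toy_save!(transfers, { amount: 30 }, after_create: [failing_hook], after_commit: [enqueue_bank_job])
rescue RuntimeError
end
# transfers and queued_jobs are both empty
```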

As we’ve seen in this section, the fixes aren’t complicated once we identify the problems. But how do we reliably identify the violations in the first place? We can’t rely on developers to consistently spot them during development, or worse, to discover them as bugs in production.

Isolator

We set out to automate the detection of these non-atomic actions and quickly found the isolator gem. Isolator tracks whether we’re inside a transaction and raises an error when a non-atomic action is invoked there. The gem comes with several default adapters that detect HTTP calls, mailers, and background jobs (with support for Sidekiq and Resque), and we can add custom adapters for other kinds of non-atomic actions.

Internally, isolator keeps a transaction count for each database connection. It monitors every SQL statement to detect a beginning or end of a transaction and increments/decrements the count accordingly. Because of this overhead, it’s not recommended to use isolator in production. At Gusto, we enable isolator in the test environment and rely on our high test coverage to surface non-atomic violations in CI.
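The counting idea can be sketched in a few lines of plain Ruby. This is a toy illustration of the mechanism described above, not isolator’s source: `TransactionTracker`, `NonAtomicActionError`, and `guard` are hypothetical names, SQL is fed in by hand instead of from an instrumentation hook, and the real gem handles details this toy skips, such as test suites that wrap every example in a transaction.

```ruby
class NonAtomicActionError < StandardError; end

class TransactionTracker
  def initialize
    @open = Hash.new(0) # connection name => open transaction/savepoint count
  end

  # In a real app this would be fed by an SQL instrumentation hook.
  def on_sql(connection, sql)
    case sql.strip.upcase
    when "BEGIN", /\ASAVEPOINT /
      @open[connection] += 1
    when "COMMIT", "ROLLBACK", /\ARELEASE SAVEPOINT /, /\AROLLBACK TO SAVEPOINT /
      @open[connection] -= 1
    end
  end

  def in_transaction?
    @open.values.any?(&:positive?)
  end

  # Wrap each non-atomic action (HTTP call, mailer, job enqueue) with this.
  def guard(action)
    raise NonAtomicActionError, "#{action} invoked inside a transaction" if in_transaction?

    yield
  end
end

tracker = TransactionTracker.new
tracker.on_sql(:primary, "BEGIN")
tracker.on_sql(:primary, "SAVEPOINT active_record_1")
tracker.on_sql(:primary, "RELEASE SAVEPOINT active_record_1")
begin
  tracker.guard("BankWithdrawJob enqueue") { :enqueued }
rescue NonAtomicActionError => e
  puts e.message # prints "BankWithdrawJob enqueue invoked inside a transaction"
end
tracker.on_sql(:primary, "COMMIT")
tracker.guard("BankWithdrawJob enqueue") { :enqueued } # => :enqueued
```

Inspecting every statement is also what makes this approach a poor fit for production, which is why we only enable it in tests.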

Once installed, isolator surfaces a non-atomic violation whenever a test exercises the code path that contains it. For example, a test that enqueues a background job inside a transaction fails with an isolator error instead of passing silently.

Isolator detects which of its supported libraries are present and automatically configures the relevant adapters when it initializes. For this reason, we add it to the Gemfile with the require: false option and load it as part of our spec support files, once all core dependencies have been loaded.
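In Gemfile and spec-support form, that setup might look like the fragment below. The `spec/support/isolator.rb` path and the assumption that `rails_helper.rb` requires spec support files after loading the Rails environment are ours, not details from the post.

```ruby
# Gemfile
group :test do
  # Don't load isolator at boot; the test suite requires it later.
  gem "isolator", require: false
end

# spec/support/isolator.rb (hypothetical path), required from rails_helper.rb
# after the Rails environment and core dependencies have loaded, so isolator
# can detect which adapters to configure.
require "isolator"
```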

In a large codebase, fixing all violations at once can be overwhelming. A strategy that works really well for us is to deal with violations one category at a time: we disable all adapters and re-enable them one by one, which lets us control when each set of fixes lands. For instance, we can disable the http adapter for the entire test suite until the HTTP violations are fixed.
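A minimal sketch of disabling one adapter suite-wide. The `Isolator.adapters.http.disable!` call reflects the adapter toggling described in the isolator README as we recall it; treat the exact method names as an assumption and check them against the version you install. The file path is hypothetical.

```ruby
# spec/support/isolator.rb (hypothetical path)
require "isolator"

# Assumed API: stop tracking HTTP calls for the whole suite while other
# categories are being fixed; re-enable later with `.enable!`.
Isolator.adapters.http.disable!
```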

Summary

In this post, we looked at the problems non-atomic actions can cause in your application, and at tools such as isolator and after_commit_everywhere that help us detect and fix them. These tools have helped us remove numerous transactional violations and continue to safeguard our app against non-atomic operations. We hope this post helps you keep your app transactionally safe too.

Originally published at https://engineering.gusto.com on November 2, 2021.
