Pluggable Transaction Authorizations

Published in Brex Tech Blog · 10 min read · Jul 18, 2022

Introduction

Brex is building its financial operating system from the ground up. Doing so gives us complete control over how we process transactions and allows us to build whatever features we can dream up to wow our customers. In Brex’s early days, a single service implemented all features related to approving or declining transactions. Now, as Brex adds more teams and builds more features, we need a scalable way for teams with different requirements to implement logic that influences whether we authorize a charge on a Brex card. Our solution to that problem is pluggable authorizations.

From Microservice to Monolith

Brex uses a microservice architecture. An earlier article, “A Transaction’s Journey through Brex”, discusses the role of the Transactions processor service, which stores records of credit card transactions and decides whether to approve or decline them. Those decisions are based on rules such as credit limits, per-user spending limits, and fraud controls. In the beginning, it made sense for Transactions processor to enforce all of these rules, but that changed as the company grew. One team used to own these rules. Now we have multiple teams, including:

A team that owns credit limits.
A team that owns budgets.
A team that offers alternative ways for customers to pay for transactions.
A team that owns fraud controls.

Had we continued to place all of this logic in Transactions processor, that service would have become a nightmare of a monolith. One particular attribute of monoliths that would have been a problem is the way that broken code can break the entire monolith. For example, a resource-hogging bug in the per-card limit code could have brought down our entire transaction-processing infrastructure. But instead of letting Transactions processor become a monolith, we built pluggable authorizations.

The Solution

When a microservice starts to look like a monolith, the standard solution is to break its components into their own microservices — so that’s what we did. But moving components into their own services came with its own problems. In the monolithic world, the following steps took place inside an atomic database transaction:

1. Calculating how much money had been spent in previous transactions.
2. Deciding, with step number one in mind, whether we would approve or decline the current transaction.
3. Recording whether we would approve or decline the current transaction.

With different portions of the authorization logic in their own services with their own databases, atomicity was no longer an option. That change presented challenges in making the correct authorization decisions and in keeping consistent records of those decisions. Pluggable authorizations are centered around solving these challenges while maintaining the advantages of microservices.

Pluggable authorizations have two components: the pluggable authorizations framework and the actual plugins.

The framework

The framework connects Transactions processor, which is the service primarily responsible for handling operations on credit cards, to other services known as plugins. It allows Transactions processor to ask plugins whether transactions should be authorized and to update plugins when changes to transactions take place. A single team owns Transactions processor and the framework.

The framework’s interface

Credit card processing takes place using domain-specific protocols that product code shouldn’t need to implement. The pluggable authorizations framework defines an interface that abstracts over those protocols to make credit card transactions easy to work with. The most important parts of that interface are the AuthorizeTransactionRequest and the CardTransaction. AuthorizeTransactionRequest is the message that Transactions processor sends synchronously to plugins to ask them whether it should approve a transaction. CardTransaction is the message that Transactions processor publishes via a Kafka pipeline to asynchronously inform plugins of the latest state of a transaction. Here are abridged versions of their protobuf declarations:

message AuthorizeTransactionRequest {
  // The ID within the Transactions processor service of
  // the transaction that this charge belongs to.
  string transaction_id = 1;
  // The card’s ID.
  string card_id = 2;
  // The charge’s parameters, including its amount and metadata
  // about the merchant and point of sale.
  CardTransactionOperation authorization = 3;
  // Additional fields have been omitted for brevity.
}

message CardTransaction {
  // The transaction’s ID.
  string transaction_id = 1;
  // The card’s ID.
  string card_id = 2;
  // A history of the transaction’s state.
  repeated CardTransactionState transaction_states = 3;
  // Additional fields have been omitted for brevity.
}

The framework’s authorization flow

When Transactions processor receives a transaction authorization request from a card network (Mastercard or Visa), it retrieves information about the card that the transaction is on, such as the card’s ID and the user ID of the cardholder. With that information and the information contained in the card network’s request, Transactions processor builds an AuthorizeTransactionRequest to send to the plugins.

Transactions processor sends the AuthorizeTransactionRequest to each of the plugins’ services in parallel and with a short timeout on the RPC. The short timeout is necessary because, if Transactions processor waits too long for a reply from a plugin, the card network will assume that Brex’s authorization endpoint is down and will make its own decision about the authorization. The decision is made as follows:

When a plugin times out or returns an error, Transactions processor falls back to a default decision for each plugin. For some plugins, it acts as though that plugin approved the transaction. For others, it acts as though that plugin declined the transaction. Doing so keeps a single plugin’s failure from causing Transactions processor and the authorization process to fail.
If any plugin declines the transaction, then Transactions processor declines the transaction.
Otherwise, Transactions processor approves the transaction.
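The decision rules above can be sketched in a few lines of Python. This is only an illustration, not Brex's implementation: the `Plugin` type, the `authorize` and `ask_plugin` functions, the dict-shaped request, and the 50 ms timeout are all assumptions made for the example, and each plugin's RPC is stood in for by an async callable.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Dict, NamedTuple


class Decision(Enum):
    APPROVE = "approve"
    DECLINE = "decline"


class Plugin(NamedTuple):
    # Hypothetical stand-in for the plugin's RPC endpoint.
    authorize: Callable[[dict], Awaitable[Decision]]
    # Decision assumed when this plugin times out or returns an error.
    default: Decision


async def ask_plugin(plugin: Plugin, request: dict, timeout_s: float) -> Decision:
    try:
        return await asyncio.wait_for(plugin.authorize(request), timeout=timeout_s)
    except Exception:
        # Timeout or RPC error: fall back to the per-plugin default so that
        # one failing plugin cannot fail the whole authorization.
        return plugin.default


async def authorize(
    plugins: Dict[str, Plugin], request: dict, timeout_s: float = 0.05
) -> Decision:
    # Query every plugin in parallel, each bounded by the same short timeout.
    decisions = await asyncio.gather(
        *(ask_plugin(p, request, timeout_s) for p in plugins.values())
    )
    # Any single decline (real or defaulted) declines the transaction.
    if any(d is Decision.DECLINE for d in decisions):
        return Decision.DECLINE
    return Decision.APPROVE
```

Because the fallback is applied per plugin, a slow plugin whose default is "approve" has no effect on the outcome, while a slow plugin whose default is "decline" blocks the charge.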

After Transactions processor makes a decision and responds to the card network, it publishes a CardTransaction event to inform the plugins of whether it approved or declined the transaction.

This is how pluggable authorizations work on the Transactions processor side. Note that there are several places where failures may occur — failures that would result in inconsistencies between the services involved. At a high level, we solve that problem and achieve eventual consistency by doing the following:

1. Using a transactional outbox inside Transactions processor to guarantee that a CardTransaction event gets published if Transactions processor creates a record of a transaction.
2. Making plugins delete records of transactions that never receive a CardTransaction event.

We’ll talk about #2 a bit more as we discuss how we implement plugins.
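For readers unfamiliar with the transactional outbox pattern, here is a minimal sketch of the idea using SQLite. The table names, the event shape, and the `record_decision` and `relay_outbox` functions are assumptions made for illustration; the real system uses its own schema and publishes to Kafka rather than calling a Python function.

```python
import json
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id TEXT PRIMARY KEY, approved INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_decision(
    conn: sqlite3.Connection, transaction_id: str, card_id: str, approved: bool
) -> None:
    event = {"transaction_id": transaction_id, "card_id": card_id, "approved": approved}
    # `with conn` commits both inserts together or rolls both back on error,
    # so a transaction record never exists without its pending event.
    with conn:
        conn.execute(
            "INSERT INTO transactions (id, approved) VALUES (?, ?)",
            (transaction_id, int(approved)),
        )
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[dict], None]) -> int:
    """Publish pending events in order; return how many were published."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        # If publish raises, the row stays unpublished and is retried later.
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

A relay like this delivers each event at least once: if it crashes after publishing but before marking the row, the event is sent again, so consumers need to tolerate duplicates.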

Plugins

The plugins implement product-specific logic to help decide whether we authorize transactions. At a minimum, each plugin needs to implement an RPC endpoint which accepts an AuthorizeTransactionRequest and responds with either an approval or a decline. Depending on the authorization rules that the plugin is responsible for enforcing, this can vary a lot in complexity. We currently have four plugins owned by four different teams, and we will continue to add more plugins as Brex grows. This section will focus on the Expenses plugin, whose design is representative of most plugins.

The Expenses plugin

One of the rules that the Expenses plugin enforces is card limits, which allow customers to limit how much a card can spend in a particular amount of time (e.g., $100 per month). To enforce this rule, the service must store the state representing each card’s user-configured limit if it has one, as well as how much the card has spent so far in the specified amount of time. During each AuthorizeTransactionRequest it compares the current spend on the card to its limit and determines whether to approve or decline the request.

One of the challenges we faced with developing plugins was how to address concurrent authorization requests. In the old model in which all rules lived in a single service, we could solve this problem the following way:

1. Begin a database transaction
2. Acquire a row-level lock on the card’s customer
3. Execute all rules
4. Persist the result in Transactions processor
5. Commit the transaction and release the lock

This way, transactions for a particular customer were serialized, solving the concurrent authorization requests problem. If a card had a $100 limit and we received two $100 authorization requests at the same time, we processed them serially, approving the first request but declining the second request due to the first request’s impact on the card’s balance.

With pluggable authorizations, this becomes more complicated because multiple services are involved. One option would be to keep the same approach but replace step 3 with calling all the plugins. The problem is that this would mean performing RPCs from within a database transaction, something we generally avoid. In this case it is undesirable because it would require Transactions processor to acquire a database lock around an entire Brex account and to hold that lock until all the plugins returned. Performance degradation in a plugin would likely result in lock contention and performance degradation in Transactions processor, which would reduce the benefits of pluggable authorizations.

Instead, we use locks inside the plugin to serialize, within the plugin, requests tied to a given card. These locks are far more fine-grained and less likely to cause contention than the account-wide locks that Transactions processor would have required. If the Expenses plugin receives two $100 requests for the same card at the same time, it will handle them serially, declining the second one. This is imperfect though, because even if the Expenses plugin approves the first request, which uses up the card’s whole limit, another plugin may have declined it. The Expenses plugin would only find out about this once it receives a CardTransaction event, which could be after it’s already declined the second request. In this case, it was technically incorrect to decline the second request, because the first request, which was declined by the other plugin, in fact shouldn’t have used up the card’s limit.

While imperfect, this approach is more conservative in preventing customers from spending above their configured limits. This decision is also driven by the fact that most transactions on Brex cards are ultimately approved, so the chance of being incorrect by assuming that all other plugins will approve is lower than the chance of being incorrect by assuming that one of them will decline.
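To make the per-card serialization concrete, here is a small in-memory sketch. It assumes a single process and uses `threading.Lock`, and the `CardLimitPlugin` class is hypothetical; a real plugin running as several instances would more likely rely on a database row lock or a similar shared mechanism.

```python
import threading
from collections import defaultdict
from typing import Dict


class CardLimitPlugin:
    """Hypothetical in-memory stand-in for a card-limit rule."""

    def __init__(self, limits_cents: Dict[str, int]) -> None:
        self._limits = dict(limits_cents)
        self._spent: Dict[str, int] = defaultdict(int)
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects creation of per-card locks

    def _lock_for(self, card_id: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[card_id]

    def authorize(self, card_id: str, amount_cents: int) -> bool:
        # Only requests for the same card wait on each other.
        with self._lock_for(card_id):
            limit = self._limits.get(card_id)
            if limit is not None and self._spent[card_id] + amount_cents > limit:
                return False
            # Count the spend right away, assuming the other plugins approve.
            self._spent[card_id] += amount_cents
            return True
```

With a $100 limit (10,000 cents), two simultaneous $100 requests for the same card result in exactly one approval, whichever request acquires the lock first.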

One of the consequences of this approach is that the plugin needs to distinguish between local and external decisions that have been made about an authorization request. That is, it may locally approve a request, but then later receive a CardTransaction event suggesting that some other plugin actually declined, and adjust its record of the card’s spend accordingly.
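One way to picture the local versus external distinction is a record that carries both decisions, where the external one wins once it arrives. The sketch below is an assumption about how such bookkeeping could look; `AuthRecord`, `SpendLedger`, and the simplified event arguments are invented for the example, and events for transactions with no local record are simply ignored here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AuthRecord:
    card_id: str
    amount_cents: int
    local_approved: bool
    # None until a CardTransaction event reports the final outcome.
    external_approved: Optional[bool] = None

    def counts_toward_spend(self) -> bool:
        # The external decision, once known, overrides the local one.
        if self.external_approved is not None:
            return self.external_approved
        return self.local_approved


class SpendLedger:
    def __init__(self) -> None:
        self._records: Dict[str, AuthRecord] = {}

    def record_local(
        self, transaction_id: str, card_id: str, amount_cents: int, approved: bool
    ) -> None:
        self._records[transaction_id] = AuthRecord(card_id, amount_cents, approved)

    def apply_card_transaction(self, transaction_id: str, approved: bool) -> None:
        record = self._records.get(transaction_id)
        if record is not None:
            record.external_approved = approved

    def spent(self, card_id: str) -> int:
        return sum(
            r.amount_cents
            for r in self._records.values()
            if r.card_id == card_id and r.counts_toward_spend()
        )
```

A locally approved $100 charge counts toward the card's spend until an event reports that the transaction was declined overall, at which point the spend drops back to zero.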

This sequence diagram illustrates the flow described above, including how Expenses can reconcile a local decision to approve with another plugin’s decision to decline.

Furthermore, we want to ensure eventual consistency and protect against cases in which some upstream failure prevents the CardTransaction event from being published. For this reason, the Expenses plugin also has a background process which routinely deletes stale local decisions which have not been replaced by an external decision after some time has passed. As mentioned earlier, failing to publish the CardTransaction event happens only if Transactions processor does not save a record of the transaction, which means that deleting the local record makes the plugin consistent with Transactions processor.
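The cleanup job can be sketched as a simple sweep over unconfirmed records. The `LocalDecision` type, the `sweep_stale` function, and the age threshold are hypothetical; the article does not say how long the plugin waits or how the records are stored.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LocalDecision:
    created_at: float        # seconds since epoch when the plugin decided
    confirmed: bool = False  # True once a CardTransaction event has replaced it


def sweep_stale(
    decisions: Dict[str, LocalDecision], now: float, max_age_s: float
) -> List[str]:
    """Delete unconfirmed decisions older than max_age_s; return their IDs."""
    stale = [
        transaction_id
        for transaction_id, decision in decisions.items()
        if not decision.confirmed and now - decision.created_at > max_age_s
    ]
    for transaction_id in stale:
        del decisions[transaction_id]
    return stale
```

The threshold should comfortably exceed the longest delay expected between an authorization and its CardTransaction event, otherwise the sweep could delete a decision whose event is still on its way.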

What about two-phase commit?

The interaction between Transactions processor and Expenses bears resemblance to two-phase commit (2PC), but there are some crucial differences, and we knew early on that 2PC was not the best solution to this problem. At a high level, 2PC provides strong consistency guarantees at the cost of not having termination guarantees. Pluggable auth weakens the consistency requirement in order to gain termination guarantees.

In the first phase of 2PC, the participants prepare to commit an action, but they don’t commit it. The results of the action aren’t visible outside of the 2PC transaction. On the other hand, the Expenses plugin does commit, before returning its response to Transactions processor, its local record of an authorization and whether that authorization was approved or declined. That local decision may be overridden by a CardTransaction event later, but, until then, it is committed and is visible outside of the in-progress transaction. Furthermore, when one plugin approves the transaction and another declines it, an external observer can see that inconsistency. 2PC would not allow that inconsistency.

In the second phase of 2PC, the participants wait for the coordinator to send them a decision, which can leave the transaction hanging permanently if a failure happens in the wrong place. In pluggable auth, the plugins go about their business after responding to the AuthorizeTransactionRequest, and there’s no risk of hanging in case of failures.

We chose not to use 2PC because it is difficult to handle 2PC’s non-termination problem and because we did not need strong consistency. It is also worth mentioning that the way pluggable auth achieves eventual consistency, by publishing events that override any local records, mirrors the way that card networks achieve eventual consistency with card issuers.

Conclusion

We shipped pluggable authorizations in late 2020 with the goal of making Brex’s transaction processing infrastructure scale better to a growing engineering team. So far the system has been running smoothly, with 3 plugins in production and 1 more in development.

Future work includes:

Migrating all transaction authorization rules out of Transactions processor and into plugins in order to maximize separation of concerns and maximize resiliency.
Streamlining the plugin development process by allowing plugins to register dynamically as opposed to having Transactions processor hard-coded to call a set of known plugins.
Streamlining the plugin development process by writing libraries that generalize common work performed by plugins.

Interested in solving challenging microservice architecture problems to build a financial platform? Come join us at Brex!