Your balance is $0.30000000004

Common problems found in money-handling software

Clément Salaün
Selency Tech & Product
5 min read · Feb 17, 2020


If you write software that happens to manage money, I feel you. As an industry, we too often fail to hold ourselves to high standards for the reliability of the things we build. While an elevator is practically incapable of failing, things going wrong in software is business as usual. As a result, software that manages money is often built on flaky foundations.

In this article we attempt to build an inventory of frequent issues found in financial software, an inventory that you can use to check your current system against.

Sweetest spot on earth in the center — will rent

Arithmetical errors

Floating point

Let’s start with the obvious: the problem I encounter perhaps the most is the use of floating point numbers for the manipulation and storage of money amounts. Why is that a problem? Well, if you open a JavaScript console and execute the following:

"Your balance is $" + (0.1 + 0.2)

You’ll get the title of this article, give or take a few zeros: Your balance is $0.30000000000000004

Because 0.1, like any other negative power of ten, has no exact finite representation in binary, we cannot use floats to accurately store and operate on money values without also having to use rounding.

The simplest way to completely avoid this issue is to use a data type other than floats for representing monetary amounts. A frequent solution is to use integers, representing a money amount in its smallest unit (for example, 100 cents instead of 1.00 USD). Decimal data types also work well, provided that your language supports them (Java has BigDecimal; JavaScript has no built-in decimal type).
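As a minimal sketch of the integer approach, assuming USD amounts that fit within JavaScript's safe integer range, the snippet below stores amounts as cents and only converts to a dollar string for display. The `formatUsd` helper is a hypothetical name introduced here for illustration, not part of any library.

```javascript
// Sketch only: amounts are plain integers counting cents (the smallest USD unit).
// `formatUsd` is a hypothetical helper, not part of any library.
function formatUsd(cents) {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  const dollars = Math.floor(abs / 100);
  const rest = String(abs % 100).padStart(2, "0");
  return `${sign}$${dollars}.${rest}`;
}

const floatSum = 0.1 + 0.2; // binary floating point: not exactly 0.3
const centsSum = 10 + 20;   // integer cents: exactly 30

console.log(floatSum === 0.3);    // false
console.log(formatUsd(centsSum)); // $0.30
```

Integer arithmetic on cents stays exact as long as values remain below `Number.MAX_SAFE_INTEGER`; beyond that, `BigInt` is one option.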

Allocation

The second most-encountered arithmetical problem probably is the use of division & rounding instead of allocation. Take this example: your ride-hailing app has a small reservation fee of $0.99 that you equally split between you and the driver. Should you use division and rounding,

= round(0.99 / 2) + round(0.99 / 2)
= round(0.495) + round(0.495)
= 0.50 + 0.50
= 1.00

Assuming that your rounding function rounds to the nearest cent, half up (a common default), an extra cent will be introduced to the system as $0.99 becomes $1. Rinse and repeat this operation on each ride and your system will quickly be completely off.

For this problem to be solved, rounding needs to be completely taken off the table and an actual decision needs to replace it: will the driver or the platform get the unsplittable cent?

👉 This solution is simply called allocation, and is often implemented by money-handling libraries.
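Here is one possible sketch of allocation, working in integer cents. The `allocate` function is a hypothetical name, and handing the leftover cents to the first shares is an arbitrary policy chosen for illustration; in a real system that choice is the business decision discussed above.

```javascript
// Split `totalCents` into `parts` shares that always sum back to the total.
// Leftover cents go to the first shares; who gets them is a business decision.
function allocate(totalCents, parts) {
  if (!Number.isInteger(totalCents) || totalCents < 0) {
    throw new RangeError("totalCents must be a non-negative integer");
  }
  if (!Number.isInteger(parts) || parts <= 0) {
    throw new RangeError("parts must be a positive integer");
  }
  const base = Math.floor(totalCents / parts);
  const remainder = totalCents % parts;
  return Array.from({ length: parts }, (_, i) => base + (i < remainder ? 1 : 0));
}

// The $0.99 reservation fee, split between driver and platform.
const [driver, platform] = allocate(99, 2);
console.log(driver, platform); // 50 49
```

No cent is created or destroyed: the shares always add up to the original amount.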

Concurrency issues

Young programmer discovering the power of race conditions

Back in 2015, a security researcher found a way to create unlimited money on Starbucks gift cards by exploiting what is known as a race condition. Far from being a one-off, this kind of bug is a habitual offender in homebrew financial systems.

A race condition happens anytime code that wasn’t designed to run concurrently does so. The textbook example of this would be a wallet API that lets you draw more money from a wallet than is available, on the condition that multiple withdrawal calls are executed at the same time.

To test your API against this issue, you need nothing more than curl with seq and xargs. Here’s an example:

#!/bin/bash
# -I{} runs one curl per input line without appending the sequence number as an extra argument
seq 1 10 | \
xargs -P3 -I{} \
curl http://localhost/api/withdraw \
-X POST \
-H 'Content-Type: application/json' \
-d '{"from": "1234", "amount": 100}'

With this simple bash script, we’ll have 10 calls to our endpoint with a maximum of 3 jobs running in parallel. If this test manages to withdraw more than should be possible on your API, you have a race condition.

The good news is that, in most cases, this problem can be solved by locking. By taking a mutex and serializing the processing of calls, you make sure that the check on whether the balance is sufficient cannot be invalidated by another thread between the check and the withdrawal. Unfortunately, there are cases where introducing a lock is not that easy: its complexity starts to leak and tradeoffs have to be made. These cases include applications with high throughput or reduced access to network, all of which go beyond the scope of this article.
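To make the locking idea concrete, here is a rough sketch of a promise-based mutex in Node.js. Everything in it is an assumption made for illustration: the `Mutex` class, the in-memory `wallet`, and the `pause` that stands in for a database round trip. It only serializes calls within a single process; with several server instances you would rely on database-level locking instead.

```javascript
// A minimal in-process mutex: each caller waits for the previous one to finish.
// Illustration only; it does not protect against other processes or servers.
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }
  runExclusive(task) {
    const result = this.tail.then(() => task());
    // Keep the chain usable even if `task` rejects.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Hypothetical in-memory wallet; `pause` stands in for a database round trip.
const wallet = { balance: 100 };
const lock = new Mutex();
const pause = () => new Promise((resolve) => setTimeout(resolve, 5));

async function withdrawUnsafe(amount) {
  if (wallet.balance < amount) return false; // check
  await pause();                             // other calls can interleave here
  wallet.balance -= amount;                  // act
  return true;
}

function withdraw(amount) {
  return lock.runExclusive(() => withdrawUnsafe(amount));
}

// Three concurrent withdrawals of 100 against a balance of 100:
// with the lock, only the first succeeds and the balance ends at 0.
const demo = Promise.all([withdraw(100), withdraw(100), withdraw(100)]).then((results) => {
  console.log(results, wallet.balance);
  return results;
});
```

Calling `withdrawUnsafe` directly three times in parallel would let all three checks pass before any balance update, which is exactly the race the lock prevents.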

Unhandled remote failures

Null values

When a remote service fails to provide a value used further down in your system, you can either halt the system or use a previously obtained value if possible. What should not be done is to save a buggy value and continue operations as if nothing happened.

An example of this is currency conversion. Should a conversion rate fail to be fetched from a remote web service, the last thing to do would be to save null and subsequently use this null rate to compute prices. If you were an airline company, flight tickets could be purchased for $0 thanks to this bug.
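The two acceptable behaviours described above (reuse a previously obtained value, or halt) could be sketched as follows. `resolveRate` is a hypothetical helper, and the validity rule (a finite, strictly positive number) is an assumption you may want to tighten.

```javascript
// Hypothetical helper: validates a freshly fetched rate before it is used.
// Falls back to the last known good rate, or halts by throwing.
function resolveRate(fetchedRate, lastKnownRate) {
  const isValid = (r) => typeof r === "number" && Number.isFinite(r) && r > 0;
  if (isValid(fetchedRate)) return fetchedRate;
  if (isValid(lastKnownRate)) return lastKnownRate;
  throw new Error("No usable conversion rate; refusing to compute prices");
}

// Why this matters in JavaScript: null silently coerces to 0 in arithmetic.
console.log(450 * null);              // 0
console.log(resolveRate(null, 1.08)); // 1.08
```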

Partial processing

Another kind of unhandled remote failure emerges from partial processing. Say you initiate a transfer on your bank API, and the TCP connection dies in the middle of it. If you retry the call, you risk executing the transfer twice, depending on whether the bank had received the request or not.

👉 One solution to this problem would be checking if the transfer happened before retrying it. Another, simpler solution, though not universally available, is the use of idempotency keys, provided the API you’re calling implements them.
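To illustrate what an idempotency key buys you, here is a toy in-memory "bank" that remembers the response it gave for each key. The `FakeBank` class, its `transfer` signature and the key format are all hypothetical; a real provider will have its own way of accepting the key, often an HTTP header.

```javascript
// Toy in-memory "bank" that honours idempotency keys. All names are hypothetical.
class FakeBank {
  constructor() {
    this.processed = new Map(); // idempotency key -> stored response
    this.transfers = [];
  }
  transfer(idempotencyKey, { to, amountCents }) {
    if (this.processed.has(idempotencyKey)) {
      // Same key seen before: replay the stored response, move no money.
      return this.processed.get(idempotencyKey);
    }
    const response = { id: this.transfers.length + 1, to, amountCents };
    this.transfers.push(response);
    this.processed.set(idempotencyKey, response);
    return response;
  }
}

const bank = new FakeBank();
// The key is generated once per logical transfer and reused on every retry.
const key = "order-42-payout";
const first = bank.transfer(key, { to: "driver-7", amountCents: 5000 });
// Suppose the response was lost and the client retries with the same key:
const retry = bank.transfer(key, { to: "driver-7", amountCents: 5000 });

console.log(bank.transfers.length, first.id === retry.id); // 1 true
```

The important detail on the client side is that the key is tied to the intent (this payout for this order), not to the HTTP attempt.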

Silent failures

Missed captures

Sometimes, the worst stories are not about how something terrible happened but rather how something critical did not happen. In financial software, things that do not happen can result in pure money loss. An example of this is the authorize-and-capture workflow found in the API of most card payment service providers. When your code fails to capture an authorization and still proceeds with the sale, the money is lost forever.

👉 There is no off-the-shelf solution to this problem, as the solution largely depends on the specifics of your system. A decent approach, which draws its inspiration from contract programming, is to have a side system that continuously verifies that what your main system did matches your expectations. We will soon publish an article on the Audit Layer we built at Selency, which implements this pattern and covers this case for us.
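As a very small illustration of such a side check (and not a description of Selency's Audit Layer, whose design is not covered here), the sketch below compares completed sales against recorded captures and reports any sale whose authorization was never captured. The record shapes and the `findUncapturedSales` name are assumptions.

```javascript
// Hypothetical record shapes; a real audit would read from your database
// and from your payment service provider.
function findUncapturedSales(sales, captures) {
  const capturedAuthorizations = new Set(captures.map((c) => c.authorizationId));
  return sales.filter((s) => !capturedAuthorizations.has(s.authorizationId));
}

const sales = [
  { orderId: "A1", authorizationId: "auth_1" },
  { orderId: "A2", authorizationId: "auth_2" },
];
const captures = [{ authorizationId: "auth_1", amountCents: 12900 }];

// Order A2 was sold but its authorization was never captured.
console.log(findUncapturedSales(sales, captures).map((s) => s.orderId)); // [ 'A2' ]
```

Run on a schedule, a check like this turns a silent failure into an alert while the authorization may still be capturable.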

What now?

Building correct software is hard. Building correct financial software is critical. This inventory, this bestiary of gold-eating monsters of sorts, was built with the hope that it will help you check your current system and avoid preventable catastrophes.

All of these issues were either experienced firsthand while building the Selency marketplace, or faced by members of our community: special thanks to AssoConnect, Yescapa and Meet in Class for sharing their valuable insights on the matter.

If you have stories of your own or can think of other kinds of failures, please share them in the comments. I’ll be thrilled to add new monsters to the bestiary.
