Developing a new stock service using Erlang

Published in Boozt Tech · 8 min read · Dec 20, 2019

by Peter Lind

Sometime in early 2019 we were working on a stock issue for our Booztlet.com platform. Somehow, stock levels had come out of sync between our two websites, Booztlet.com and Boozt.com, and our proprietary WMS, Fastlane. Even worse, it turned out items on hold had come completely unhinged for Booztlet. The result was a very large number of items not available to customers. Fixing the issue meant diving into how our systems handle stock and reviewing how knowledge of what can and cannot be sold is propagated through the systems.

We started looking into how we could fix several issues related to stock: how to keep stock synced between systems, and how to avoid “overselling”, which is basically selling the same item to different customers when there is a lot of cross-trading activity between Boozt.com and Booztlet.com. A number of solutions were discussed, but we eventually settled on a stock micro-service acting as a single source of truth for most systems that need stock info.

The problem

Our previous setup revolved around passing messages through the message queuing system, RabbitMQ. While this works well in many ways, it also creates some issues, especially for a high throughput system. First of all, there is a gap between orders being made and stock levels propagating. It also means there are more moving parts in the equation, as multiple systems need to handle stock information, in much the same way. And it means that systems need to handle both their own stock changes for orders and listen for those of other systems.

Figure: Previous workflow for propagating stock

Making a stock service API seemed the most sensible solution — we needed a single source of truth for stock levels and stock reservations, for all the systems that need it, and we needed a simple signature that systems can integrate with. However, there were some requirements for a service like this:

  • It had to be reliable and available. If we create a single source of truth for stock levels, we also create a single point of failure. Because this is directly tied to checkout, that’s a big risk.
  • It had to be performant. It had to be able to handle Black Friday, either by performing very well or by scaling very well.
  • It had to be consistent. We trust it to be a single source of truth, so the information it provides must be correct.

Considerations

PHP is a great language that we use widely across our tech platform, and it makes many things relatively easy. It is, however, neither the most performant nor the most stable option. While PHP in general is not the bottleneck of apps, the combination of a webserver with no caching and PHP would likely be an issue. Plus, adding caching for reads would require an external service.

So we had to look elsewhere for a solution, focusing on the read aspect of the service, namely getting accurate stock levels. The presumption was that reads would occur a lot more often than writes (i.e. reserving items for an order and fulfilling them), which means caching stock levels between updates/writes. Doing this in the service itself rather than in an external caching service would remove network overhead and make cache handling easier. This solution presents another problem though — you need a very stable technology, as you store state in memory.
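The idea of keeping stock levels in the service's own memory can be sketched roughly as follows. This is an illustration in Python, not the language the service was built in, and the names (`StockCache`, `available`, `reserve`) are hypothetical. Reads are plain in-memory lookups, while a reservation checks and decrements under a single lock so that two orders cannot both take the last unit.

```python
import threading


class StockCache:
    """In-process stock levels: reads hit memory, writes update it in place."""

    def __init__(self, levels):
        self._levels = dict(levels)      # sku -> units available
        self._lock = threading.Lock()    # serialises access to the levels

    def available(self, sku):
        # Read path: a dictionary lookup, no network round trip.
        with self._lock:
            return self._levels.get(sku, 0)

    def reserve(self, sku, quantity):
        # Write path: check and decrement under one lock so two
        # concurrent orders cannot both take the last unit.
        with self._lock:
            current = self._levels.get(sku, 0)
            if quantity <= 0 or quantity > current:
                return False
            self._levels[sku] = current - quantity
            return True


cache = StockCache({"sku-1": 2})
first = cache.reserve("sku-1", 2)    # succeeds, stock drops to 0
second = cache.reserve("sku-1", 1)   # refused, nothing left to sell
```

The sketch also shows why stability matters: if this process dies, the levels it holds die with it unless they are stored somewhere else as well.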

Solution

After a workshop with our tech architects during our yearly platform conference in May, we decided to experiment with a technology that is new to Boozt. The criteria were as follows: it shouldn’t crash, it should be able to store state in memory, and it should be fast, preferably scalable as well.

It turns out there is a technology focused on delivering exactly these things: Erlang. It’s a language that Ericsson developed in 1986 (yeah, that long ago) for distributed systems in telephony hardware, with stability as one of the primary concerns. It has proven itself to the degree that, by one often-quoted estimate, 90% of all internet traffic passes through one or more devices running Erlang. The key features of Erlang are:

  • a managed language — your code runs inside the Erlang VM
  • a functional language — no OOP
  • highly fault-tolerant — one part crashing does not bring other parts down, the crashed part just gets restarted
  • it was built with concurrency and distribution in mind

Reasoning

But why choose Erlang for the stock service, specifically? Focusing on the read aspect of the stock service, the thought was that the fastest, most performant solution would be to query a service that just checks its own memory and spits out a response. We considered alternatives such as C# or Java, but then you have to solve availability, concurrency and fault tolerance on your own. And with PHP, each request needs to load data anew or farm out the work to something external (Redis, Memcached, etc.).

In Erlang, you can just do it: you start your service, load all stock levels into separate processes, and query those processes for state when requests come in. Updating stock would be as simple as killing a stock level process and starting a new one with an updated state. But, the observant reader might ask, starting and stopping processes must come with overhead, right? Yes and no. Of course there is overhead, but we’re talking about processes running on the BEAM (the Erlang VM), not OS processes, so the overhead is negligible.
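To make the process-per-stock-level idea concrete, here is a rough analogue in Python. It is only a sketch: Python threads are far heavier than BEAM processes, and the names (`stock_process`, `StockRegistry`) are hypothetical. Each worker owns one stock level and answers messages from a mailbox, and an update stops the old worker and starts a new one with the new state.

```python
import queue
import threading


def stock_process(level, mailbox):
    """Owns one stock level and answers messages until told to stop."""
    while True:
        message, reply_to = mailbox.get()
        if message == "stop":
            return
        if message == "get":
            reply_to.put(level)


class StockRegistry:
    """Maps each SKU to the mailbox of the worker holding its level."""

    def __init__(self):
        self._mailboxes = {}

    def start(self, sku, level):
        mailbox = queue.Queue()
        worker = threading.Thread(
            target=stock_process, args=(level, mailbox), daemon=True
        )
        worker.start()
        self._mailboxes[sku] = mailbox

    def get(self, sku):
        # Ask the worker for its state and wait for the reply.
        reply_to = queue.Queue()
        self._mailboxes[sku].put(("get", reply_to))
        return reply_to.get(timeout=1)

    def update(self, sku, new_level):
        # Mirror the approach in the text: stop the old worker and
        # start a fresh one carrying the updated state.
        self._mailboxes[sku].put(("stop", None))
        self.start(sku, new_level)


registry = StockRegistry()
registry.start("sku-1", 5)
before = registry.get("sku-1")
registry.update("sku-1", 3)
after = registry.get("sku-1")
```

Because the level is never mutated, only replaced along with its worker, there is no shared state to lock. That is the property the Erlang approach relies on.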

Risks

Nothing comes without risks. Choosing Erlang for this project carries its own set of risks that we needed to mitigate. Risks such as:

  • Erlang is a language with a steep learning curve, very different from other languages we usually work with.
  • You can’t scale a solution that relies on a single process having state in memory.
  • Creating a micro-service to handle stock introduces a single point of failure. This is not Erlang-specific, but we do need to handle that risk with Erlang.

To mitigate these risks we did a few things. We centralized the experiment with a single senior developer, started experimenting with Booztlet to minimize the commercial risk, and planned to run the new service in parallel with the existing one to enable full failover or rollback in case we hit a wall. We also took advantage of Elixir, a language that runs on the Erlang VM and compiles to the same bytecode, and that is much, much easier to understand. You still need a functional mindset, but you avoid most of the quirks of Erlang, and you get some pretty sweet syntactic sugar too.

Implementation

Figure: New Stock Service API flow

The actual implementation of the service started with a proof of concept. The best way to go about things is really to get your hands dirty. The sooner you start working with the technologies, the sooner you get the experience necessary to shape a proper solution.

When you do something like this you want to be able to prove viability — and to do this you hack together a prototype. For the stock service, we needed to prove several things:

  • That the technology could work well
  • That the solution can handle reasonable traffic levels
  • That we can handle deployment and maintenance for the service

Technology

Phoenix turns out to be the go-to web framework for Elixir, which is a newer, modernised language built on the principles of Erlang. You have a router where you specify the request method and which function should handle that request type. You also have your middleware, in the form of Plugs. Due to the functional nature of the language, you’ll be interacting with things differently though. You will constantly be thinking about two things in Elixir: how to match specific patterns to handle different cases in your code, and how to transform things. You’ll be exercising filter-map-reduce a whole lot.
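For readers who haven’t worked in this style, the two ideas (routing a request to a handler function, and transforming data with filter-map-reduce) look roughly like this. The sketch is in Python rather than Elixir, it does not use Phoenix, and the field names, route and handler are all made up for illustration.

```python
from functools import reduce

# Hypothetical stock rows; the field names are assumptions for this sketch.
stock_rows = [
    {"sku": "sku-1", "site": "boozt", "on_hand": 10, "reserved": 4},
    {"sku": "sku-2", "site": "booztlet", "on_hand": 3, "reserved": 3},
    {"sku": "sku-3", "site": "booztlet", "on_hand": 7, "reserved": 2},
]


def sellable_units(rows, site):
    # filter: keep only the rows for one site
    for_site = filter(lambda row: row["site"] == site, rows)
    # map: transform each row into its sellable quantity
    sellable = map(lambda row: row["on_hand"] - row["reserved"], for_site)
    # reduce: fold the quantities into one total
    return reduce(lambda total, units: total + units, sellable, 0)


def show_stock(params):
    site = params["site"]
    return {"site": site, "sellable": sellable_units(stock_rows, site)}


# Route table: (HTTP method, path) -> handler function.
routes = {("GET", "/stock"): show_stock}


def dispatch(method, path, params):
    handler = routes.get((method, path))
    if handler is None:
        return {"error": "not found"}
    return handler(params)


response = dispatch("GET", "/stock", {"site": "booztlet"})
```

In Elixir the same pipeline would typically be written with the `Enum` module and the pipe operator, and the dispatch would be done by pattern matching on the request rather than a dictionary lookup.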

In terms of database, we’re using a two-layer approach, with Mnesia (the built-in Erlang database) as a caching and concurrency layer, and MySQL as the actual data store. The reasoning behind this lies in a couple of problems that Mnesia introduces:

  • While Mnesia is a distributed database with transactions, it was created before the CAP theorem got a lot of attention. As Erlang was built for use in hardware where netsplits were unlikely to happen (in phone switches or routers, for instance), this is not something the database handles on its own. So, if you have several nodes connected in a network, all running a shared Mnesia database, and you happen to experience a netsplit, then you need to figure out on your own how to handle reconciliation afterward.
  • If you’re using Mnesia with persistence (keeping a copy of the data on disk), then you need a master node creating the schema for the database. This can be any of the nodes, but it does give you issues with identifying that node if you need to bring it up after a crash, for instance.

If you use Mnesia as a memory-only state store and propagate changes in state to a more traditional database, then the second problem goes away, and your strategy for handling netsplits becomes falling back to the traditional database until you have re-established your network.
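A minimal sketch of that arrangement, under stated assumptions: it is written in Python, a plain dictionary stands in for Mnesia, an in-memory SQLite database stands in for MySQL, and the `cluster_healthy` flag is a hypothetical stand-in for whatever netsplit detection the real service uses. Writes go to the durable store first and then refresh memory. Reads come from memory unless the cluster is considered unhealthy, in which case they fall back to the database.

```python
import sqlite3


class StockStore:
    """Memory-first stock store that writes through to a durable database."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS stock (sku TEXT PRIMARY KEY, level INTEGER)"
        )
        self._memory = {}
        self.cluster_healthy = True   # set to False while a netsplit lasts

    def set_level(self, sku, level):
        # Write the durable copy first, then refresh the in-memory copy.
        self._db.execute(
            "INSERT OR REPLACE INTO stock (sku, level) VALUES (?, ?)",
            (sku, level),
        )
        self._db.commit()
        self._memory[sku] = level

    def get_level(self, sku):
        if self.cluster_healthy and sku in self._memory:
            return self._memory[sku]
        # Fallback: trust the durable database over local memory.
        row = self._db.execute(
            "SELECT level FROM stock WHERE sku = ?", (sku,)
        ).fetchone()
        return row[0] if row else 0


store = StockStore(sqlite3.connect(":memory:"))
store.set_level("sku-1", 8)
from_memory = store.get_level("sku-1")

# Simulate another node changing the durable copy during a netsplit.
store._db.execute("UPDATE stock SET level = 6 WHERE sku = 'sku-1'")
store.cluster_healthy = False
from_database = store.get_level("sku-1")
```

The trade-off is that reads are slower while the fallback is active, but no node serves a stock level that the durable store disagrees with.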

The second part of the proof of concept is infrastructure, in terms of deployment and maintenance. Google has published guides on deploying Elixir apps using its Kubernetes service, and building and deploying the service can happen automatically. Since we’ve moved a lot of our infrastructure into Google’s cloud, we can easily deploy, scale and roll back the service as needed.

Conclusion

Boozt is not a startup company, but in many ways, it still has a startup culture. Many things are possible here that wouldn’t be possible in other companies. As a developer with an idea you believe in, this is truly awesome — you get to make your case and have it judged on its merits, not on whether it makes it through three levels of meetings. Introducing Elixir for a core component and developing the service from proof of concept to production-ready service has been possible in a rather short time thanks to the culture.

This has been a great experience, and there’s been a lot to learn during the project. The fact that it’s been both possible and acceptable to develop a microservice in a completely new (to Boozt) technology, using new technologies for logistics, is great. Having the chance to implement what you think is the right solution to the problem you’re facing, instead of having a solution (or parts of it) dictated to you, is an extremely motivating and thrilling experience. And having the backing and cheering from key people in the organisation makes it even better.
