Building a transportation cloud with microservices

How we created a highly-available distributed mobility optimization system in Scala, using Akka clusters, Kubernetes, and Kafka with a domain-driven approach, a functional programming mindset, and a culture of excellence

Jonas Chapuis
Bestmile
27 min read · Mar 1, 2021


source: phloxii, via shutterstock

When people ask me what we are doing here at Bestmile, I’m a bit embarrassed. There’s no better way to describe it than “we create a cloud platform to optimize transportation.” Every time I say this out loud, it sounds pretty ambitious and even a bit boastful.

And yet, in practice, this is what we are attempting. We are modeling vehicles, bookings, plans, routes, availabilities, resources, etc., generically. We aim to support most surface transportation modes, from human-driven to autonomous, from schedule-based to on-demand, from handling passengers to goods, and all that mixed and matched. We use “Everest” as a codename for our platform. It’s the highest summit on this planet, and it’s tough to climb up there.

The system that we have built thus far is fascinating. It’s a thrill to observe vehicles moving with intelligent behavior on the map in real-time. We can’t wait to witness such technology becoming commonplace. And if our recent experiences are any indication, the public is very receptive. It’s hard to build an ambitious new product like this from scratch and even harder when succeeding means disrupting an old industry and creating a new market. No one knows when the transportation industry will finally make the necessary transition to a smarter, more responsible utilization of resources enabled by technology. But when this happens, we will be as ready as we can be.

I won’t be able to disclose the full details of our architecture here. But I wanted to share some of the guiding principles that have allowed us to go to production with a fleet orchestration cloud. What we have achieved is the result of intentional design. We build in increments, with an overarching vision guiding us. We follow a domain-driven approach, code in Scala for maximal expressivity and abstraction, and use cloud-native instruments such as actor clusters, Kubernetes, Kafka messaging, and infrastructure-as-code. Sharing a purpose has given rise to a culture of awareness and excellence.

📖 Table of contents
Requirements
Domain exploration
Immutability
Scale
Culture
A shared dream

Requirements

I stated above that we’re ambitious. But honestly, given our mission statement, it seems we don’t have a choice. When we start thinking about what we need for a smart transportation cloud, it’s exciting and scary at the same time.

Responsive, resilient, elastic, message-driven

You might have recognized the four tenets of the Reactive Manifesto. For us, these are not some nice-to-have philosophy but actual requirements. We can’t afford to have vehicles stranded in the middle of the street because we are rolling out some updates. When it’s rush hour and everyone is moving, we need to scale up. And scale back down at night, when things quiet down. If there’s some failure somewhere in the system, passengers in an autonomous shuttle don’t care: they still want to get to their destination on time.

So for us, there isn’t any other option than to go full-on with high availability. And that’s not all: multi-tenancy is also unavoidable. We have a customer base with varying needs, levels of involvement, and deployment maturity. It would be very cost-ineffective for us to host a separate Bestmile cloud for each transportation operator. We save a great deal by not having to run duplicated instances of our many services. By operating a single cloud, we can focus our efforts on resilience and elasticity. This robustness also makes it possible to scale up and down dynamically and take advantage of cheap, short-lived instances. We spend more time on application features than on managing deployments (it is generally recognized that multi-tenancy is a defining feature of cloud computing).

All of that means that our platform must be up at all times, scalable in real-time, resilient, and support any number of customers.

Domain-driven

On the “business domain” side of things, it’s a lot of fun, too. We are finding ways to describe transportation at large using exact typing and terminology. We spend hours pondering what a “vehicle,” a “booking,” or a “quote” truly is. We still discover new aspects of these simple concepts every day.

Our internal models and optimization engines are what’s most valuable to us. We have developed this by bringing people with various backgrounds to the table: domain experts, software engineers, and mathematicians, leading to a unique blend. To drive discussions, we use techniques such as event storming. The compact and expressive Scala syntax helps model data structures and define our domain-specific languages (DSLs) and even textual diagram parsers. We have mostly followed stateless approaches for our protocols and state handling challenges, using messages with reified representations of state evolutions and event sourcing (we’ll cover what this means below).

Domain exploration

source: Igor Sobolevsky, via ArtStation

I find that exploring and modeling a domain is one of the most rewarding software engineering exercises. The essence of each concept must be captured, the interplay between notions must be described, flows must be mapped. And since this understanding then translates directly into a working digital system made of zeroes and ones, there is no room for fuzziness. Everything will ultimately surface, one way or another, sometimes as unpleasant surprises down the road. Dark corners of vagueness can hide the most terrible monsters, which usually rear their heads close to release, when the level of detail has increased enough to cast light on them. And yet, it’s often impossible to dig deep up front, and as we all know, priorities are constantly shifting, especially in startups. Domain analysis also has an element of intuition, because there is no recipe for where to start and where to spend the effort. What’s more, key concepts must preferably be guessed correctly from the outset, as they quickly crystallize into services. Like axes in the model space, they become the pillars of the system, the vectors for evolution, and ultimately translate into product and market opportunities.

Language

When starting my journey at Bestmile, one handy traveler’s guide has often saved me: our internal glossary. We have compiled definitions for the terminology we use daily in code, in meetings, and in discussions. Notions that seem simple at first can hide a surprising amount of detail and often reference other concepts, establishing the map of our field. In our glossary, ride time, for instance, is defined as “the duration of a passenger’s ride: it starts at the end of the pickup (just after the passenger flow finishes) and finishes at the beginning of the dropoff (just before the passenger flow starts).” A similar definition exists for passenger flow. This glossary defines our ubiquitous language, as such a specialized vocabulary is named in domain-driven design jargon. I share the belief of DDD practitioners that a high level of shared awareness between domain experts and engineers is necessary to develop a relevant system. At Bestmile, every engineer is a “product owner.” We design and create together with a common vision.

Communication

To get people from various backgrounds organized around deciphering what transportation is about, we have had some success with event storming. Event storming is simply about gathering around a sizeable blank whiteboard with stacks of sticky notes. The session starts with identifying domain events (with a specific color, usually orange). These come quite naturally, as they reflect notable occurrences in real life. You would quickly see events such as BookingAccepted, PickupDone, DriveCompleted, etc., appear. When running out of event ideas, the team shuffles the notes around on the whiteboard and forms logical clusters. These “centers of stickies mass” hint at some shared concept worth finding a name for. This concept gets a stickie of its own color (yellow): an aggregate (the term comes from domain-driven design). At this stage, we can also identify the triggers for events. These are commands, and they also deserve their stickies (blue). More diagram elements can be added as well, but it’s surprising how much of the domain and information flows can be revealed with just these three (event, command, aggregate).

Such event storming sessions are non-technical, and anyone can participate. We have had shared insights emerge from joint sessions with mathematicians, transportation experts, and front-end and back-end software engineers gathered in one room (before Covid; doing this remotely is more challenging). Event storms are by nature message-centric and avoid premature identification of system or service boundaries. As such, they are an excellent fit for getting started on a microservice architecture. Events and commands often translate one-to-one to their corresponding digital equivalents, as the sketch below illustrates. Aggregates often turn out to be good candidates for service boundaries (but this should not influence the analysis).
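To make that translation concrete, here is a hypothetical sketch of how the stickies from such a session might land in code, reusing the event and command names from the example above (the exact shapes are illustrative, not our production model):

```scala
import java.time.Instant

// Domain events (orange stickies): notable occurrences in real life
sealed trait BookingEvent
final case class BookingAccepted(bookingId: String, at: Instant) extends BookingEvent
final case class PickupDone(bookingId: String, at: Instant) extends BookingEvent
final case class DriveCompleted(bookingId: String, at: Instant) extends BookingEvent

// Commands (blue stickies): the triggers for events
sealed trait BookingCommand
final case class AcceptBooking(bookingId: String) extends BookingCommand
final case class ReportPickup(bookingId: String) extends BookingCommand

// The aggregate (yellow stickie): the concept the events and commands cluster around
final case class Booking(id: String, history: List[BookingEvent])
```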

Expression

Once we have identified concepts and found names, we must translate them into code. And as we know, this creative process isn’t unidirectional but highly iterative. As it turns out, sound code often leads to discoveries about the domain, as type-driven development (TyDD) practitioners will have experienced. The more expressive the code and the more compact the notation, the faster it is to iterate and collect feedback from non-technical audiences.

Precise typing is a great tool to materialize our assumptions: for instance, by defining a duration as strictly positive in the code, we impose a constraint on the operation logic. The compiler will tell us if a legitimate case leads to a zero duration, thus calling for a more relaxed, non-negative type. This finding can, in turn, challenge our assumptions about the domain and lead to important discoveries. The more precise the type system, the more inference we can muster from the compiler, and as a result, the more insightful the findings. For these reasons and more, we have adopted the Scala language, boosted with Refined and other home-grown libraries for types we often require. We define our data structures with algebraic data types.

📌 Algebraic data types (ADTs) are composite types, i.e., combinations of other types using either a product operation (tuples) or a sum operation (unions). They have many interesting properties: they afford compile-time introspection for mapping code generation, random data generation, serialization automation, or more generally, type class derivation.
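As a small illustration of the “strictly positive duration” example above, here is a minimal sketch using Refined (the type aliases and names are hypothetical, not our actual library):

```scala
import eu.timepit.refined.api.Refined
import eu.timepit.refined.auto._ // compile-time refinement of literals
import eu.timepit.refined.numeric.{NonNegative, Positive}

// Hypothetical duration types encoding our assumptions in the type system
type StrictDurationSecs = Long Refined Positive    // must be > 0
type DurationSecs       = Long Refined NonNegative // relaxed: 0 is legal

val rideTime: StrictDurationSecs = 300L // compiles
// val impossible: StrictDurationSecs = 0L // rejected at compile time

// An ADT: a sum (sealed trait) of products (case classes) describing transport modes
sealed trait TransportMode
case object OnDemand extends TransportMode
final case class ScheduleBased(headwaySecs: DurationSecs) extends TransportMode
```

If domain analysis later reveals a legitimate zero-duration case, switching the field from Positive to NonNegative is a one-line change that the compiler then propagates through all dependent logic.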

Onions

The domain expression is our most valuable asset. We spend a lot of effort nailing names and crafting logic. We need to execute it at some point, but we try our best to shield it from “mundane” implementation details. Like a diamond, we carve the heart of our system carefully, and we certainly do not want any orthogonal impurity to stain it. To help us keep the discipline, we make it hard to contaminate the domain by architecting our codebases in onions. All our codebases have a domain package whose sole dependencies are on extensions of the language, such as Cats (for functional programming) and Refined (already mentioned above, for precise typing).

We use Scala traits to define the algebra we use to express our business logic, our own domain-specific embedded language. Typical elements in this algebra are our entities, such as Booking, Vehicle, etc. We also describe our code's effects with algebraic constructs: we have Repositories (e.g. BookingRepository, VehicleRepository, ...) which are abstractions on the corresponding database tables or actor clusters. Providers represent some external service (e.g. RouteProvider) or other abstracted infrastructure (such as Logger to issue log entries). Our algebra also abstracts encapsulation of result values, error values in particular. Even the very chaining of function calls is abstracted using concepts derived from category theory such as Functor, Applicative, and Monad. These function composition abstractions make it possible to choose later to interpret domain code synchronously, asynchronously, within a transactional context, etc., without touching the domain package. It also makes tests compact and easy to write. Typically, we choose synchronous execution for testing, affording fast feedback, which is comfortable for TDD practice.

📌 We have adopted the so-called tagless-final encoding to achieve this level of abstraction. This particular approach is known as "tagless" because it doesn't rely on the program's reification into values, in contrast with other techniques such as free monads where a full AST of the program is constructed within the host language. And final because the program is interpreted directly into the target monad (versus deferred evaluation). This approach has better performance and is quite simple to scaffold, using Scala implicits and context bounds on the container type.
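The following is a minimal sketch of this tagless-final style, using Cats; the entity and repository names echo those above, but the signatures are simplified illustrations:

```scala
import cats.Monad
import cats.syntax.all._

final case class BookingId(value: String)
final case class Booking(id: BookingId, passengers: Int)

// Algebra: effects hidden behind traits parametrized on the container F[_]
trait BookingRepository[F[_]] {
  def find(id: BookingId): F[Option[Booking]]
  def save(booking: Booking): F[Unit]
}

trait Logger[F[_]] {
  def info(message: String): F[Unit]
}

// Domain logic only demands Monad for sequencing; the same code can later be
// interpreted synchronously (Id) in tests or asynchronously (IO) in production
def addPassenger[F[_]: Monad](id: BookingId)(implicit
    bookings: BookingRepository[F],
    log: Logger[F]
): F[Option[Booking]] =
  for {
    found   <- bookings.find(id)
    updated  = found.map(b => b.copy(passengers = b.passengers + 1))
    _       <- updated.traverse_(bookings.save)
    _       <- log.info(s"Updated booking ${id.value}")
  } yield updated
```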

Embracing failure

Humans tend to dislike errors. Most of us enjoy functional, shiny, and clean machines, maybe even more so engineers. We tend to be control freaks. So we usually minimize the possibility for error. It’s often hard enough to design for the often far-from-trivial happy paths, let alone accounting for all possible deviations. But maybe what we need is a slight change in perspective. By looking at errors as perfectly valid values in our program, and error scenarios as “alternatives,” not degenerate possibilities, we increase the precision and level of control of our code. To do that, we need to get equipped to deal with alternative program flows. Errors should no longer be poor citizens in our programs, such as packed into a single ApplicationError(message: String) exception caught in some untested global handler.

Either

Scala is well equipped from the get-go to deal with errors as first-class values in a functional way. The Either type is the main tool for avoiding old-school, referential-transparency-breaking exceptions. Either is parametrized with two covariant types, A and B. An Either[A, B] value is literally either an A (in which case it’s called a left) or a B (called a right). Notice how the notion of “error” never surfaces here. By convention, the left side is used for representing “less desirable” values (aka errors). This usage pattern was so common that in more recent Scala versions, the built-in monadic composition operators of Either (map, flatMap) were made “right-biased”: either.flatMap(b => ...) picks the right value, while the left-side operators remain accessible via either.left.flatMap(a => ...).
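A few lines of standard-library code suffice to show the right bias in action:

```scala
// map/flatMap operate on the right value and pass lefts through untouched
val parsed: Either[String, Int] = Right(21)
parsed.map(_ * 2) // Right(42)

val failed: Either[String, Int] = Left("not a number")
failed.map(_ * 2) // Left("not a number"): untouched

// The left side remains reachable via the .left projection
failed.left.map(msg => s"parse error: $msg") // Left("parse error: not a number")
```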

Even equipped with the Either machinery, precise error handling is still a challenge. Usually, our happy paths are well defined, typed, and covered. Errors, on the other hand, are comparatively vague, often described with few types, if not a single one, with seldom-exercised early-failure flows that cut through layers. Embracing failure is hard because it means reaching the same level of precision on the “left” (error) side as we have on the right. We must capture the bestiary of our “sad” domain values with proper ADTs and type hierarchies, with the same level of precision as for “happy” values. Without “taboos” in our value vocabulary, we make it possible for the business logic to account for the broad spectrum of possible unfoldings. For instance, our ride demand optimization service has two categories of errors: HTTP-related, which we wrap into a DelegateServiceError, and InfeasibleDemand. We retry the request several times for the former; for the latter, we don’t, as it’s a stable result.
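A sketch of what such an error hierarchy and retry decision can look like; the error names come from the example above, while the shapes and retry logic are simplified stand-ins:

```scala
// The "sad" side of the domain gets proper ADTs too
sealed trait OptimizationError
final case class DelegateServiceError(status: Int, body: String) extends OptimizationError
case object InfeasibleDemand extends OptimizationError

final case class Solution(vehicleIds: List[String])

// Typed errors let the business logic tell transient failures from stable ones
@annotation.tailrec
def optimize(
    attempt: () => Either[OptimizationError, Solution],
    retries: Int = 3
): Either[OptimizationError, Solution] =
  attempt() match {
    case Left(_: DelegateServiceError) if retries > 0 =>
      optimize(attempt, retries - 1) // HTTP-related: worth retrying
    case stable =>
      stable // InfeasibleDemand (or success) is a stable result: return as-is
  }
```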

As mentioned just above, we tend to discover a myriad of unexpected unhappy flows down the road — partly because they tend to unfold in corner cases, but also because of our own mental bias for right values. Having the same level of equipment and precision in place for capturing alternative scenarios means we can adapt and evolve the system in both dimensions just as easily.

📌 Bifunctor
Recent trends even embrace alternative function values at the code syntax level. There is now some adoption of multi-functorial stacks, as the rising popularity of related projects in the Scala ecosystem attests (ZIO, Monix BIO). In one of our projects, we describe our domain in tagless-final style with a bifunctor, thanks to the help of a small “niche” library (Bfect) which interoperates with Cats.
Either is a good example of a bifunctor: essentially, a container for two mutually exclusive possibilities.
As explained earlier, when expressing domain logic in tagless-final style, the concrete functor instance (the actual container type, Either in this case) is not referenced directly but rather via its abstracted abilities (described by the Bifunctor type class). Concretely, this means that as far as the domain algebra is concerned, result values are of a type with two “holes,” F[_, _], each type parameter representing one of the two mutually exclusive values.
Abilities are then indicated using context bounds on this F parameter: the compositional characteristics we require (applicative functor for independent computations, monadic functor for sequential computations) and effects such as repository operations, logging, etc., as illustrated in the small code example below.
Bifunctor-driven function. This function involves a couple of the core concepts of our flexible planning system: resources are planning entities, and blocks are time slots of availability for them. In this example, we can spot the context bounds on F: BifunctorMonadError for sequential composition, Bilogger for logging, and the required repositories injected in implicit scope. Sequential expressions are bound via a for comprehension, which is syntactic sugar in Scala for successive calls to the monad’s flatMap operator. Notice how logging calls are embedded within the for comprehension, so logging effects are captured into the monad for deferred execution (in contrast to a side-effecting direct call). We later use the onLeft operator defined on the bifunctor to have differentiated logging behavior depending on the error. This precision level is important to us: entries at the error level are turned into operational alerts and are thus carefully selected. We test such domain code with an instance of a writer monad whose values are Either. With this instance, the tests execute synchronously, and log entries are accumulated into a vector, allowing us to check that we make the right logging calls.
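Since the original snippet is an image, here is a loose reconstruction of such a bifunctor-driven function. The type classes named above (BifunctorMonadError from Bfect, Bilogger from an internal library) are approximated with simplified stand-ins so the sketch stays self-contained:

```scala
// Simplified stand-ins for the real type classes
trait BifunctorMonad[F[_, _]] {
  def pure[E, A](a: A): F[E, A]
  def flatMap[E, A, B](fa: F[E, A])(f: A => F[E, B]): F[E, B]
  def onLeft[E, A](fa: F[E, A])(handle: E => F[E, Unit]): F[E, A]
}

trait Bilogger[F[_, _]] {
  def info[E](message: String): F[E, Unit]
  def error[E](message: String): F[E, Unit]
}

implicit class BifunctorOps[F[_, _], E, A](fa: F[E, A])(implicit F: BifunctorMonad[F]) {
  def flatMap[B](f: A => F[E, B]): F[E, B]     = F.flatMap(fa)(f)
  def map[B](f: A => B): F[E, B]               = F.flatMap(fa)(a => F.pure(f(a)))
  def onLeft(handle: E => F[E, Unit]): F[E, A] = F.onLeft(fa)(handle)
}

final case class ResourceId(value: String)
final case class Block(resource: ResourceId, startMillis: Long, endMillis: Long)

sealed trait PlanningError
final case class ResourceNotFound(id: ResourceId) extends PlanningError
final case class StorageFailure(cause: String) extends PlanningError

trait BlockRepository[F[_, _]] {
  def blocksFor(id: ResourceId): F[PlanningError, List[Block]]
}

// Context bounds declare required abilities; the repository is injected implicitly
def availableBlocks[F[_, _]: BifunctorMonad: Bilogger](id: ResourceId)(implicit
    blocks: BlockRepository[F]
): F[PlanningError, List[Block]] = {
  val log = implicitly[Bilogger[F]]
  (for {
    found <- blocks.blocksFor(id)
    _     <- log.info[PlanningError](s"Found ${found.size} blocks for ${id.value}")
  } yield found).onLeft {
    // Differentiated logging: only alert-worthy cases use the error level
    case ResourceNotFound(_) => log.info("Unknown resource")
    case StorageFailure(_)   => log.error("Planning storage failure")
  }
}
```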

Explicit lyrics

We’ve seen above how we use tagless-final to define our domain algebra, our domain-specific language (DSL). We also use more limited-scope DSLs extensively throughout the codebase, for builder patterns in particular. One example that I can share is our application builder DSL, which provides a “guided” experience to compose an Akka cluster application with graceful startup and termination (we describe this more in-depth here).

Cluster application builder DSL
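As the builder itself is shown as an image, here is a purely hypothetical sketch of the “guided” idea (not the actual Bestmile API): phantom types make illegal build orders unrepresentable, so the application cannot be started before its HTTP route is bound:

```scala
sealed trait Ready
sealed trait NotReady

final class AppBuilder[State] private (
    route: Option[String],
    shutdownHooks: List[() => Unit]
) {
  def bindHttp(newRoute: String): AppBuilder[Ready] =
    new AppBuilder[Ready](Some(newRoute), shutdownHooks)
  def onShutdown(hook: () => Unit): AppBuilder[State] =
    new AppBuilder[State](route, hook :: shutdownHooks)
}

object AppBuilder {
  def apply(): AppBuilder[NotReady] = new AppBuilder[NotReady](None, Nil)

  // start() only exists once the builder has reached the Ready state
  implicit class StartOps(builder: AppBuilder[Ready]) {
    def start(): Unit =
      println("binding HTTP, joining the cluster, registering termination hooks")
  }
}

AppBuilder().bindHttp("/bookings").onShutdown(() => println("draining")).start()
// AppBuilder().start() would not compile: no route was ever bound
```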

Even more compact and expressive than DSLs are parsable diagrams. Here’s a simple example of this idea, in which small marble diagrams are used to test a relateTo method in our DateTimeInterval type:

Parseable marble diagrams help for concise test case formulation
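The diagram itself is an image, so here is a hypothetical reconstruction of the technique: each character column is a time slot, and a tiny parser turns the picture into a value under test (DateTimeInterval is reduced to integer bounds, and relateTo to a string, for brevity):

```scala
final case class Interval(start: Int, end: Int) {
  def relateTo(other: Interval): String =
    if (end <= other.start) "precedes"
    else if (other.end <= start) "follows"
    else "overlaps"
}

// '-' is empty time, 'X' marks the interval's extent on a shared axis
def parseMarble(diagram: String): Interval =
  Interval(diagram.indexOf('X'), diagram.lastIndexOf('X') + 1)

// Test cases read like little pictures
assert(parseMarble("--XX----").relateTo(parseMarble("-----XX-")) == "precedes")
assert(parseMarble("---XXX--").relateTo(parseMarble("--XXX---")) == "overlaps")
assert(parseMarble("-----XX-").relateTo(parseMarble("--XX----")) == "follows")
```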

We use such compact machine-readable notations to represent more complex data, even vehicle plans. Their conciseness is extremely useful for expressing tests, communicating around tricky cases, and even for inclusion in production logs for troubleshooting intelligence.

Immutability

What’s challenging in software is state. Program flow involving state mutation is hard to follow because it depends on the moment of execution, not just on the code itself (which is already complicated enough). Not to mention the challenges of running such programs concurrently. It’s much easier to reason about deterministic logic, where one input always leads to the same output. That’s why modern programming tends to push mutations as far away as possible, out of the code and into the database. When it comes to state handling, we’ve observed roughly three families of services:

  • CRUD (create read update delete) services: these are the simplest, essentially a layer of business logic on top of a database. Any temporary state is local to a request, often also enclosed within transactional boundaries. Each request is independent, there is no shared state outside of the database, and the system is purely reactive, i.e., it only does something upon request.
  • Streaming services: that’s when data flows in and out, and logic is about combining streams and mapping/reducing the data. Time is often itself a vital parameter of such pipeline-like processing. State can appear, but it’s local to pipeline stages and not shared globally.
  • Active services: such systems are the most complex and actively support some stateful process or represent dynamic interactions of an entity with the outside. They react to external stimuli but also actively schedule operations and send messages. Internal processes transition through various states and are best described as state machines.

Event sourcing

We end up with the latter category of services when representing what happens in real life for a particular entity. Typical active entities in our cloud are vehicles, bookings, rides, planning units, etc., which are highly stateful in reality: they change over time, according to specific rules. When trying to describe these natural transitions at the domain level, the tried and true method of a finite state machine (FSM) works very well. A state machine is a set of unambiguous states and defined transitions between them. Nothing new there: it’s a construct that dates back to the origins of computing. But what’s still not mundane today is to make such little stateful machines persistent and scale their numbers to millions. One answer to this challenge is “event sourcing,” combined with the actor model to implement the FSM. In event sourcing, we store our state machine’s transitions in the database as timestamped entries in a single table (also known as the “event journal”). There is no database schema per se; storage is just a linear sequence of events, a log. The sequence of transitions is replayed from the seed to retrieve the latest state (or from some intermediary “snapshot” for performance). Retrieving the current state like that is called “recovery.” For instance, for a Booking entity, we will have events like RideStarted, RidePickedUp, RideDroppedOff, RideCompleted (and corresponding transitions in its state machine). An actor represents each booking in the multi-node cluster and is flushed in or out of memory dynamically as needed, recovering from events.
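Stripped of the actor and persistence plumbing, the mechanics reduce to a fold, as in this simplified sketch (the event names are from the Booking example; the states are illustrative):

```scala
sealed trait BookingEvent
case object RideStarted    extends BookingEvent
case object RidePickedUp   extends BookingEvent
case object RideDroppedOff extends BookingEvent
case object RideCompleted  extends BookingEvent

sealed trait BookingState
case object Scheduled  extends BookingState
case object EnRoute    extends BookingState
case object OnBoard    extends BookingState
case object DroppedOff extends BookingState
case object Done       extends BookingState

// The state machine: unambiguous transitions between defined states
def applyEvent(state: BookingState, event: BookingEvent): BookingState =
  (state, event) match {
    case (Scheduled, RideStarted)    => EnRoute
    case (EnRoute, RidePickedUp)     => OnBoard
    case (OnBoard, RideDroppedOff)   => DroppedOff
    case (DroppedOff, RideCompleted) => Done
    case (other, _)                  => other // illegal transitions ignored here
  }

// "Recovery" is a left fold of the journal into the latest state
def recover(journal: Seq[BookingEvent]): BookingState =
  journal.foldLeft[BookingState](Scheduled)(applyEvent)

assert(recover(Seq(RideStarted, RidePickedUp, RideDroppedOff)) == DroppedOff)
```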

What's great about describing entities with event sourcing is that we never lose anything: we can replay the "movie" indefinitely and in various ways. We can change the state machine so that the same transitions lead us to a different state. We can generate by-products of those events as well, e.g., to populate read-only models for efficient GET-intensive APIs or for reporting purposes (such alternative event traversals are called "projections"). Most importantly, we have this historical data at our disposal, and we are equipped to face unforeseen requirements of the business, even retroactively. For more information on event sourcing, we have documented our methodology for domain-driven event sourcing in this article.

📌 If you're familiar with front-end development using Redux, all of this might seem strangely familiar. It's no surprise, as it's essentially the same idea: a single, immutable object contains the application state and can only be affected by a defined set of transitions (actions). All interactions in the app translate into actions, thus affording infinite replay (which can be experienced in the redux developer tools).

Streams and time

As mentioned before, there are services for which time is an important parameter. Reporting is the most obvious example: the vast majority of metrics are time-based. Examples in transportation include the number of passengers transported per hour, the number of bookings per minute, etc. We also handle real-time streams of telemetry data over which we “slide” computation windows, for instance, to produce live averages. But time also has an essential role in subsystems entirely unrelated to reporting. For example, orchestration of application startup and shutdown is highly time-sensitive (see our related article for a detailed exploration of this apparently simple but surprisingly challenging topic). Whenever time becomes a relevant factor, elevating the level of abstraction gives us the means to describe precisely how things should evolve, rather than awkwardly anticipating control flow or fiddling with synchronization primitives. We use reactive programming toolkits such as Akka Streams and Monix to orchestrate data streams with a rich set of operators that factor in the time component, allowing for buffering, filtering, and combining streams according to time-based conditions.
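A small Akka Streams sketch of such a sliding computation window; the telemetry source here is hypothetical, whereas in production the data would arrive from Kafka or gRPC:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.Source
import scala.concurrent.duration._
import scala.util.Random

implicit val system: ActorSystem = ActorSystem("telemetry")

final case class Telemetry(vehicleId: String, speedKmh: Double)

// Simulated feed emitting one telemetry sample every 100 ms
val telemetry = Source
  .tick(0.seconds, 100.millis, ())
  .map(_ => Telemetry("vehicle-1", Random.nextDouble() * 50))

// Time as a first-class pipeline parameter: a live average per 5-second window
telemetry
  .groupedWithin(1000, 5.seconds)
  .map(window => window.map(_.speedKmh).sum / window.size)
  .runForeach(avg => println(f"average speed: $avg%.1f km/h"))
```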

📌 In genuine ReactiveX form, Monix even supports “time-traveling” during testing via its scheduler abstraction, which lets us perform exhaustive checking along the time axis, e.g., to reproduce timeout conditions.
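A sketch of what such a test looks like with Monix’s TestScheduler: the simulated clock is advanced instantly, so the timeout path runs without any real waiting:

```scala
import monix.eval.Task
import monix.execution.schedulers.TestScheduler
import scala.concurrent.duration._
import scala.util.Success

implicit val testScheduler: TestScheduler = TestScheduler()

// A call that would take 30 simulated seconds, guarded by a 10-second timeout
val slowRoute   = Task.sleep(30.seconds).flatMap(_ => Task("route"))
val withTimeout = slowRoute.timeoutTo(10.seconds, Task("fallback"))

val result = withTimeout.runToFuture

testScheduler.tick(10.seconds) // time-travel: jump the clock forward
assert(result.value == Some(Success("fallback")))
```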

Stateless protocols

Besides orchestrating data streams or capturing transitions with event sourcing, we also sometimes represent the state’s evolutions within the data structures themselves. One example of this methodology is our vehicle communication protocol, which consists essentially of a single bi-directional gRPC stream. Downstream, orchestration intelligence informs the vehicle with a single message about all upcoming actions to be executed within a certain time window. Upstream, the vehicle reports back its aggregated progress. In other words, we use a single message type that describes all that’s needed at any one time, and we track completion in the same way by reporting the full list of status evolutions and when they occurred. Either side emits a message whenever something changes, and any divergence is resolved by giving the orchestration platform preference and refusing reports until the client complies.

This principle sounds simple and quite natural, but we have noticed that it is tempting to split information into distinct message types when designing a protocol. This reflex is probably borne out of the old days, when efficient serialization and bandwidth weren’t readily available like they are today; it may also stem from our natural strategy of solving problems by decomposing them into smaller pieces. The main problem with multiple message types, particularly when they are spread over time, is that it’s easy for ordering requirements to creep in. Sometimes it’s very subtle: over time, the code might come to rely on the fact that implementations always deliver a specific message after another one, while this ordering requirement is never stated explicitly in the protocol.

Such implicit reliance on delivery time and message ordering can ultimately turn servers and clients into full-fledged state machines, which are more complex and require persistence. By reifying status evolutions into the data structure itself and making the protocol structurally time-independent, we end up with more robust and more straightforward implementations (albeit at the cost of message size). We call such protocols “stateless” because they don’t rely on external state or on message ordering in time. The message is self-contained: by reading that single message, the vehicle knows at a glance everything that’s needed to carry out its mission, and even its very own completion history. Of course, this makes for a large message whose format is affected every time we add some aspect to the protocol, and it leads to more complex data structures in the code.
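An illustrative shape for such a self-contained message pair; the real gRPC schema is richer, but the principle is the same:

```scala
import java.time.Instant

sealed trait Action
final case class Pickup(stopId: String)  extends Action
final case class Dropoff(stopId: String) extends Action

// Downstream: the complete window of upcoming actions, resent whenever anything changes
final case class MissionMessage(vehicleId: String, upcomingActions: Vector[Action])

// Upstream: the full history of status evolutions rather than the latest delta,
// so no delivery-order assumptions are needed to interpret a single message
final case class StatusEvolution(action: Action, completedAt: Instant)
final case class ProgressMessage(vehicleId: String, completed: Vector[StatusEvolution])
```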

📌 We mentioned above the Redux approach for implementing UIs. We can draw a parallel to it here as well, in how Redux’s creators made the somewhat counter-intuitive bet of representing the complete application state within a single object, for benefits in simplicity and control over state evolutions.

Immutable data structures

Thankfully, we have various tools that help us manage this complexity. Optics, for instance, allow zooming into large data structures to easily manipulate elements of immutable data, no matter how deeply nested they are (I wrote an introduction to optics in Scala here). As mentioned a bit earlier, using algebraic data types gives us access to advanced tooling based on type derivation: since ADTs are composite types made of sums and products, generic operations can be derived from primitives defined for basic types such as String, Int, Boolean, etc. Scala has a powerful type inference engine that we can instrument to “embed” rules in the types and even write programs at the type level. Type-level instructions are evaluated at compilation time (this is known as type-level programming; see Slick and Shapeless for great examples). Scala also features a powerful macro system built into the language, granting direct access to the AST for even more flexibility. For us, these fancy language features concretely mean that we can cut down on boilerplate a lot and move faster with minimal compromises on safety.
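For instance, a Monocle lens can zoom two levels into a nested immutable plan (the types here are illustrative):

```scala
import monocle.macros.GenLens

final case class Stop(name: String, etaMinutes: Int)
final case class Route(stops: List[Stop])
final case class Plan(vehicleId: String, route: Route)

// A lens focusing deep inside the structure, derived by macro
val stops = GenLens[Plan](_.route.stops)

val plan = Plan("vehicle-1", Route(List(Stop("depot", 0), Stop("main-street", 12))))

// Deep update without nested copy(...) chains; the original value is untouched
val delayed = stops.modify(_.map(s => s.copy(etaMinutes = s.etaMinutes + 5)))(plan)
```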

📌 One boilerplate killer for us is Chimney, which relies on metaprogramming to auto-generate data mapping code based on equivalent naming and type correspondence. Automatic type mapping makes it easier to segregate domain types from API data types or protobuf DTOs while limiting the required boilerplate. Thus, mapping logic boils down to describing exceptional cases when the data shape isn't the same or naming is different. 
For systematic regression testing, we define property tests with Scalacheck using automatic data generation afforded by Scalacheck-shapeless, combined with Circe-golden. For a certain DTO, we can derive random data generators automatically (from a set of primitives defined for string, int, etc.). Multiple variations of these random instances are fed through serializers, generating a "golden" sample for future comparison. We commit these samples to version control for future reference.
So for almost zero coding cost, we get automatic verification of backward compatibility. True enough, auto-generated data is generally unsound and only satisfies the message “shape” (unless this approach is used with hand-built generators, but that makes the test sensitive to generator code too). That said, Scalacheck primitives are not purely random and will try to explore corner cases, so as a bonus, you get coverage for empty strings, empty lists, special characters, large numbers, etc.
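A small sketch of the Chimney part (the DTO names are illustrative):

```scala
import io.scalaland.chimney.dsl._

// Domain type...
final case class Booking(id: String, passengerCount: Int, notes: Option[String])
// ...and an API DTO whose fields line up by name and type
final case class BookingDto(id: String, passengerCount: Int, notes: Option[String])

val booking = Booking("b-42", 2, None)

// The mapping is derived at compile time from the name/type correspondence
val dto: BookingDto = booking.transformInto[BookingDto]

// Only exceptional cases need spelling out, e.g. a renamed field
final case class LegacyDto(bookingId: String, passengerCount: Int)
val legacy: LegacyDto =
  booking.into[LegacyDto].withFieldRenamed(_.id, _.bookingId).transform
```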

Scale

source: Faris knight, via Wikipedia Commons (CC)

At some point, this immutable and pure algebraic domain must have some effect on the world. We need to roll up our sleeves and take on the challenge of continuous development and 24/7 operations in the cloud (we have described this process more in-depth here). As stated in the introduction, we have no choice but to keep our system highly available, with no update downtime and automatic fail-over in case of disruption. As our customer base is dynamic and varied and our operational costs matter, we cannot afford individual deployments, and our system is multi-tenant. We break down this complexity into roughly four layers, described below.

Distributed applications

For each service, we have multiple running instances. These instances collaborate to tackle the load in different ways depending on the kind of service. For CRUD-like services with REST endpoints, the load-balancer distributes the incoming requests, as each request is independent. When consuming messages from Kafka, the partitioning scheme splits messages among consumers. In the case of stateful actor-based services, this is vastly more elaborate. We rely on Akka cluster sharding technology to maintain a fair distribution of entities between the nodes (more on this further below).

Deployment orchestration

Kubernetes is in charge of automating deployment and horizontal and vertical scaling of our services. Kubernetes was created by Google and is very popular nowadays. We discuss abstractions in this article, and it turns out Kube is also a great example of domain-driven design in the bounded context of microservice architecture. It has a precise vocabulary of terms such as Node, Pod, ReplicaSet, Service, Volume, etc., which capture the intricacies of orchestrating applications on a cloud infrastructure. The power of Kube also lies in its flexibility for extensions and rich API. We use Helm charts together with Helmfile to drive it.

Configuration and promotion

We stage our releases across various environments, which are isolated using Kube namespaces. What’s essential for us is to keep the configuration reproducible, i.e., spinning up an environment should not involve manual intervention. We follow a GitOps approach, which consists of storing and versioning our helm charts in git and using pull requests in combination with Codefresh pipelines and home-grown automation to move releases forward.

Infrastructure

Infrastructure is the bottom layer, where the stuff actually runs. Our focus here is also on reproducibility, and we manage the configuration of our AWS services declaratively using Terraform.

Split-brain resolver

As mentioned above, for distributed event-sourced services, we run clusters of actors with Akka. Sharding mechanisms distribute actors on the nodes based on an entity identifier and make sure that only a single actor for an entity can exist at any one time. Nodes in the cluster maintain a permanent awareness of others via a gossip protocol. A single node is elected as a coordinator for sharding operations (generally the cluster’s oldest). Messages are routed automatically to the right node using the entity identifier as an address.
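In Akka Typed terms, the setup boils down to a few lines. This is a simplified sketch assuming a configured cluster; our real entities are event-sourced behaviors:

```scala
import akka.actor.typed.{ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

sealed trait Command
final case class SetPosition(lat: Double, lon: Double) extends Command

def vehicle(id: String): Behavior[Command] =
  Behaviors.receiveMessage { case SetPosition(lat, lon) =>
    println(s"vehicle $id now at ($lat, $lon)")
    Behaviors.same
  }

val system = ActorSystem(Behaviors.empty[Nothing], "everest")

val VehicleTypeKey = EntityTypeKey[Command]("Vehicle")
ClusterSharding(system).init(Entity(VehicleTypeKey)(ctx => vehicle(ctx.entityId)))

// The entity identifier is the address: the message is routed to whichever
// node currently hosts (or rehydrates) this vehicle's single actor instance
ClusterSharding(system).entityRefFor(VehicleTypeKey, "vehicle-42") ! SetPosition(46.52, 6.63)
```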

Of course, cluster membership is highly dynamic, and nodes often leave and join. In a normal rolling deployment situation, members announce their departure, and Kube upgrades nodes one after the other to maintain availability. Entities running on the departing node are “rehydrated” on another node if new messages come in for them (or if we configure them to always remain in memory). Actors are restored by replaying the persisted events (using the principles described above when discussing event sourcing). Things sometimes go wrong, and nodes crash or get isolated from others. Nodes not responding to heartbeats are ultimately flagged as unreachable, leading to the fail-over of their entities to others. The worst situation occurs when multiple nodes separate from the cluster and a new “rogue” cluster starts forming. This situation is known as a “split-brain,” as illustrated in the figure below:

source: Lightbend

In such situations, a decision must be made about the “legitimate” side of the cluster, failing which we could end up with multiple instances of the same entity running on different nodes, resulting in corruption of state and events. The component in charge of this issue is named the split-brain resolver (SBR), and it supports various strategies for different deployment topologies. Since we host our clusters in Kubernetes, we can use the simple lease strategy, which uses Kube as an external arbitrator. The principle is illustrated in the figure below. It’s like acquiring an external lock or semaphore in the form of a custom Kube resource.

source: Lightbend
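Per Akka’s documentation, enabling the lease strategy is mostly a matter of configuration, along these lines (assuming the akka-lease-kubernetes module is on the classpath):

```scala
import com.typesafe.config.ConfigFactory

// HOCON configuration selecting the SBR with the Kubernetes lease strategy
val sbrConfig = ConfigFactory.parseString(
  """
  akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  akka.cluster.split-brain-resolver {
    active-strategy = lease-majority
    lease-majority {
      lease-implementation = "akka.coordination.lease.kubernetes"
    }
  }
  """
)
```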

Auto-scaling and multi-tenancy

As we can see, keeping stateful services highly available is no small feat and involves sophisticated technology. Once these capabilities are in place, however, we can reap significant economies of scale. Nodes can be added and removed at will according to load. We can use cheaper short-lived instances, and adding new accounts doesn’t entail any infrastructure manipulation. Dealing with the complexity of multiple customers with varying load and usage patterns is much easier in application code than in infrastructure. It requires careful role design and proper data segregation, but the flexibility and operating cost savings make it worth the effort. Funnily enough, we follow the same principles for scaling our platform as we do for fleets, i.e., efficient utilization via central orchestration to achieve the most synergies from distributed vehicles.

An ecosystem of tools

Scaling is not just about adding CPU or memory capacity; it’s also about coping with increasing operational issues and growing expectations. It’s critical to become productive to avoid falling into the trap of constant firefighting, especially with a small team. The way to do this is to focus, get good at a few things, and avoid repetition. The first steps on our path to microservices were a lot about taming the beasts that are Kafka, Akka, and even Scala. We have built a small ecosystem of helper libraries for orthogonal concerns such as database migration, geography handling, etc.

We are also very proud to have the author of the excellent endpoints library, Julien Richard-Foy, on our team. We’re big fans of algebraic approaches, and his library provides us with the ability to describe REST endpoints and JSON data with a high-level algebra, automating the generation of client and server code and even documentation. It took us some time to put together these abstractions and tools, but we now enjoy the many benefits of this early effort. First of all, most of our codebases have become smaller and more uniform, and if we fix some issue, we upgrade one library and all projects benefit from the fix. We have greatly reduced cross-repository duplication compared to what we had before. Not only that, but it’s now much more comfortable to spin up a new service. By taking the time to put together reusable abstractions, we also better understand each technology’s intricacies. Our tools capture our “cooking recipes” and best practices and provide us with a reliable foundation, so that we can dedicate our attention to the real business.
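To give a flavor of the endpoints algebra (now published as endpoints4s), here is a rough sketch; the Booking schema is illustrative:

```scala
import endpoints4s.algebra

// One abstract description; interpreters derive server, client, and OpenAPI docs
trait BookingEndpoints extends algebra.Endpoints with algebra.JsonEntitiesFromSchemas {

  case class Booking(id: String, passengerCount: Int)

  implicit lazy val bookingSchema: JsonSchema[Booking] =
    field[String]("id")
      .zip(field[Int]("passengerCount"))
      .xmap((Booking.apply _).tupled)(b => (b.id, b.passengerCount))

  // GET /bookings/:id, answering 200 with JSON, or 404 when unknown
  val getBooking: Endpoint[String, Option[Booking]] =
    endpoint(
      get(path / "bookings" / segment[String]("id")),
      ok(jsonResponse[Booking]).orNotFound()
    )
}
```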

Culture

source: Take My Wife, Sleaze, 20th Television, The Simpsons (fair use)

Such a discussion would not be complete without some words about the most important and often understated aspect: company culture, and engineering culture in particular. Modern cloud-based software systems are highly elaborate and require continuous care. At Bestmile, we are such a small team that, to survive, we need each member to be highly aware of multiple aspects, from the vision to product design to deployment details. That’s only possible if you cultivate a genuinely open and positive culture with empowered employees driven by the motivation to create something unique.

Awareness

It’s much easier to declare that your organization is transparent than to achieve it in practice. Real transparency manifests itself in small everyday behaviors. For instance, a challenged engineer should have no second thoughts about asking for help. There should be no “dumb questions” in meetings and no need for private chat channels. Engineers should not feel that they need to hide difficulties behind complicated technical jargon or be compelled to put on a display of know-how. The best results come when everyone is on board and understands what’s at stake.

Flaws and limitations in the system should be acknowledged at all layers, all the way to the top of the corporate hierarchy: they are a challenge to the whole organization, as they influence growth opportunities. It’s often more than “an implementation detail,” even more so at cloud scale. Likewise, sales and business development both fuel and validate engineering, and truth be told, building a great system that isn’t used is dispiriting. In many ways, an organization is like a living organism in that all aspects are connected. From Conway’s law, we also know that team organization and communication structure choices have a direct impact on technical architecture.

In a nutshell, it’s about combining ambition and humility throughout the organization. A successful startup should aim to be the best in its field while accepting that this objective is difficult and can only be achieved as a well-informed and united team. And we shouldn’t forget to enjoy the journey along the way: creativity blooms in positive environments, or at the very least, it doesn’t occur when people are afraid.

Excellence

Positivity naturally leads me to the second point: the pursuit of excellence. Excellence relates to cultivating a positive mindset about work. I think it’s fair to say that for most of us, the least enjoyable part of being a software engineer is regularly having to dive into some incomprehensible codebase to solve an obscure bug. Not much is gained from it: vast amounts of time scanning the code, a frustrating trial-and-error process, disappointing results. And yet, I’m always surprised how often we paint ourselves into this corner. We generally blame external factors for it: it’s only natural, it feels awful! However, as much as we like to complain, it’s not the manager’s fault if the code is terrible: they’re not the one who wrote it. I feel that it’s more a lack of communication and guidelines that leads to harmful code. Engineers often struggle to explain how difficult certain things are to an arguably poorly receptive audience. As a result, they feel compelled to cut corners: “it’s been done before, I’ll do it just one more time.”

The best way to ensure a productive developer life and sustainable code is to make it structural, to make it part of the culture. As mentioned earlier, management buy-in is critical. By entrusting engineers and involving them in everything from elaborating the vision to product design to coding and deployment, we allow for accountability. When you risk being woken up at night by some failure, the incentive to resist short-term hacks and build something solid is strong. On a more optimistic note, pride in good work and success creates a positive feedback loop.

In my experience, creating something significant requires excellence as a mindset. You can’t build a cathedral from broken bricks and duct tape. Piece by piece, everything must be the best you can find or achieve: from the materials to the building tools and techniques to the actual construction. When undertaking something never attempted before, it’s tough to know in advance what will turn out to be significant in the long run. The fewer broken bits that accumulate during development, the more leeway there is to make the right choices along the way. And the more people use cutting-edge tools and techniques, the more skilled and productive they become with them.

A shared dream

I believe that an inspired and skilled engineering team with a mission to create something meaningful can achieve a staggering amount. At Bestmile, we’ve created a language to describe the future of mobility, and we’ve elevated it into a highly available and infinitely scalable cloud of microservices. We follow our dream of optimizing transportation with our lines of code. For us, that’s a cause worth giving our best.

Update - 2022: Bestmile has been acquired by ZF, and the journey continues at a larger scale: our system is now a pillar of the Scalar orchestration platform.
