From Monolith to Microservices

Kelsey Kerr
Arthur Engineering
Dec 11, 2020 · 6 min read

Speed is the name of the game at startups. At Arthur, we’re moving fast and iterating quickly to build the best model monitoring platform on the market, and that’s why we embarked on the journey to migrate our original monolith into microservices. In the early days of Arthur, when the market was nascent and the engineering team consisted of just a couple of people (and the CEO putting on his developer hat), a monolith made sense. Starting with microservices would have added complexity, would have gone against the YAGNI mindset, and likely would have slowed the small but mighty team down.

“Don’t be afraid of building a monolith that you will discard, particularly if a monolith can get you to market quickly.” — Martin Fowler

As 2020 rolled on, the market for Arthur’s product opened up, the engineering team more than doubled in size, and we were faced with a tough question: do we continue on with the monolith knowing that we are likely punting on scaling issues, or do we take the time to slow down now and rethink our architecture in order to speed up later?

simplified Arthur architecture from the early days

How we knew it was time to make the switch

As the title suggests, we decided that it was time to invest in an ecosystem of microservices. It was clear that we had achieved product-market fit, that we would have to scale to meet the demands of multiple clients, and that our internal development speed would likely benefit from decoupled services. For example, we had great test coverage for our monolith, but the service was so big that our very thorough unit and integration tests would take over 20 minutes to run if we weren’t running them in parallel. We also knew that a highly available platform was non-negotiable, but with a monolith, that meant scaling up the entire service to meet the demand of a couple of frequently hit data ingestion endpoints while the majority of endpoints had a fraction of the throughput and memory usage. Other motivating factors for switching to microservices were the ability to better parallelize work amongst the team and to make it quicker and easier to add, edit, and remove features. We were confident that microservices were the way to go; however, we knew that it would be a large undertaking and that we’d be paying the infamous Microservice Premium.

How we got started with microservices

We started with just a few microservices, as we read that knowing when to create a new microservice is as much of an art as it is a science. With this in mind, we erred on the side of caution and only split out the obvious pieces first. We decided to keep our user, authentication, model metadata, alerts, and tenant management logic in a service and split out the following pieces into separate microservices:

  • inference ingestion
  • inference & metric retrieval

Inference ingestion was an obvious candidate for a new service because it could be easily decoupled from the rest of the platform, and it was something that we knew would have to scale horizontally to meet customer data load. We designed inference ingestion as a lightweight service that makes use of Kafka to ingest both streaming data and batch data.

From here, it naturally made sense to create an additional service that was responsible for retrieving inferences and calculating metrics on these inferences. It was important to consider performance here, and we knew that our choice of data store was crucial to our ability to serve up insights quickly. With this in mind, we were able to leverage the expertise of our data engineers to choose the right data store for our inferences and write the microservice that was responsible for retrieving inferences and calculating metrics.

While our data engineers worked to build these microservices, the remaining backend engineers began tackling API design and the remaining product work. After designing our APIs and getting through the implementation of the user, authentication, metadata, alerts, and tenant management, it became evident that the alerting part of our platform was a good candidate for its own microservice. Below you’ll find a simplified diagram of the architecture we arrived at.

simplified current Arthur architecture

A good excuse to rethink some other things

Knowing that we were going to have to invest a lot of time into our new architecture, we also saw this as an opportunity to reflect on other decisions — particularly our backend language. Our monolith was written in Python, as it is the language of data science and machine learning, and it’s one of the most beloved languages among software developers. For these reasons, Arthur will always have some parts of our product written in Python, like our SDK. Internally, we still love Python, but we had some performance and scaling concerns with our Gunicorn + Flask app, and we felt a statically typed language better lent itself to our API-first approach. Being a cutting-edge startup, we decided to use Java! ……kidding, of course. While I’m a Java fan, it’s widely known that many developers, especially those who are inclined to work at startups, dislike the language. Java did check a couple of our boxes in that it’s statically typed and fast. Knowing that we wanted these things but also wanted to use a language that developers are excited about, we decided to use Go, as it is incredibly fast, well liked within the developer community, and goroutines are baller.

The beginning of the journey was pretty arduous. Transitioning from a monolith to microservices is hard enough, but we also had to get used to using Go. Many of us came from object-oriented backgrounds, so we felt a little lonely without classes and inheritance and often wondered why the heck there wasn’t a ternary operator. After some wrestling with pointers and error handling, we did start to see the light of day and noticed that we were cranking out new endpoints and features quickly, our tests were incredibly fast, and goroutines made concurrency a breeze. While we have embraced Go, we also recognize that the ability to choose the right language for the job is one of the reasons we chose microservices, so that may mean we adopt other languages in the future.

It wasn’t easy, but it was worth it.

Along this journey, we definitely experienced some of the pain points that come with adopting microservices. It was clear from the beginning that we’d be introducing additional complexity to our DevOps processes and we’d need to figure out how to create pipelines that would allow us to independently deploy each microservice. This required everyone to quickly get ramped up with Kubernetes, and within a couple of months everyone had experience updating Helm charts and deploying new services. Some of the other pain points we ran into were issues with observability, logging, and testing services independently of one another. We responded to these by improving our logging, both in the log content and the log accessibility via Kibana, and writing our external client interfaces in a way that allowed us to add mock clients that can be used when running and testing the services locally. We continue to encourage developers to speak up about pain points and areas for improvement.

While the big task of breaking up our monolith into microservices is behind us, we know that the process of constant improvement and iteration will continue. So far we’ve already seen how microservices allow us to scale both the platform and the team, by letting us scale services independently and making it easy to parallelize development work. We’re proud of the architecture investments we’ve made, and we’re confident that the platform will scale to meet Arthur’s growth. The things that we love about microservices are also the things we love about the engineering culture at Arthur — we’re able to move fast and iterate quickly.
