Two years into microservices

How we dealt with some of the downsides

When I started at my present company, two years ago, I was already very curious about microservices.

Having worked with Ruby (mostly Rails) in most of my previous jobs, including at least three large monolithic Rails apps, I was eager to use microservices to tackle some of the biggest problems I had encountered in them.
Problems like:
— The danger of breaking something when making small changes and not noticing it until it was too late;
— Avoiding updates or new features because of the complexity of the app;
— Having to scale the whole app when the bottlenecks were a small subset of the features;
— Having to use the same stack even for specific problems that would greatly benefit from a different one.

Although I had this urge to start with microservices, I didn’t find much documentation available, and most of what existed was purely theoretical, not backed by evidence from actual production usage.
This was also a startup: we were a team of four developers, and at the time none of us had any experience with microservices.
As with most startups, although our project vision was for a huge platform, we first needed to prove product-market fit.

Our decision was to go with the monolith rails app.

Along the way, as application usage started to grow, we began experimenting with microservices in small steps, whenever the features we were adding were easy to isolate, testing and trying to learn from the outcomes.

Two years on, we are now extracting the whole monolith into microservices.

Which problems did we encounter, and how did we tackle them?

DevOps overhead

  • As everyone says, the increase in DevOps-related tasks is very real: having to deal with a large number of instances is hard and time-consuming, even when most tasks are automated.
  • Our solution: Have a dedicated person or team just for this role. Once you reach dozens of microservices, we see no way around it.
    Automate as much as you can. We use Ansible for most of our tasks, from provisioning to backups to configuration management.

Inconsistencies between codebases

  • As we experimented with different frameworks for our microservices, different developers were often building with different frameworks at the same time. It became obvious that this inconsistency would be a pain to deal with in the future.
    Although we knew that one of the advantages of microservices is the possibility of using different stacks for specific problems, most of the time you don’t need to change the stack.
  • Our solution: We developed a lightweight framework (Grape-based) that is used as the base for all our microservices (exceptions allowed when justified by the problem at hand).
    This way, anyone familiar with the base knows how to navigate the code structure of any of our microservices.
    This base already comes with:
    - Request authentication middleware;
    - Cache layer middleware;
    - API documentation generator.
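
The request-authentication piece of such a base can be sketched in plain Ruby, Rack-style, with no Grape dependency. The header name and the token check below are hypothetical, chosen only to show the shape of a middleware that every service inherits:

```ruby
# Illustrative Rack-style middleware: rejects requests that lack a valid
# (hypothetical) "X-Auth-Token" header before they reach the app itself.
class RequestAuthentication
  def initialize(app, valid_tokens:)
    @app = app
    @valid_tokens = valid_tokens
  end

  def call(env)
    token = env["HTTP_X_AUTH_TOKEN"]
    if @valid_tokens.include?(token)
      @app.call(env)                       # authenticated: pass through
    else
      [401, { "Content-Type" => "text/plain" }, ["Unauthorized"]]
    end
  end
end

# The innermost "app" is just a callable returning a Rack status/headers/body triplet.
app = ->(_env) { [200, { "Content-Type" => "text/plain" }, ["OK"]] }
stack = RequestAuthentication.new(app, valid_tokens: ["secret-token"])

status, _headers, _body = stack.call("HTTP_X_AUTH_TOKEN" => "secret-token")
```

Because every service wraps its endpoints in the same stack, authentication behaves identically across the whole platform.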

Dispersion of concerns

  • When dealing with a platform comprised of dozens of microservices, it isn’t easy to know which one does what (although really good naming helps a lot).
    If our product team asks for a specific feature, it should be mostly clear where to add it and which resources are already provided by the existing microservices. This should not require any team member to examine the code of each microservice.
  • Our solution: Centralised documentation generated with little to no decoration effort in our microservices.
    We use Swagger as the base. On the premise that everything coming in and out of a microservice can only use one of two doors, a REST API or a message bus (RabbitMQ), we developed a way for each microservice to generate a JSON file. This file describes each of the endpoints and all the messages sent to and consumed from the message bus. These JSON files are then consumed by a centralised documentation microservice (also Swagger-based) that presents them in an HTML interface and can be used to query the endpoints and view their responses.
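
A minimal sketch of what such a per-service descriptor file might contain. The service name, endpoint list, and event names below are invented for illustration; the real files follow a Swagger-compatible schema:

```ruby
require "json"

# Hypothetical shape of a per-service descriptor: every REST endpoint
# and every message-bus interaction collected in one JSON document.
descriptor = {
  service: "orders",                       # hypothetical service name
  endpoints: [
    { method: "GET",  path: "/orders/:id", response: "Order" },
    { method: "POST", path: "/orders",     response: "Order" }
  ],
  messages: {
    published: ["order.created"],          # events pushed onto RabbitMQ
    consumed:  ["payment.confirmed"]       # events this service listens to
  }
}

# Each microservice writes a file like this; the central documentation
# service collects all of them and renders an HTML interface.
File.write("orders.descriptor.json", JSON.pretty_generate(descriptor))
```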

Rest API is not enough

  • Besides the regular synchronous communication, we sometimes had to guarantee message delivery between services, or to notify multiple services of the same change, and HTTP was not the best way to handle this.
  • Our solution: We went with a hybrid approach: we use HTTP requests for most of the communication, but when we need to guarantee delivery or want to push notifications into the ecosystem, we use a message bus implemented on top of RabbitMQ.
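
The fan-out behaviour the bus gives us, one event notifying several services, can be illustrated with a toy in-memory stand-in. The real system uses RabbitMQ; the event names and subscriber roles here are made up:

```ruby
# Toy in-memory stand-in for the message bus: publishers push named
# events, and every subscriber of that event name gets notified.
class MessageBus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event, &handler)
    @subscribers[event] << handler
  end

  def publish(event, payload)
    @subscribers[event].each { |handler| handler.call(payload) }
  end
end

bus = MessageBus.new
notified = []

# Two independent services react to the same change notification.
bus.subscribe("user.updated") { |data| notified << [:billing, data] }
bus.subscribe("user.updated") { |data| notified << [:emailing, data] }

bus.publish("user.updated", id: 42)
```

With a real broker the publisher does not need to know who consumes the event, so new services can subscribe without touching existing code.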

Single point of failure

  • At first, while we were testing, we used a hybrid approach. We had our user-facing apps, monolithic Rails apps, and these apps would interact with the microservices. Our frontend would always communicate with the backend, and the backend with the microservices.
    As we moved to having a lot more microservices, this became a problem.
    We had split out microservices as a way to scale better and be more reliable (single-component failures versus whole-app downtime), but if our main app collapsed, everything went down with it.
    This approach did have the benefit of a single point of entry into the ecosystem, which made it easier security-wise: we only needed to secure the entry point, since the rest of the system lived inside a private network.
  • Our solution: Make the frontend interact directly with the microservices. All our services are now first-class: the client-side app makes requests directly to the microservices (we now use Angular on the frontend). This way, if one of the microservices fails, the rest of the app keeps working as intended (not 100% true, because we still have some inter-service dependencies).
    We still have to serve the frontend from somewhere, and that is still a single point of failure, but if your frontend consists only of static cacheable assets, you can use a CDN to distribute it, eliminating this problem.
    Another benefit is the easier parallelisation of requests from the frontend.

Securing the communication between microservices

  • As mentioned in the previous section, promoting the microservices to first-class members of our system meant that they now had to be open to the exterior, receiving requests directly from the client side.
    This brought a security challenge: all the requests coming into our microservices needed to be validated.
  • Our solution: JSON Web Tokens (JWT), on top of always using HTTPS.
    With JWT we can transfer information, like the id of the user making the request, as a JSON object signed with a shared secret.
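
For illustration, here is a minimal HS256 JWT sketch built only on Ruby’s standard library; in production you would reach for a vetted JWT gem, and the secret below is a placeholder:

```ruby
require "json"
require "openssl"
require "base64"

# Placeholder shared secret; real deployments load this from secure config.
SECRET = "shared-secret"

def base64url(data)
  Base64.urlsafe_encode64(data).delete("=") # JWT uses unpadded base64url
end

# Build "header.payload.signature", signing with HMAC-SHA256.
def jwt_encode(payload, secret)
  header    = base64url(JSON.generate(alg: "HS256", typ: "JWT"))
  body      = base64url(JSON.generate(payload))
  signature = base64url(
    OpenSSL::HMAC.digest("SHA256", secret, "#{header}.#{body}")
  )
  "#{header}.#{body}.#{signature}"
end

# Recompute the signature and compare in constant time; returns the
# claims hash on success, nil on a bad signature.
def jwt_verify(token, secret)
  header, body, signature = token.split(".")
  expected = base64url(
    OpenSSL::HMAC.digest("SHA256", secret, "#{header}.#{body}")
  )
  return nil unless OpenSSL.secure_compare(signature, expected)
  JSON.parse(Base64.urlsafe_decode64(body))
end

token = jwt_encode({ user_id: 42 }, SECRET)
```

Any microservice holding the shared secret can verify the token locally, so no central session store is needed on each request.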

Communication overload

  • This kind of fractioned architecture really increases the number of requests made, which can translate into perceived slowness for the end user. This is mostly due to the latency of the requests between the services and the frontend.
  • Our solution: We approach this in two ways.
    First, and most obvious for us, we built a cache layer into each one of the microservices. This doesn’t reduce the number of requests, but in most cases it increases the speed at which each one is answered.
    Second, we use local storage on the client side for the information that is repeated across most of our application screens; this storage helps us reduce the number of requests by a significant amount.
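
The per-service cache layer boils down to a fetch-or-compute pattern with an expiry. A minimal sketch in plain Ruby (a real deployment would sit this in front of a proper cache store; the key and TTL below are invented):

```ruby
# Minimal time-to-live cache: repeated reads within the TTL skip the
# expensive computation (here simulated by the block passed to fetch).
class TtlCache
  def initialize(ttl_seconds)
    @ttl   = ttl_seconds
    @store = {}                            # key => [expires_at, value]
  end

  def fetch(key)
    expires_at, value = @store[key]
    return value if expires_at && Time.now < expires_at
    value = yield                          # cache miss: compute and store
    @store[key] = [Time.now + @ttl, value]
    value
  end
end

cache = TtlCache.new(60)                   # hypothetical 60-second TTL
calls = 0
2.times { cache.fetch("user:42") { calls += 1; { name: "Ada" } } }
```

The second `fetch` within the TTL returns the stored value without running the block, which is exactly the saving each microservice gets on repeated requests.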

Maximising library reutilisation

  • Another thing we found, after the number of microservices and the size of the development team grew (from 3 to 10 developers), was that different developers on different microservices were choosing different libraries (Ruby gems, in our case) for the same purpose.
    Having two libraries for the same purpose also means twice the probability of something breaking, and twice the knowledge the team needs to carry.
  • Our solution: We use our documentation module (referred to in a previous section), and every gem added to a project must be documented there. The documentation must say who added the gem, what its purpose is, and when it was added. This helps us search there before adding any new external gem.

Hopefully this information is helpful for someone going through the same problems.

Also, if you want to discuss any of these approaches in more detail, I’m available on Twitter.