At Botmetric, we are on a mission to simplify cloud management operations for businesses across the world. As a born-in-the-cloud company, our development process has been agile and lean from day one, built around fast iterations and constant feedback from our customers. However, as we built our SaaS platform and delivered the initial versions of the product, we ran into many challenges in process, architecture, deployment, and reliability.
- Growing Complexity: As an evolving SaaS platform, we were rapidly adding support for new use cases, which increased the overall complexity of our core platform. The code base grew steadily over the last 24 months, and we needed new ways to contain that complexity by decoupling it. In a nutshell, the Botmetric SaaS Platform has three major applications, each with three to five major components.
- Sprint-based agile development with 15+ engineers: Our two-week sprint cycles were hampered by the tight integration required between different components and service layers. We wanted focused teams of 3 to 5 engineers, each owning one of our major components and fully decoupled from the complexity of the rest of the system.
- Automated Deployment Challenges: Even though Botmetric was always built as a decoupled system using a RESTful services approach, there was tight integration among the Play-based web layer, the Java API and services layer, Spring-based data management, the search store, task scheduling, and the worker system. As we moved to smaller groups of engineers focused on specific components or applications, each group needed to be able to deploy its own piece without worrying about anything else.
- Diverse technology stack and dependency hell: We use Java, Scala, Python, and Go to build the Botmetric SaaS Platform. Packaging our different components and deploying them on shared machines was at times very challenging, so we needed a better way to manage this.
- Component Rollbacks and Library Upgrades: We needed to upgrade underlying libraries and roll back individual components smoothly, when required, without breaking anything that already worked. This meant solving the dependency problem above as well as finding a truly seamless way to upgrade existing services across the platform.
At the start of 2016, we evaluated Docker and experimented with it for packaging and deploying Botmetric SaaS platform components. During that process, however, we realized that our core problem (apart from packaging dependencies) was the tight integration between components and applications, which had to be decoupled to reduce complexity.
How we moved to microservices, and why we love it
- Decoupled Botmetric into manageable microservices: We undertook an exercise to completely decouple the Botmetric system into small, manageable units such as Web, Search, Data Analyser, Data Extractor, Scheduler, Task Automator, Notification Engine, and Auth Engine. This let us dedicate focused engineers to the development, management, and evolution of each microservice.
- Service discovery with Consul: After evaluating multiple options, we zeroed in on Consul for discovering our API and backend services, along with their configuration and orchestration. Instead of tightly integrating the different public and private RESTful API services within Botmetric across components, we moved discovery and configuration entirely into Consul.
- Containerization with Docker: We wrote component-specific Dockerfiles to manage the dependencies of each component, so we can develop and deploy components independently instead of shipping one or two bloated release packages. The Docker images are hosted and managed in Amazon EC2 Container Registry (ECR) on AWS.
- Microservices cluster management with Nomad: We use Nomad to schedule and deploy all of our microservices on a common infrastructure fleet in AWS. Nomad simplifies replacing the underlying VMs and manages the workflow for deploying our microservices at scale without the headaches.
- Component release packaging with Jenkins: We modified our Jenkins-based CI/CD process to support microservice deployments by producing release packages that Nomad can consume. We currently run around 20 microservices on the Botmetric platform.
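To make the Consul step more concrete, here is a minimal Python sketch of the two HTTP calls involved, assuming a local Consul agent on its default address `http://127.0.0.1:8500`. A service instance registers itself through the agent's `/v1/agent/service/register` endpoint with an HTTP health check, and a caller looks up healthy instances through `/v1/health/service/<name>`. The service names, addresses, ports, and the `/health` path are hypothetical placeholders, not Botmetric's actual configuration, and the helpers only build the requests rather than sending them.

```python
import json
import urllib.request

# Default HTTP address of a local Consul agent (assumption: one agent per host).
CONSUL_ADDR = "http://127.0.0.1:8500"


def build_registration(name, service_id, address, port, health_path="/health"):
    """Payload for Consul's PUT /v1/agent/service/register endpoint."""
    return {
        "ID": service_id,  # unique per running instance
        "Name": name,      # logical service name that callers look up
        "Address": address,
        "Port": port,
        # Consul polls this URL; any non-2xx answer (or no answer)
        # means the instance is not reported as passing.
        "Check": {
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": "10s",
        },
    }


def registration_request(payload, consul_addr=CONSUL_ADDR):
    """Prepare (but do not send) the request that registers one instance."""
    return urllib.request.Request(
        f"{consul_addr}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def discovery_url(name, consul_addr=CONSUL_ADDR):
    """Health endpoint that lists only instances whose checks are passing."""
    return f"{consul_addr}/v1/health/service/{name}?passing=true"


def pick_endpoints(health_response):
    """Turn a decoded /v1/health/service response into (address, port) pairs."""
    endpoints = []
    for entry in health_response:
        service = entry["Service"]
        # An empty service Address means "use the node's address".
        address = service.get("Address") or entry["Node"]["Address"]
        endpoints.append((address, service["Port"]))
    return endpoints
```

A caller would open `discovery_url(...)` with `urllib.request.urlopen`, decode the JSON body, and pass it to `pick_endpoints` to choose an instance, so no component needs another component's address hard-coded.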
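Nomad accepts jobs as JSON through its HTTP API as well as in HCL. The Python sketch below builds a minimal service job that runs one component's Docker image from ECR and prepares the request to Nomad's `/v1/jobs` endpoint on its default port 4646. The AWS account ID, region, repository name, resource figures, and datacenter name are hypothetical, and a real job would also need registry credentials, networking, Consul service registration, and health checks.

```python
import json
import urllib.request


def ecr_image(account_id, region, repository, tag):
    """Image reference in ECR's <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag> form."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def service_job(name, image, count=2, cpu_mhz=500, memory_mb=256, datacenters=("dc1",)):
    """Minimal Nomad job, in JSON API form, running one Docker container per allocation."""
    return {
        "Job": {
            "ID": name,
            "Name": name,
            "Type": "service",  # long-running; Nomad restarts or reschedules failed tasks
            "Datacenters": list(datacenters),
            # Limit how many allocations Nomad replaces at once during an update.
            "Update": {"MaxParallel": 1},
            "TaskGroups": [
                {
                    "Name": name,
                    "Count": count,  # number of instances to keep running
                    "Tasks": [
                        {
                            "Name": name,
                            "Driver": "docker",
                            "Config": {"image": image},
                            "Resources": {"CPU": cpu_mhz, "MemoryMB": memory_mb},
                        }
                    ],
                }
            ],
        }
    }


def submit_request(job, nomad_addr="http://127.0.0.1:4646"):
    """Prepare (but do not send) the request that registers or updates the job."""
    return urllib.request.Request(
        f"{nomad_addr}/v1/jobs",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because each microservice gets its own job and its own image tag, resubmitting one job with a new tag redeploys only that component.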
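One way a CI pipeline can support independent releases and rollbacks is to keep a per-service history of published image tags. The Python class below is a hypothetical illustration of that idea, not Botmetric's actual Jenkins setup; the service names and tags in any usage are placeholders. Publishing or rolling back one service leaves every other service's current release untouched.

```python
from typing import Dict, List


class ReleaseHistory:
    """Hypothetical per-service log of published image tags, oldest first."""

    def __init__(self) -> None:
        self.releases: Dict[str, List[str]] = {}

    def publish(self, service: str, tag: str) -> str:
        """Record a new build of one service; other services are untouched."""
        self.releases.setdefault(service, []).append(tag)
        return tag

    def current(self, service: str) -> str:
        """Tag that should currently be deployed for `service`."""
        return self.releases[service][-1]

    def rollback(self, service: str) -> str:
        """Discard the latest tag of `service` and return the one to redeploy."""
        history = self.releases.get(service, [])
        if len(history) < 2:
            raise ValueError(f"no earlier release of {service!r} to roll back to")
        history.pop()
        return history[-1]
```

The tag returned by `rollback` would then be fed back into the deployment job for that single service, while the other microservices keep running their current releases.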
Along the way, we learned that adopting new services and architectural approaches brings its own set of challenges, including initial hiccups and bringing everyone on the team up to speed.
By adopting a microservices architecture together with containerization, we broke our complex system down into manageable components and sped up both development and deployment.
Our focused teams of 3 to 5 engineers are now each responsible for creating, managing, and evolving specific microservices without having to worry about the overall system complexity. This has simplified sprint planning and development of the Botmetric SaaS Platform, which offers Cost Management & Governance, Security and Compliance, and Ops and Automation applications to our end customers.