Docker and Microservices — Why do they make us better computer science engineers? — Part 1

ANIRBAN ROY DAS
12 min read · Jul 15, 2016


It's 2016, and we cannot not say these two words: Docker, Docker, Docker! Microservices, Microservices, Microservices!

There, I said it. My dues for the year are finally, officially paid.

Now let's talk more about the buzz. Docker and Microservices have both suddenly taken the industry by storm, and every other conference or meetup has a number of talks sincerely dedicated to these two topics. Wonder why that is? Because they are really important; they are the next big thing. Sure, you can survive without them for however many years you want to survive, and still create history, a dent even. But having Docker and Microservices by your side is like having Optimus Prime alongside the Transformers and Iron Man alongside the rest of the Avengers: you become so powerful and strong that problems stay at bay.

However superhero-wannabe that sounds, it's true. But it only remains true until it doesn't. And that is what we will discuss in some detail here.

Docker and Microservices are great, but they also come with their own unique set of requirements: dedication, hard work and, sometimes, tearing-your-hair-out situations too. We will try to go in and out of both of them today, and try to see why they make us better computer science engineers.


Docker and Containers

Good news, fellows, especially the ops engineers: your sorry days are going to get better, if not the best. Docker tries to ensure that.

Before we start talking about Docker, let's talk about containers in general. Containers? Hmmmm… let's come back to them later. Before containers, let's discuss Virtual Machines (good old VMs). What do we know about them?

Well, Containers and VMs are similar in their goal: both isolate an application and its dependencies into a self-contained unit that can run anywhere.

VMs are a machine-level virtualization technology: a VM emulates a real machine, hence the name "virtual machine". VMs run on top of a physical machine using a "hypervisor", which in turn runs either on a host operating system or directly on "bare metal". A VM lets you run a full-fledged operating system in which you can run your applications, install software and do whatever you want, exactly as you do on your present OS. The hypervisor mediates the VMs' access to the host machine's (or the bare metal's) system resources like CPU, RAM, storage, etc.

You can run multiple VMs, each with a different OS and different applications, all on a single host. The resources of the host get distributed among the VMs, so if one VM is running a more resource-heavy application, you might allocate more resources to it than to the other VMs on the same host. The catch is that it's not a real machine; it's a virtual machine. The applications and processes running inside a VM feel they are running on an isolated system with full access to the CPU, the RAM and the storage, as if the VM were the only machine. The magic is that multiple such VMs can run on a single host, on top of a bare-metal or host-based hypervisor.

So now you have full control of your applications, isolated in different VMs, each acting as a standalone machine. Well, it sounds nice until it doesn't. The cost of creating, provisioning and configuring multiple VMs in any environment, be it development, staging, testing or production, is high and painful, and alarmingly slow too.

History of Containers


Here comes the concept of "Containers". There was a gradual development of technologies that later came to be called containers. It started with FreeBSD's jail functionality, where a system could be partitioned into smaller, controlled sub-systems called jails. Jails built on an older Unix facility called chroot, which gives a process its own isolated root directory: the process can access anything in and below that directory, and nothing outside it. Every jailed process had its own isolated root directory, but these chrooted jails were not very secure and didn't always remain isolated.
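
To make the idea concrete, here is a minimal sketch of what chroot does, assuming a Linux box with GNU coreutils; the /tmp/jail path is just an example, and you should try this only on a throwaway machine:

```
# Build a tiny root filesystem containing just a shell.
mkdir -p /tmp/jail/bin
cp /bin/bash /tmp/jail/bin/

# Copy in the shared libraries the shell needs (as listed by ldd).
ldd /bin/bash | grep -o '/[^ ]*' | xargs -I{} cp --parents {} /tmp/jail/

# Start a shell whose root directory is /tmp/jail; it cannot see
# or touch anything above that directory.
sudo chroot /tmp/jail /bin/bash
```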

Then came cgroups (control groups), which allow controlled, limited access to system resources for these isolated processes, addressing some of the security and resource-abuse concerns that existed earlier. Then came user namespaces in Linux, which give processes their own view of users and groups, so a process can, for example, be root inside its isolated world while mapping to an unprivileged user outside it.
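
As a rough illustration of these primitives, here is how you might poke at them directly; the paths follow the cgroups v1 layout, which varies across distributions (and changes again under cgroups v2):

```
# cgroups: create a group and cap its memory usage at 256 MB.
sudo mkdir /sys/fs/cgroup/memory/demo
echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs   # move this shell into the group

# namespaces: start a shell in its own PID and mount namespaces;
# --mount-proc remounts /proc so the shell sees itself as PID 1.
sudo unshare --pid --fork --mount-proc /bin/bash
```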

Then came LXC to the rescue, adding many of the required tools, libraries and security features to these earlier attempts at containers, and a good containerization system started to come into being.

Containers are also called operating-system-level virtualization, in contrast to VMs, which are machine-level virtualization. Operating-system-level virtualization is a server virtualization method in which the kernel of an operating system allows the existence of multiple isolated user-space instances, instead of just one.

Containers don't need to start a full-blown VM, which is costly: you have to boot the VM, install an operating system on it, configure it, and so on. With containerization you just create isolated process spaces that share the same kernel with the other containers, and creating, starting, stopping and running these containers is faster by many folds.

Still, LXC was complex to work with, and many things had to be done manually or via scripts. Then came Docker, which changed the entire containerization space. Similar techniques existed before, but Docker made working with containers damn easy: a few basic commands and you are creating thousands of containers the next moment. That early ease of use and adoption is the main reason Docker rose to the top.
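
For a taste of that ease, this is roughly all it takes once Docker is installed (the nginx image is just an example):

```
docker pull nginx                            # fetch an image from Docker Hub
docker run -d -p 8080:80 --name web nginx    # start an isolated container in seconds
docker ps                                    # list running containers
docker exec -it web bash                     # get a shell inside the container
docker stop web && docker rm web             # stop and remove it
```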

Why use Docker?


Docker encapsulates your application and its dependencies into a Docker image and gives you parity between your development, staging, testing and production environments. You build the image once and you can use it across all environments, on any machine, with close to zero changes.

Without Docker, you have to install your application's dependencies every time, on every machine where you want to run your application, and those machines may have different operating systems, different system architectures, etc. With Docker you care about exactly one dependency, Docker itself; everything else is already built into the image.
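
As a sketch, the image for a hypothetical Python web app could be described by a Dockerfile like this (app.py and requirements.txt stand in for your own code):

```
# Dockerfile — all dependencies get baked into the image at build time
FROM python:3.5
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt    # installed once, inside the image
COPY . .
EXPOSE 8000
CMD ["python", "app.py"]
```

Then docker build -t myapp:1.0 . builds the image once, and the very same image runs with docker run -p 8000:8000 myapp:1.0 on a developer laptop, a staging box or a production server.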

This is the main and topmost pitch of Docker: build, ship, test and deploy.

Microservices — it's time to say adios to our old friend, the Monolith


Let's discuss microservices a little now. But before that, let's do a recap, a memory brush-up: if microservices is a system design/architecture pattern, then what have we been up to all these years?

So, all these years, you and I have been doing the Monolith. The monolith is a design pattern/architecture style where our whole project or application lives in one large codebase: all the data access, business logic, client code, etc. in a single place. Monoliths often arise when we use an MVC framework that ties all of a project's models, views and controllers into a single codebase. And this is okay. We have been successfully making monoliths work for many years, and we have seen some giant companies reach their golden periods on a monolithic design.

If you are still building a monolithic web app in 2016, you need to fix it,
or you'll be fired soon.

But just because something is successful doesn't mean it's the best we can do. So great minds started talking about the problems monoliths brought, and slowly about a solution, some way to overcome the monolith's shortcomings, and then came the talk about microservices. It was like a virus (a good virus) that slowly and steadily spread through the industry; industry experts and awesome engineers accepted the virus and its agenda, the virus kept promoting itself via its hosts and kept multiplying, and here we are in 2016 with everybody talking about microservices. It's the buzzword of the town, the next swarm after the "SQL vs NoSQL" database debate. The industry welcomed NoSQL with wide-open arms, and the response to microservices has been much the same.

What were the problems of a monolith?

Whenever a small section of the codebase changed, the entire application needed to be repackaged and redeployed: a lot of overhead for minor changes. Moreover, if one small service inside the monolith started receiving more requests, the entire application had to be scaled horizontally or vertically, even if the other parts didn't need the extra resources, hence a waste of resources and money. And to do a small piece of work, engineers had to wander through the entire codebase searching for the files to edit and keeping everything in sync. Not developer friendly.

How do microservices solve the above problems? And do they create new problems of their own?


You are certainly getting good at asking questions. Microservices solve a lot of the problems monoliths have, but at the same time they bring their own set of problems. Microservices come with a lot of baggage: the shift has to be planned ahead of time, and if you are trying to restructure/re-architect an existing codebase into smaller microservices, the job is not easy. It requires hard work and a lot of discussion among team members and across teams to settle many factors, from service ownership and responsibility, to the granularity of each service, to the contracts between services. Moving to a microservices architecture takes a considerable amount of work and dedication.

Let's delve deeper into microservices, and maybe it will get clearer why microservices are a boon and also a headache.


Microservices are not some magical Lego blocks you get ready-made. The system has to be built from the ground up. Here are the things that constitute an entire microservices-based architecture.

  1. Deployment Pattern - How will you decompose your application into services? How many services will you run on each VM, and how will you run them? Do you want replicas of your services, as when you scale horizontally? These are questions you should ask yourself; things will get clearer eventually.
  2. Communication - How will the services communicate with each other? All this while, in a monolith, communication meant modules calling each other's functions, but now the services are isolated processes, sometimes living on completely separate physical hosts or VMs. The options include Remote Procedure Calls (RPC) using frameworks like Thrift or gRPC, and plain REST API calls over HTTP/HTTPS. You can also use a message bus with a publish/subscribe pattern to communicate between services; Kafka and RabbitMQ are great options for message passing via queues and brokers. (See the first sketch after this list.)
  3. API Gateway - How will clients talk to so many services? In fact, services themselves sometimes need to talk to multiple other services, which again talk to several others: a chain of requests. Clients of the entire system or application (not of the individual services) cannot know the addresses of all the services individually. It's better to have a single API gateway that is the one talking point for clients; the gateway inspects each request's API endpoint and proxies the request to the corresponding service. (See the nginx sketch after this list.)
  4. Service Discovery - Even if the API gateway is itself a single service and the only one exposed to clients, what about the services the gateway talks to? And what about the services that individual services talk to, which, as mentioned above, can form a chain of requests? A naive solution is to note down the addresses of all the services in a text file or a database and let each service query it, DNS-style, to retrieve the address of the service it seeks. The problem is scalability. Services can go from 5 to 5000, and services keep failing and restarting on different machines with different IP addresses, so the database has to be updated constantly, and it must itself be fault-tolerant and replicated, since it is the only way to find the other services. Hence a tailor-made service discovery system, along with load balancing of requests across a service's replicas, has to be built. Consul, etcd and ZooKeeper are good contenders for implementing a service discovery mechanism: they are distributed data stores with fault tolerance and replication built in. AWS's ELB also offers load balancing with service discovery built in. (See the Consul sketch after this list.)
  5. Service Registration - Service discovery helps you find services once they have been added to the discovery system. But what happens when new services come up, or when services die and are replaced by new ones? Service registration is the process by which services register themselves with the discovery system when they start and deregister themselves when they die or stop. This logic can be written into each service's own code, but then every service has to maintain its own registration code; not the best of choices. With third-party registration, a separate registrar listens for service start/stop/fail events and adds or removes those services from the discovery system. Most of the time, a service discovery system is expected to cover both discovery and registration. Registrator is a good piece of software that does this third-party service registration.
  6. Storage and Persistence - All services should be stateless; this is a base requirement of the twelve-factor app principles. But services often need to store data, session information or caches, and for that a database is needed. The question then arises: should you keep one database that all services share, or a separate database per service? The single-database solution couples the services together and defeats the point of decoupling responsibilities with microservices in the first place, so a separate database per service is the way to go. Sometimes a transaction in one database needs to trigger a transaction in another database owned by a different service; such triggers need to be managed carefully so you don't go down the rabbit hole of distributed transactions, a distributed-systems problem with not-so-pretty results. Try an event-driven architecture to handle these cases, and also look into CQRS.
  7. Monitoring - Monitoring, be it log monitoring across services or metrics/system/application monitoring, becomes very, very important with microservices, because monitoring is the only way to know whether the services are interacting properly and the only sane way to manage such a huge number of services. Prometheus, cAdvisor and Grafana are good tools for metrics monitoring; the ELK or EFK stack will help you with log monitoring.
  8. Testing Microservices - Not a piece of cake. Unit tests are easy, since each microservice is small, and they run fast. The problems come with integration testing and end-to-end testing. Several methodologies are described by Martin Fowler in this article: he talks about component testing, contract testing, and how they can build good testing confidence in a microservices world.
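
A few sketches to make the list above concrete. First, communication (point 2): in the REST style one service simply calls another over HTTP, and for pub/sub a service drops a message on the bus. The service names, addresses and the events exchange below are all hypothetical:

```
# The "orders" service asks the "users" service for a user's details.
curl http://users.internal:8000/users/42

# Fire-and-forget events go on a message bus instead; with RabbitMQ's
# management CLI that could look like:
rabbitmqadmin publish exchange=events routing_key=order.created \
  payload='{"order_id": 7, "user_id": 42}'
```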
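
For the API gateway (point 3), a plain nginx reverse proxy is one minimal way to get a single talking point for clients; the upstream addresses here are made up:

```
# nginx.conf — a hypothetical gateway routing two backend services
events {}
http {
  upstream users  { server 10.0.0.11:8000; }
  upstream orders { server 10.0.0.12:8000; }

  server {
    listen 80;   # clients only ever talk to this one address
    location /api/users/  { proxy_pass http://users/; }
    location /api/orders/ { proxy_pass http://orders/; }
  }
}
```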
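
And for service discovery and registration (points 4 and 5), this is roughly how a service instance registers itself with a local Consul agent and how any other service can then look it up; the names and addresses are, again, hypothetical:

```
# Register an "orders" instance, with a health check Consul will poll.
curl -X PUT http://localhost:8500/v1/agent/service/register -d '{
  "Name": "orders",
  "Address": "10.0.0.12",
  "Port": 8000,
  "Check": { "HTTP": "http://10.0.0.12:8000/health", "Interval": "10s" }
}'

# Look the service up over Consul's HTTP API...
curl http://localhost:8500/v1/catalog/service/orders

# ...or through Consul's built-in DNS interface (port 8600 by default).
dig @127.0.0.1 -p 8600 orders.service.consul SRV
```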

That's been long; let's take a break. We now have a fair idea of the what and why of Docker and Microservices. In the next part, we will talk about why both of them make us better computer science engineers, and what they bring to the table that the monolith doesn't.

Click here for part 2.
