🇵🇸 Cattles vs Kittens 🇵🇸

Published in

ServiceRocket Engineering

Feb 6, 2016

Getting to know Cattles and Kittens

What is a “Kitten”?

I like kittens. They’re cute and funny. Every cat is different; no two are the same. Each of these cute little kittens has its own name, gets stroked every day, and has its own special food and behaviours. Without special attention to their needs, they will die and we’ll cry over their death. If you have never come across these creatures, here are some cute kittens found on the internet (I like the Maine Coon breed the most):

What is a “Cattle”?

Cattle are domesticated animals raised to satisfy our human needs (unless you’re a vegan). They give us milk, which I can drink at night, along with other dairy products like cheese and yogurt. We raise them as livestock for meat. We even have names for the different cuts: T-bone, rump, ribeye, etc. So I like cattle too, but not with the same love I have for cats.

Usually, the way we deal with cattle is not the way we deal with kittens. We don’t give them names, although we tag them with some kind of serial number. Cattle don’t usually have special behaviours. They are not pets like kittens.

So what happens if you treat Kittens as Cattle?

Kitten herding is not the easiest job in the world!!!

Moving from Kitten-Computing to Cattle-Computing

Kitten-Computing

We have been doing a lot of Kitten-computing so far. We have different types of kittens in some of our product lines. The Learndot server is currently hosted on an enterprise-grade bare metal solution managed by a third-party vendor, who spins up virtual machines for our use cases. The fact that we have names for these servers is a testament to how precious the infrastructure is to us. For example, we have kaishin as our devops/build/repo VM, yardrat as our staging VM and vegeta as our production VM.

So, when a critical upgrade needs to be applied to the hardware (or even virtualized hardware), it is like giving a temporary anaesthetic to a cute little kitten for an operation. Kids will be unhappy if they cannot play with the kittens. It’s the same with us: if learndot.com is down for a couple of hours, people will be unhappy about it.

The fact that resources and infrastructure are so precious is not necessarily good for our customers. They need to be designed for failure. They need to be disposable as needed. That’s where we are going: “Cattle-computing”.

Cattle-Computing

It is true that over the years we have improved significantly and have been managing our infrastructure quite well. Instead of manually bashing the keyboard to fix stuff, we evolved to manage server state using tools like Puppet. We evolved further to isolate infrastructure from application using Docker containers. That increases our velocity and release frequency, and reduces downtime. It is a testament that we are willing to adapt to this rapidly changing landscape of technology madness. My point is, we need to move further forward.

We cannot afford to keep doing Kitten-computing. There is a limit to the number of kittens that we can handle. Beyond that, we’ll be hoarding those cute kittens. They will catch fifty diseases, stop being cute and die, and we’ll be sad about their death.

Having said that, we want to shift our mindset to managing a herd of cattle instead. We will manage lots of cattle for what they provide us. With a proper breeding strategy, they are disposable and highly available. If there’s a new breed of cattle on the market, we can use it straight away and mix it into the herd.

Cattle-computing is synonymous with cloud computing because cloud computing abstracts away concepts like servers. For most things, developers will not be thinking about servers; they will be thinking about services. Instead of thinking about servers, think about compute units (different vendors may have different abstractions and implementations of a compute unit). Instead of thinking about physical provisioning of networking and access control, think of the network model provided by the vendor. For example, AWS has extensive support for different networking needs, which can be configured from the UI console, through API calls or with scripted automation. There are other implementations too, for example flannel by CoreOS, or OpenStack. Recently (like a couple of years ago), AWS also introduced a new concept of compute unit. Instead of treating a server as the compute unit (where you still need a dedicated OS etc.), they abstract servers away even further with Lambda: functions executed remotely in the cloud. One of our engineering teams actually had a good play with Lambda a couple of weeks ago.
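To make "a function as the compute unit" a bit more concrete, here is a minimal sketch of what a Python handler for AWS Lambda can look like. The (event, context) signature is what Lambda's Python runtime calls; the function name, the "name" field in the event and the response shape are assumptions for illustration only, not something our team actually deployed.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda invokes; there is no server or OS for us to manage."""
    # "name" is a hypothetical field in the incoming event, used only for this sketch.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Notice there is nothing about hosts, ports or processes in there. The platform decides where and how many copies of the function run.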

My point here is: while traditional infrastructure is expensive and long-lived, newer infrastructure is designed to be cheap, disposable and short-lived, with a fail-first design.

Culture Shift

All of this matters a lot. From what I have heard, read and understood, it involves a culture shift for our organization as a whole. If you read the blog post on Update on Unified Engineering, shifting our mindset to Cattle-computing is very much needed, especially when we want to make our products and offerings more elastic to demand. It promotes a full-stack development culture and breaks down the knowledge silos in engineering.

Design for Failure

So as a developer, you should accept failure when delivering your solution and develop for it. If you really think about it, software applications running on some remote backend system are really fragile no matter how optimised your code is. There is always a limit to backend load. Ultimately, you have to accept that a compute unit may not be able to cope if there’s a sudden surge of demand.

While we can enforce constraints to limit demand on the application, that will not solve the problem. Kitten-computing typically means downtime for your service because you need to scale up your servers: put in more RAM, improve the disk IOPS and storage, or add more CPU cores.

Cattle-computing is a bit different. If a compute unit can’t handle the load, spin up more compute units. Ideally, this happens automatically by monitoring the load and elastically spinning up more resources. If a compute unit becomes unresponsive, for example under a DDoS attack, take it out of the load balancer so HTTP connections go to the healthy instances, then kill it.
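Here is a rough sketch of that "monitor, kill, spin up" loop in Python. It is a toy model, not any vendor's autoscaling API: the ComputeUnit class, the reconcile function and the target of 100 requests per second per unit are all made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ComputeUnit:
    name: str
    healthy: bool = True
    requests_per_second: int = 0  # current load on this unit


def desired_count(units, target_rps_per_unit=100, min_units=2, max_units=10):
    """How many healthy units we want so each handles about target_rps_per_unit."""
    total_rps = sum(u.requests_per_second for u in units if u.healthy)
    needed = math.ceil(total_rps / target_rps_per_unit)
    return max(min_units, min(max_units, needed))


def reconcile(units, **limits):
    """Kill unhealthy units, then spin up fresh ones until we reach the desired count.

    This sketch only scales out; scaling back in is left out for brevity.
    """
    survivors = [u for u in units if u.healthy]  # unhealthy units are simply dropped
    wanted = desired_count(units, **limits)
    next_id = len(units)
    while len(survivors) < wanted:
        survivors.append(ComputeUnit(name=f"unit-{next_id}"))
        next_id += 1
    return survivors


herd = [
    ComputeUnit("unit-0", requests_per_second=120),
    ComputeUnit("unit-1", requests_per_second=130),
    ComputeUnit("unit-2", healthy=False),  # e.g. unresponsive under attack
]
herd = reconcile(herd)
print([u.name for u in herd])
```

Nobody nurses unit-2 back to health here. It is dropped, and a fresh unit takes its place because the remaining load calls for three units.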

A similar design should be applied when writing code. If you can sacrifice some consistency in your code for a more “AP” kind of system (availability and partition tolerance, in CAP theorem terms), you should do it. For example, typical image processing to resize image attachments does not need to be consistent at all. In other words, when saving the content, your code should not have to process all the uploaded images and wait for them to finish. Just fire and forget. If you can’t show the resized image now, show it later or show the original. Be proactive here too: preprocess the images before the user asks for them. That way, you can deal with failures before users ever notice.
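A minimal sketch of the fire-and-forget idea, using Python's standard thread pool. Everything here is hypothetical: resize_image is a placeholder for real image work, the in-memory dictionary stands in for real storage, and the ".corrupt" rule exists only to simulate a failure.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
resized = {}  # original name -> resized name; stands in for real storage


def resize_image(name):
    """Placeholder for real resizing work; fails on purpose for '.corrupt' files."""
    if name.endswith(".corrupt"):
        raise ValueError(f"cannot resize {name}")
    resized[name] = f"thumb_{name}"


def save_content(body, image_names):
    """Save the content and queue the resizes without waiting for them."""
    for name in image_names:
        executor.submit(resize_image, name)  # fire and forget
    return {"body": body, "images": list(image_names)}


def image_to_show(name):
    """Show the resized image if it is ready, otherwise fall back to the original."""
    return resized.get(name, name)
```

Saving never blocks on image work, and a resize that fails (or has not finished yet) just means the reader sees the original image for now.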

Design for High Availability

Again, take the Cattle-Kitten analogy. Your pet kitten will always be with you, even if you move from Perth to Texas. But you do not have to carry a herd of cattle from your Perth farm to a different farm in Texas. You can sell them or slaughter them for profit. Cattle are disposable; that’s what makes them highly available.

More “traditional” applications tend to reside comfortably on particular infrastructure or servers. They are not designed to be moved easily. So what people tend to do is allocate a large amount of resources to the server and take really good care of it.

Going forward, we want to design for high availability. Treat a compute unit as disposable. Consider a microservice that handles billing. In a typical microservice setup, you will have at least 3 small instances as opposed to one bigger monolithic instance. If you are expecting more load, provision more instances. Scale out, don’t just scale up. If any instance goes down, kill it and route future connections to the other live instances via a load balancer. If you need to do a rolling update to a new version of the application, take one of the instances out of service and wait for its connections to drain. Then spin up a new instance with the new version of the application. Keep doing that for the rest of the instances. Yes, it does sound complicated, so we have to pick the right technology stack to help us refine this process. With a proper cloud computing platform, this process is typically abstracted away by the platform itself.
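The rolling update described above can be sketched in a few lines of Python. This is a simulation under assumptions, not a real platform API: the Instance class, the drain step and the version strings are invented for illustration, and a real load balancer would wait for connections to close rather than count them down.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    version: str
    in_rotation: bool = True   # whether the load balancer sends it traffic
    open_connections: int = 0


def drain(instance):
    """Take the instance out of rotation and wait for its connections to finish."""
    instance.in_rotation = False
    while instance.open_connections > 0:
        instance.open_connections -= 1  # stands in for a real connection closing


def rolling_update(instances, new_version):
    """Replace instances one at a time so the others keep serving traffic."""
    updated = list(instances)
    for index, old in enumerate(instances):
        if old.version == new_version:
            continue  # already up to date
        drain(old)
        # Spin up a replacement with the new version in the same slot.
        updated[index] = Instance(name=old.name, version=new_version)
    return updated


fleet = [Instance(f"billing-{i}", "v1", open_connections=3) for i in range(3)]
new_fleet = rolling_update(fleet, "v2")
print([(i.name, i.version) for i in new_fleet])
```

Only one instance is out of rotation at any moment, so with three instances the billing service keeps two thirds of its capacity throughout the update.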