I was a little surprised by the tone of this article and what seemed like its optimistic view of the state of the practice in sizing applications in containers. I think it’s easy to see that sizing an application for a container or VM is harder than sizing an application for a host, because of the additional layer of indirection that occurs. About half of my work experience is short-term consulting work on applications that don’t scale or don’t perform acceptably. Even when this work is for well-funded customers with high-end physical hardware, it’s challenging. The state of the practice in terms of defining resource requirements for applications is very crude. The state of the practice in virtualized environments is even more primitive. That doesn’t mean we shouldn’t talk about it, but we should be frank about where we are.
Talking about millicores makes it sound as if everything is much more scientific and well-understood than is the case. One challenge is that 3000 millicores isn’t always equal to 3000 millicores. I can think of two applications that I recently worked with that had resource requirements of approximately 3000 millicores — yet their CPU usage was very different.
The first was a C++ app that was coded to be NUMA-aware. It started three threads that were pinned to cores on CPU (socket) 0.
The second was a Java application that started over 800 threads spread across cores 0 through 15, including the JVM’s GC threads and the JVM’s compiler threads.
Application #1 would keep three cores close to 100% busy.
Application #2 would keep 16 cores about 20% busy.
Both applications need about 3000 millicores to perform well, but app 1 would only perform acceptably if the three threads ran on cores on socket 0, because the appropriate NIC was connected to the PCI bus attached to socket 0. App 2 performs best if the 3000 millicores are distributed across as many cores as possible, given that the threads take action when any of 500 TCP connections receives a message and the app (unfortunately) has a thread-per-connection model.
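To make the arithmetic concrete, here is a minimal sketch that adds up per-core utilization into millicores for the two applications above. The function name and the per-core figures are illustrative assumptions taken from the rough numbers in this comment (three cores near 100%, sixteen cores at about 20%), not measurements.

```python
def total_millicores(per_core_busy):
    """Sum per-core busy fractions (0.0 to 1.0) and express the total in millicores."""
    return round(sum(per_core_busy) * 1000)


# Hypothetical utilization profiles based on the rough figures above.
app1_cores = [1.0, 1.0, 1.0]   # three pinned threads, each core near 100% busy
app2_cores = [0.2] * 16        # sixteen cores, each about 20% busy

app1_mc = total_millicores(app1_cores)
app2_mc = total_millicores(app2_cores)

print(f"app 1: {app1_mc} millicores across {len(app1_cores)} cores")
print(f"app 2: {app2_mc} millicores across {len(app2_cores)} cores")
```

The two totals land in the same neighborhood, yet a single millicore figure says nothing about how many cores the work is spread over or which socket those cores sit on.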
The next complication is that of typical workloads. The notions of idle, normal and spike are a nice start, but they are only a start, and they are inputs. The real value proposition is responsiveness or latency: is an application meeting its nonfunctional requirements? I’ve worked with a shitload (technical term) of slow applications, and I’ve written a shitload of slow code too.
Virtualization has been good to me. I love virtualization in all its forms with a passion. The “aha” moment for me came while sitting in a nasty, politicized meeting with a bunch of tech managers in a dysfunctional company. I was presenting a deployment diagram illustrating how the UAT and QA environments for a new application X would work. And I suddenly saw that we hadn’t built a box to act as the QA2 host to run component X. But we had built QA1, and it was a Xen VM, so I texted my colleague Ryan, “Hey can you please clone QA1 for X and name it QA2 and tell me its IP address?”
Mr Nasty App Manager speaks, “Hey Peter, this diagram is all well and good, but haven’t you forgotten something?”
“What’s that Mike?”
“There’s no QA2 instance for component X?”
“Really? Oh my goodness! Oh my goodness! That was omitted from the picture.”
“I hope that you have enough hosts to build our QA environment.”
“Of course Mike, QA2 for component X has hostname xqa2. Its IP address is blah.blah.blah.mofo. You can log on now and check it out …”
I loved Xen, and virtualization, from that moment on.
But virtualization has also helped pay my mortgage. If you take a typical application and deploy it to a Xen, KVM or VMware environment, you will typically see that median latencies are similar to what they were on a physical host, but that the 99%, 99.9%, 99.99% and 99.999% latencies will be much, much larger on a VM than on a physical host. For those applications that are not latency sensitive, VMs and containers are awesome. For those applications where median or 99% latencies are < 100μs, when you deploy them to VMs you get appalling performance.
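For anyone who wants to compare a physical host against a VM or container on these terms, here is a minimal sketch of summarizing latency samples at the median and at the tail percentiles. It uses the nearest-rank percentile definition, which is an assumption on my part (other definitions interpolate), and the names `percentile` and `latency_summary` are hypothetical. The sample data is synthetic, purely to show the mechanics.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(samples_us):
    """Return the median and tail percentiles of latency samples given in microseconds."""
    return {p: percentile(samples_us, p) for p in (50, 99, 99.9, 99.99, 99.999)}


# Synthetic samples (1..1000 microseconds), not real measurements.
synthetic_us = list(range(1, 1001))
summary = latency_summary(synthetic_us)

for pct, value in summary.items():
    print(f"p{pct}: {value} us")
```

Running the same summary over samples captured on each platform, under the same load, puts the median and the tail side by side instead of hiding the tail behind an average. Resolving the 99.999% figure meaningfully needs on the order of hundreds of thousands of samples.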
It’s possible, even likely, that the performance characteristics of Docker are much, much better than VMware, Xen, KVM etc., but I’m skeptical that they are comparable to physical servers. Have you measured this and done comparisons as part of your capacity planning / sizing work?