Docker Hosting: 6 Things to Watch Out For
Unless you’ve been living under a rock for the last few years (if you have been… I am very sorry), you have likely heard about containerization and Docker. If you’re like a lot of the people who work on the guts of the internet, you’re probably evaluating your options for running Docker in production and figuring out how to deploy it for your business or project. Trying to keep up with the rapid pace of change can be difficult even for the most focused individual. I was in the same boat about a year ago, and ended up taking a whirlwind hands-on tour of the ecosystem as it was being born. What I saw left me somewhat scarred, a lot more knowledgeable, and acutely aware of the shortcomings of many of the tools that were becoming popular. By the time I had a decently functional hosting stack running, my team had developed many custom pieces of glue code to fit everything together, and I was so unhappy with the design and operation of the systems that I quit my job and co-founded ContainerShip.
In broad terms, without naming names, I’m going to point out some of the issues I encountered along the way that made me feel like I had to quit my job to make things better. Obviously I am biased: I think ContainerShip is amazing, and you should use it and save yourself a lot of time and pain. I won’t force you to try it, but if I could I would, because it rules. I will try not to let my bias drive this post too much, though!
My list of 6 things to look out for
Problem 1: A Million Moving (Missing) Parts
Micro-service and service-oriented architecture design patterns advocate building software out of many distinct, small, and loosely coupled pieces as opposed to a single monolith. When it comes to developing a large software project, it is easier for multiple teams to each own their own service and develop it in the language that makes the most sense for that service. This mindset has permeated the infrastructure world as well. In theory it makes sense: use the parts that work well for you and swap out the things that don’t with something better. Like I said, this works well in theory, but in practice it can lead you down a rabbit hole.
When your hosting platform is constructed from a ton of moving parts, each maintained by a different open source project or company, you have to write a lot of glue code to plug everything together. In doing so you end up creating a lot of parts of your own that you are responsible for maintaining, and when something breaks it can be unclear which component is at fault. You can’t be sure there will be someone to turn to for support when you need it most. On top of that, training team members on a slew of different components is time consuming and difficult, and as APIs change and breaking changes are released, it’s up to you alone to make sure it all continues working.
I went down this path initially and, with a team, built infrastructure powering hundreds of very busy services. It ended up being a huge pain in the ass to maintain.
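To make “glue code” concrete, here’s a rough sketch of the kind of thing you end up writing and then owning: a small watcher that follows Docker events on each host and syncs containers into a service registry. The Docker SDK calls are real; the register()/deregister() functions are hypothetical stand-ins for whatever discovery system (Consul, etcd, DNS, …) you happen to be gluing in.

```python
# A hypothetical sketch of glue code: watch the local Docker daemon for
# container starts and stops, and sync them into a service registry.
# The docker SDK calls are real; register()/deregister() are stand-ins
# for whatever discovery system you wire in.
import docker

client = docker.from_env()

def register(container):
    # Placeholder: real glue code would push the container's host port
    # mappings into your service registry here.
    print("register", container.name, container.attrs["NetworkSettings"]["Ports"])

def deregister(container_id):
    # Placeholder: remove the container from the registry.
    print("deregister", container_id)

# Follow the Docker event stream and keep the registry in sync.
for event in client.events(decode=True):
    if event.get("Type") != "container":
        continue
    if event.get("Action") == "start":
        register(client.containers.get(event["Actor"]["ID"]))
    elif event.get("Action") in ("die", "stop"):
        deregister(event["Actor"]["ID"])
```

Multiply that by log shipping, health checks, load balancer config, and deploys, and you start to see how much custom code you quietly sign up to maintain.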
Problem 2: No High Availability
There are a few open source projects gaining popularity right now that have completely ignored the need for high availability in their “master” servers, the machines that orchestrate the rest of the cluster. The scary thing is that these companies tout their products as the best way to run Docker in production. If the cluster management system is not HA, has no concept of leader election, and runs on a single server, I can’t imagine having the balls to call it production grade. Some people apparently have no problem with that, though. I wouldn’t want to be the one on PagerDuty when that server bites the dust. Whatever solution you choose, please ensure that it supports multiple masters for high availability and peace of mind. Please also make sure the system is able to decide which master should be leader via some election mechanism, or things have a pretty good chance of ending in tears.
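To make the election point concrete, here is a minimal sketch of one common way multiple masters can elect a leader, using Consul’s session and KV lock API (assuming a Consul agent on 127.0.0.1:8500; the lock key name is made up). It isn’t the only approach, and it isn’t any particular product’s implementation; the point is that every master runs the same loop, so any of them can take over when the current leader dies.

```python
# A minimal sketch of leader election between multiple masters using
# Consul's session/KV lock API. Assumes a local Consul agent; the lock
# key name is hypothetical.
import time
import requests

CONSUL = "http://127.0.0.1:8500"
LOCK_KEY = "service/my-scheduler/leader"  # hypothetical key

def create_session(ttl="15s"):
    # A session with a TTL: if the leader stops renewing it, the lock frees itself.
    resp = requests.put(f"{CONSUL}/v1/session/create",
                        json={"TTL": ttl, "Behavior": "release"})
    return resp.json()["ID"]

def try_acquire(session_id):
    # Consul returns true only for the single session that holds the lock.
    resp = requests.put(f"{CONSUL}/v1/kv/{LOCK_KEY}",
                        params={"acquire": session_id}, data=b"leader")
    return resp.json() is True

session = create_session()
while True:
    if try_acquire(session):
        print("this master is the leader: run the scheduling loop")
    else:
        print("standing by as a follower")
    requests.put(f"{CONSUL}/v1/session/renew/{session}")  # keep our session alive
    time.sleep(5)
```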
Problem 3: Not Open Source
I don’t know about you, but I have a hard time with the idea of putting the fate of my software deployments in the hands of a project that is not open source. If you have no idea what is going on under the hood, I find it rather difficult to trust. Take the Genius scandal, for example. Nobody thought to consider that some documentation could be fibbing about the underlying operation of the hosting platform they were running on. Meanwhile their end users were getting hosed with crappy response times. How many wacky things are going on in the hosted, non-open-source Docker management system you’re running on? That’s what I thought: nobody has any idea.
Problem 4: Unnecessary Network Requirements
Lately there is a trend of using an overlay network to give every container on a host system its own unique IP address. The main driver for this type of setup is ease of use, but it comes at the cost of latency and bandwidth limitations. Even some of the most high profile container orchestration systems are forcing this on users. Things have improved somewhat in newer implementations, but trading performance for ease of use still seems like a bad idea. The alternative to using an overlay network is dealing with “port map hell”, i.e. running all containers on random ports and figuring out how to get your traffic to the right place. The good news is that solving the port mapping problem is really not that difficult, and you don’t need to limit your performance. There is a reason that systems such as Mesos and ContainerShip don’t use an overlay network by default. You’re trying to improve performance, availability, and functionality by moving to a distributed architecture; don’t set yourself up for a bad time, and bad performance, later.
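For what it’s worth, the port-mapping side really is manageable. Here’s a minimal sketch using the Docker Python SDK: ask Docker to publish a container port on a random host port, read the mapping back, and hand it to whatever registers services with your load balancer or discovery layer. The image and port here are just examples.

```python
# A minimal sketch of living without an overlay network: publish the
# container's port on a random host port, then read the mapping back so
# it can be registered with a load balancer or service registry.
import docker

client = docker.from_env()

# ports={"80/tcp": None} asks Docker to pick an ephemeral host port.
container = client.containers.run("nginx:alpine", detach=True,
                                  ports={"80/tcp": None})
container.reload()  # refresh attrs so the assigned host port is visible

mapping = container.attrs["NetworkSettings"]["Ports"]["80/tcp"][0]
print("container port 80 is published on host port", mapping["HostPort"])
# Registering host_ip:HostPort with your discovery layer is the rest of the
# "port map hell" problem, and it doesn't cost you any network performance.
```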
Problem 5: Someone Else Hosts The Important Stuff
Everyone is trying to get in on the container action. Almost every one of the large cloud providers has released its own solution for running containers like Docker in production. AWS has EC2 Container Service, Google has Google Container Engine, Joyent has Triton, etc. Unfortunately (in my opinion), running your containerized workload on the product your hosting provider offers goes against one of the most exciting benefits of using containers in the first place: portability. This makes sense though; hosting providers do whatever they can to keep you on board and using their services instead of their competition’s. In the past you didn’t really have much of a choice without doing some serious wrestling with configuration management systems and multiple provider APIs. These days the tables have turned, and there are options that can keep you open and provider agnostic. I would strongly advise against using a system that doesn’t have flexibility and provider agnosticism as one of its main goals.
On the other end of the spectrum are the “Docker as a Service” providers that run all of the important systems for you on their own network and simply launch standalone servers in your hosting provider, each running an agent that connects back to their system for management. You need to consider what you are going to do if you no longer want to pay one of these providers, or if you begin to outgrow them. There is no way to start out free and open source with these providers, and there is no way to return to free and open source and stop paying them should you need to. You guessed it, though: ContainerShip lets you do just that. How? When you launch a cluster on our service, the brain of the system runs on your own servers, so you can disconnect the cluster from our Cloud service at any time and run on your own using the open source core system. Pretty sweet.
Problem 6: Forcing An Operating System On You
As someone who was responsible for security and PCI DSS compliance in a previous job, and the hundreds of monitoring and audit requirements that went along with it, I simply couldn’t use a micro Linux operating system: the IDS/logging/security software that you have to prove to auditors is functioning properly just doesn’t fit well in a world where Docker is the only way to install and run something on the host. Maybe that isn’t a problem for you, but I like to have a choice in the operating system I’m using under the hood. And why should you need to use a specific operating system just to use Docker? Or better yet, why should you need to use an init system to launch containers? That is the kind of tight coupling you want to avoid; it could bite you down the road. Being flexible and supporting many flavors of Linux is especially important for businesses that want to keep using the operating systems they have already invested time training on, or have other deployments using.
Conclusion
These are just my opinions, but they represent countless hours of research, development, and actually running Docker in production at scale. Keep these things in mind as you plot out your move to containers and distributed systems. Sometimes the hype can take you down a path that won’t work very well six months from now. This area of computing is changing rapidly, but these changes don’t erase years of best practices. Trust your gut: if something seems like it would be a bad idea outside of the container world, containers aren’t going to make it a good idea.