Docker, just another buzzword?

Jon Svendsen
The Tele2 Technology Blog
8 min read · Apr 4, 2019

by Jon Svendsen, Senior Sysadmin

A little over a year ago I was trusted with the task of looking into what’s up with all that Docker buzzing, and whether we should offer an on-premise ‘Container as a Service’ as a way to take control over the container spree in our server park. Like most people who start digging into this subject, I was just as enchanted by its magic as I was troubled by the fact that operations was not that big of a concern for anyone. This article is an attempt to summarize what I have learned so far…

The buzz

Docker allows you to create independent and isolated environments, also referred to as containers. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application. This is the new way everyone likes to publish their stuff. Everyone from software suppliers to in-house developers has embraced this ‘new’ way to package their thing for distribution. It has proven itself to be highly effective for anyone who wants to speed up the whole ‘time to market’ process. There are other big advantages as well, but in essence this is what the buzz is really about. From my perspective this also means that Operations loses control over things they traditionally maintain or are responsible for.

Operations needs to be on board to make a transition to container workloads a success story

How does one begin?

After some initial investigation into where to even begin, I decided to put in place a list of desired features, from an operations perspective, just to have some guidance through the never-ending maze of buzzwords. In this phase I also realized that there is no actual magic going on in the world of containers, and that my features should probably be requirements.

The list

  1. Role Based Access Control
  2. Network access management
  3. Security/Risk management
  4. Resource management
  5. Metrics
  6. Logs
  7. Support

Pretty basic stuff. The requirement Support implies that one is looking for a product, supplier, partner or vendor. And yes, that was exactly what we aimed for. When you plan to migrate critical business workloads onto a totally new infrastructure it feels pretty important to have some kind of support to back you up… At least until you know what you’re doing.

One platform to rule them all

To keep it short we ended up choosing Docker Enterprise. Who’s better than the ones creating all the hype?

Docker Enterprise orchestrates both Swarm and Kubernetes workloads. It covers all the basic requirements with an explicit focus on the fact that Operations needs to be on board to make a transition to container workloads a success story. Kubernetes by itself covered most of the ‘requirements’ in my list, but with DockerEE on top of that we got it all, and then some.

Some deal breakers:

  • Namespaces
    One really nifty thing with Kubernetes is the concept of Namespaces. Namespaces give you the power to create a virtual cluster within a cluster. That, in turn, is very convenient if you are going to share a cluster with other users.
  • Role Based Access Control
    DockerEE RBAC in combination with Kubernetes’ own RBAC model gives you all the power & flexibility you need to control user access to resources.
  • Network policies
    Kubernetes network policies control all network access. This is the firewall expressed in clear-text files, so all users have full insight into their restrictions. As cluster admin you have total control over all traffic going in and out of every namespace.
  • Resource quotas
    Set resource quotas on namespaces to declare the maximum resource utilization allowed in each namespace (a small sketch of these guardrails follows after this list).
  • Private registry
    A registry is an application that manages, stores and delivers container images. It is included in the product, hence it uses the same RBAC model as the rest of the cluster. It is also enhanced with the possibility to scan images and control which images are allowed to run in our cluster.
  • Support
    If ‘the sky suddenly falls down’. You also get access to the right competence when you require assistance setting up or configuring your cluster.
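
To make the namespace, quota and network policy concepts above a bit more concrete, here is a minimal sketch of what such guardrails can look like as Kubernetes manifests. This is purely illustrative; the namespace name and the numbers are made up, not taken from our actual setup.

    # Hypothetical namespace guardrails: a namespace, a resource quota
    # and a default-deny network policy (names and values are examples)
    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a
    ---
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-a-quota
      namespace: team-a
    spec:
      hard:
        requests.cpu: "4"
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi
    ---
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny
      namespace: team-a
    spec:
      podSelector: {}        # applies to all pods in the namespace
      policyTypes:
        - Ingress
        - Egress

With a default-deny policy like this in place, anything a team wants to reach, or be reached by, has to be opened up explicitly, which is exactly the kind of insight and control Operations is after.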

Virtual vs. Physical

This was a hot topic, with most people arguing that one must run containers on physical hosts. It seems to come down to squeezing out that extra bit of performance. Fair enough, I will not argue with that. But since we did not have any real knowledge about how to size resources or what kind of hardware we would need, it made sense to just spin up a bunch of virtual machines so that we could later make a decision based on real data & experience rather than guesses.

Unboxing DockerEE

Before you throw yourself all over the installation procedures it’s a good idea to educate yourself on how the cluster is going to be accessed and how it will access external resources. This is an important input variable for how you architect your cluster environment. We ended up installing two clusters, giving us a distinct separation of production and non-production workloads. One advantage of doing it this way is that we can upgrade non-production and observe stability before doing anything in production. Separation also made it much easier for us to automate everything from deployments to policies. To expose our services we decided to use an Ingress Controller, where service Ingresses are configured with name-based virtual hosting. If you then assign a wildcard DNS record to your load balancer, you remove the need for anyone to interact with DNS or cluster admins before consuming services.
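
As an illustration of the name-based virtual hosting idea, here is a sketch of what a service Ingress can look like. The hostname, service name and namespace are hypothetical, and the exact apiVersion depends on your Kubernetes version.

    # Hypothetical Ingress using name-based virtual hosting; a wildcard
    # DNS record (*.apps.example.com) pointing at the load balancer means
    # no per-service DNS work is needed
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-service
      namespace: team-a
    spec:
      rules:
        - host: my-service.apps.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: my-service
                    port:
                      number: 8080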

DockerEE Cluster

Installing DockerEE is a no-brainer, at least if you’re a sysadmin. In short it’s a bunch of servers & containers that need to be running and you’re good to go. The real problem is to understand how you and your users are going to manage & consume it. In fact, my conclusion is that it’s not so much about putting a new technical platform in place as it is about you and your users starting to collaborate and practise new ways of working. Our current strategy around this is to not over-engineer anything and to listen to feedback from users to guide us on what we need to do.

Automation & Everything As Code

Since developers already exercise the concepts of CI/CD, our idea was to use that same established and standardized workflow to initiate, update and change configuration in our clusters.

Get a namespace

User input:
— Name of the namespace
— Email of the one requesting the namespace
— Email for feedback on cluster events, such as ‘image scan reports’
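
Purely as an illustration, a request like this could be captured as a small file committed to Git, something like the sketch below. All field names and values are made up.

    # Hypothetical namespace request checked into a Git repository
    namespace: team-a
    requester: jane.doe@example.com
    notifications: team-a-alerts@example.com   # e.g. image scan reports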

Onboard workflow

  1. Validate that we have enough CPU & memory resources in the cluster.
  2. Create a new LDAP group representing that namespace and add the requester as a member of this group.
  3. Create the namespace, add RBAC rules and grant access to the group members.
  4. Add resource quotas, default network policies, etc. to the namespace.
  5. Notify the requester that it’s all done.
Onboard pipeline

So, we have a CI/CD pipeline that creates a namespace and grants user access to it, adds resource quotas to ensure the namespace doesn’t utilize more resources than allowed, adds network access restrictions (with exceptions for centralized services) and, when all is done, notifies the requester that it’s time to get down and boogie!
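
As one concrete example of the RBAC step, granting the newly created LDAP group rights inside its namespace can be done with a RoleBinding along these lines. The group and namespace names are hypothetical, and binding to the built-in edit ClusterRole is just one possible choice.

    # Hypothetical RoleBinding mapping the LDAP group to its namespace
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: team-a-edit
      namespace: team-a
    subjects:
      - kind: Group
        name: team-a                 # the LDAP group created during onboarding
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: edit                     # built-in role allowing read/write in the namespace
      apiGroup: rbac.authorization.k8s.io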

… and then?

Users just need to update their Git repository with a Dockerfile and Kubernetes deployment files, and finally initiate a new Jenkins pipeline that puts their services into the cluster. The only real challenge here, unless you’re a Kubernetes ninja, is to describe your service in those Kubernetes YAML files.
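
For anyone wondering what describing a service actually boils down to, a minimal deployment manifest can be as small as the sketch below. The image name, registry URL, port and resource numbers are placeholders, not taken from our environment.

    # Hypothetical minimal deployment for a service in its namespace
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service
      namespace: team-a
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          containers:
            - name: my-service
              image: registry.example.com/team-a/my-service:1.0.0   # image from the private registry
              ports:
                - containerPort: 8080
              resources:
                requests:
                  cpu: 100m
                  memory: 128Mi
                limits:
                  cpu: 500m
                  memory: 256Mi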

No limits

Inside your namespace there are no limitations on what you can do, except that users themselves don’t have permission to change resource quotas or update network policies. The focus is to enable developers to do whatever they need to do without constant interaction with Operations. Freedom, while also making sure that Operations still honors its mission.

Some important additional features

Add/change network policies

Users make a merge request in a cluster-dedicated Git repository. These requests are automatically validated against our network policies and, if approved, merged directly. If a request can’t be validated it is sent to cluster admins before any action is taken.
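
A typical request might be to open up traffic between two namespaces. As a hypothetical example, a policy allowing ingress to all pods in team-a from pods in team-b could look like this, assuming namespaces are labelled with their names at creation:

    # Hypothetical policy a user might propose via merge request
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-from-team-b
      namespace: team-a
    spec:
      podSelector: {}                  # all pods in team-a
      policyTypes:
        - Ingress
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  name: team-b         # assumes namespaces carry a name=<namespace> label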

Change resource quota

Users make a merge request in a cluster-dedicated Git repository. These requests get reviewed & approved or declined by cluster admins. This workflow is targeted for automation, but the method and input variables are still under investigation.

Logs

Anything that goes to standard out in a container gets collected and shipped to a centralized logging service.

Metrics

All users have access to view resource utilization for containers & namespaces.

Grafana Dashboard

apropos status

Our developers have just started their journey towards containerization and Operations is still learning how to manage and operate it all. There is more automation to put in place and many features and enhancements to investigate… but still… we are there, in production, and it’s just amazing!

shutdown -r now

Before I shut down the business I would like to point out that this was a group effort, a collaboration between silos that forced us to establish new ways of working. Without the awesome crew of Wen Zhou & Michael Stewart as the CI/CD pros and Håkan Björklund as the skilled network ninja by my side, this project would for sure have been a total failure.

NotImplementedError

Some of you are now probably more or less disappointed that this was not much of a technical story and are wondering where all the cool syntax examples are. You can literally find thousands of those all over the internet. The message in this article is that we all need to continuously change and adapt to keep up with the increasing pace of cool stuff thrown our way… or we just fail.

So… is Docker just another buzzword?

I have to say no, it’s so much more for both Dev and Ops, and it does for sure enable us all to start practicing a whole bunch of other buzzwords out there…

If anyone is interested in a more technical deep dive regarding anything above, please reach out and I will do my best to make that happen!

— j0nix
