How to use Glow in Kubernetes

Cedric Lamoriniere
Jan 6, 2016 · 8 min read


Introduction

I have been playing with golang for some time, and I was wondering whether there was any MapReduce framework available for that language.
That is how I found Glow!

According to the description available on the website:

“Glow is an easy-to-use distributed computation system written in Go, similar to Hadoop Map Reduce, Spark, Flink, Samza, etc.”

That sounds promising!

After further research, I found that several articles already mentioned this framework. One of them, available on the Gopher Academy blog, is particularly complete: “Glow: Map reduce for golang”.

That’s decided, I am going to use Glow.

Because Glow is meant to run in cluster mode, having an orchestrator would most probably be very valuable.
That’s why I decided to use my favourite orchestrator, Kubernetes.io, to deploy my Glow cluster. Kubernetes.io is an open-source container cluster manager from Google, and I highly encourage anyone to take a look at it.

I have already had the chance to play with Kubernetes, and I can say it’s a very powerful and versatile tool. You can easily test it thanks to Google Container Engine, which offers the possibility to start a Kubernetes cluster in one click.

So to sum up, I will try to explain in this article how to deploy a Glow cluster with:

  • one Glow master
  • several Glow agents, in order to dispatch the processing load across several machines. Each machine is represented by a Kubernetes node.

Glow architecture

When you want to run Glow in cluster mode, you need to start several processes:

- one Glow Master: responsible for receiving work and dispatching the load across the several Glow Agents.
- several Glow Agents: they will be in charge of processing full or partial requests.
- a Glow Job: what I call a “Glow job” is your process that implements a Glow flow. (In my example, I will use the dummy example: a word counter.)

[Figure: Glow architecture]

Kubernetes Elements

Now that we have seen the Glow architecture, let’s see how we can wrap its processes into Kubernetes elements.

First, let’s recap the principal K8s (Kubernetes) elements:

- Pod (doc): A pod corresponds to a colocated group of applications running with a shared context. In terms of Docker constructs, a pod consists of a colocated group of Docker containers with shared volumes.
- ReplicationController (doc): A replication controller ensures that a specified number of pod “replicas” are running at any one time. If there are too many, it will kill some. If there are too few, it will start more.
- Services (doc): A Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them.
- Job (doc): A job creates one or more pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the job tracks the successful completions.

[Figure: Glow in Kubernetes]

For more information on kubernetes, I encourage you to read the documentation and watch this video.

Wrapping Glow in Kubernetes elements

Dockerized glow binary

Glow is written in Go, and thanks to that it is easy to dockerize.
Why? Because Go allows you to compile a statically linked binary that you can directly embed into a Docker image.

The following image will be used to bring up glow nodes (master and agents).

Dockerfile of the clamoriniere/glow-node image:
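(The original embedded gist is not reproduced here; below is a minimal sketch consistent with the description that follows. The base image tag and the Glow repository URL are assumptions.)

    # Build on a golang base image (the exact tag is an assumption)
    FROM golang:1.5

    # Download and compile the glow repository; the binary lands in /go/bin
    RUN go get github.com/chrislusf/glow

    # 8930: Master port, 8931: Agent port
    EXPOSE 8930 8931

    # Entry point is the "glow" binary; extra parameters are given at run time
    ENTRYPOINT ["/go/bin/glow"]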

Starting from the golang base image provided by Google, the glow GitHub repository is downloaded, then compiled.

The container needs to expose two ports: 8930 (Master) and 8931 (Agent).

Finally, the entry point is the “glow” binary.
The extra execution parameters will be provided at container run time.

Now that we are done with our dockerized glow-node, let’s take care of the Glow flow job.

Dockerize your glow flow job

Again, we will use a very simple Dockerfile: it just needs to include the job binary.
As explained above, we will use a dummy example job binary (“word counter”) that I named “glowk8s”.

Dockerfile that generates the clamoriniere/glow-job image:
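(Again a sketch rather than the original gist; the base image and the binary path are assumptions.)

    # Any minimal base image works, since the Go job binary is statically linked
    FROM busybox

    # Embed the pre-compiled word-count job binary ("glowk8s")
    COPY glowk8s /glowk8s

    # Run the job; the glow driver flags are passed at run time
    ENTRYPOINT ["/glowk8s"]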

Let’s wrap

We start by wrapping the central glow element: the master.

Glow master ReplicationController
Constraint: There should be at least one glow master instance, up and running, at all times.

So what we need is a Pod managed by a ReplicationController.
This ReplicationController will manage the availability of the master (named “glow-master” in the following configuration file).
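(The original configuration gist is not reproduced here; the following sketch matches the explanations below. The resource limits are illustrative values, and the glow command-line flags follow this article's description, so check "glow master -h" for your version.)

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: glow-master
    spec:
      replicas: 1
      selector:
        glow-master: "true"
      template:
        metadata:
          labels:
            glow-master: "true"
        spec:
          containers:
          - name: glow-node
            image: clamoriniere/glow-node
            # run the glow-node as a master, listening on all pod IPs
            args: ["master", "--ip=0.0.0.0", "--port=8930"]
            ports:
            - containerPort: 8930
            resources:
              limits:
                cpu: 100m        # illustrative limits
                memory: 128Mi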

I think some explanations of this configuration file are in order :)

- First, the Pod definition. It is located in the “template” part. The pod is composed of only one container, named “glow-node”.
It uses the Docker image that we previously generated. We also need to set some arguments in order to declare that this glow-node will be a master.

We force the master to listen on every pod IP thanks to the glow option --ip="0.0.0.0".

The ports:containerPort setting is here to inform Kubernetes which port on the Docker image needs to be accessible; in our case, the glow master only needs to be accessible on one port: 8930.

The last element present in this pod configuration is the “resources” part. In order to schedule and assign the different pods to nodes properly, Kubernetes offers the possibility to define resource limits. If your container doesn’t respect those limits, it will be killed in order to protect the other pods’ resources.

- One other important part of the replication controller configuration is the label definitions.
Thanks to those labels, it will be possible to target the pods managed by this replication controller by linking them to a Kubernetes service.

Glow Master Service

The aim of a Kubernetes service is to provide an easy way to discover and access a service provided by a Pod. In our case, we will expose a Glow master service accessible on port 8930.

Both the Glow job and the Glow agents connect to the master on that same port.
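(A sketch of such a service definition, matching the explanations below:)

    apiVersion: v1
    kind: Service
    metadata:
      name: glow-master
    spec:
      # headless service: DNS returns the pod IPs directly, no proxy in between
      clusterIP: None
      # target the master pod(s) through their label
      selector:
        glow-master: "true"
      ports:
      - port: 8930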

As explained previously, the link between a Pod and a Service is made thanks to labels. A service can be associated with one or several pod(s) thanks to the published labels. In our case, the label used is: glow-master: “true”.

Also, several types of services are available; here we use a clusterIP: “None” service for a practical reason that I will explain later. One advantage of this type of service is the possibility to connect directly to a pod without going through a service proxy.

Thanks to this “glow-master” service, it is now possible to obtain the list of pod IPs that implement this service. The Glow agents and the Glow job will use a DNS resolution query on this service to reach a glow-master.

Glow Agent ReplicationController

The penultimate element: the glow-agent replication controller. It is very similar to the glow-master one: it uses the same Docker image.

Only a few changes are needed to configure it as an agent; let’s see them (a sketch follows the list):

  1. Container execution parameters:
    - the first argument has to be changed to “agent”.
    - then we need to specify the “data” directory. For that, we use a Kubernetes empty directory as a volume.
    - the last parameter is the IP:PORT pair where the glow-master can be reached. For this, we use an environment variable in order to compose it from the service DNS name and the service master port.
  2. Labels: the labels also need to differ from the master’s. We do not want to reach a glow-agent when using the “glow-master” service.
  3. Port: the port is also different; the agent listens on a different port than the master.
  4. Replicas: one aim of a Glow cluster is to dispatch the work load across several instances of agents; that is why we configure “replicas: 3”. It means that the replication controller will always ensure that 3 instances of glow-agent (what I call glow-node in my configuration) are running. Later, we can increase or decrease this number of replicas thanks to the scaling command that Kubernetes offers.
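(Again a sketch rather than the original file; the /data mount path is an assumption, the agent flags follow this article's description, and the master address is written with the service DNS name directly rather than composed from environment variables.)

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: glow-agent
    spec:
      replicas: 3
      selector:
        glow-agent: "true"
      template:
        metadata:
          labels:
            glow-agent: "true"
        spec:
          containers:
          - name: glow-node
            image: clamoriniere/glow-node
            # run the glow-node as an agent, pointing it at the master service
            args: ["agent", "--dir=/data", "--port=8931", "--master=glow-master:8930"]
            ports:
            - containerPort: 8931
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            emptyDir: {}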

Before describing the last element, I want to come back to the choice I made about the service type. I made this choice because of the way the communication between the master and the agents is implemented.

When an agent initiates the communication with its master, it informs the master on which IP:PORT the agent service is reachable. In fact, the agent only announces the port it can be reached on, and the master considers the source IP used during this initial communication to be the correct agent IP. This works fine for point-to-point communication, but if there is a proxy in the middle, as is the case with the Kubernetes service proxy, the IP registered as the agent IP is in fact the proxy IP…

With a clusterIP: “None” service, the proxy is not used: the link between the master and the agents is direct.

In order to support both Kubernetes service types, a small modification to the Glow implementation would be needed: the agent IP would also have to be provided in the communication protocol used between the agent and the master. Maybe it will be the first PR that I submit to the glow project :).

Glow Word counter Job

OK, now we have the Glow cluster configuration: we are able to instantiate a glow-master and several glow-agents, and the communication between them is in place.

The only missing piece is the element that will use this cluster: the Job.

Contrary to a replication controller, the aim of a Job is to run pods until a given number of them terminate successfully. This is perfect for what we want to achieve: count the number of words, then stop the process.

A Kubernetes Job is also based on a Docker image (in fact, everything in Kubernetes runs from a Docker image). This Job uses the Docker image that we previously defined with the “word count” example.

The arguments are limited:

- -glow: to start in driver mode
- -glow-leader=glow-master:8930: to provide the master access information.
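(A sketch of the Job definition; the API group depends on your Kubernetes release, and the flag spelling follows this article.)

    apiVersion: batch/v1   # older clusters used extensions/v1beta1
    kind: Job
    metadata:
      name: counter
    spec:
      template:
        metadata:
          name: counter
        spec:
          containers:
          - name: counter
            image: clamoriniere/glow-job
            # run the word-count binary in driver mode, targeting the master service
            args: ["-glow", "-glow-leader=glow-master:8930"]
          restartPolicy: Never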

Now we have all we need to run it and test it :).

Experimentation

We need a Kubernetes cluster up and running. In my case, I have one running locally on my computer thanks to the Vagrant configuration proposed in the official Kubernetes repository, but you can also use the Google Container Engine available on GCP.

Once you have your Kubernetes cluster, we can start:

1. First, we need to start the glow-master replication controller:
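(Command sketches; the manifest file names used here are assumptions.)

    kubectl create -f glow-master-rc.yaml
    kubectl get pods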

Checking the pods instantiated in the kube cluster, we can see that the replication controller has started one pod, whose ID is “glow-master-fqm9u”.

If we check the container logs present in this pod, we can also see that everything looks fine:
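    kubectl logs glow-master-fqm9u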

2. Start the glow-master service:
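(Still assuming a file name:)

    kubectl create -f glow-master-service.yaml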

3. Now start the agents (glow-node):
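    kubectl create -f glow-agent-rc.yaml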

Let’s check the logs of one agent to see if everything is fine:
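(Using one of the agent pod names listed by kubectl get pods; the placeholder below is not a real pod name.)

    kubectl logs <glow-agent-pod-name>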

OK, so the communication between the agent and the master was properly established (no warnings or errors in the agent log).

4. Now, we can run our Job:
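    kubectl create -f glow-job.yaml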

The job was executed successfully, and it ran in the pod named counter-0c0ij. Let’s see the result of the job:
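    kubectl logs counter-0c0ij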

The logs speak for themselves :) the process counted 336 words.

Conclusion

I hope my explanation was clear enough. What I wanted to show is how easy it is to put a Glow cluster in place with Kubernetes.

I think it was easy for several reasons:

  • Glow’s architecture: very simple but powerful!
  • Go greatly simplifies the containerization of a binary, with no dependency management needed.
  • Kubernetes: thanks to its elements (Pod, Service, ReplicationController…) with their clear roles and purposes, it simplifies the orchestration of your application.

Thanks to Kubernetes, it will be easier to manage your cluster:

  • No need to manually configure your agents to target the right master IP:PORT. Kubernetes services will handle that for you.
  • No need to monitor your different processes either: the replication controller ensures that the right number of agents are up and running.
  • You will be able to easily scale your cluster in case the number of jobs or their complexity increases.

You can find all the material that I used to test in this github repository: github.com/cedriclam/glow-k8s.
