Deploying TabPy in Enterprise: Scaling and Hardening in Kubernetes
There are so many tutorials out in the wild about how to take an application, containerize it, and run it securely in your enterprise’s on-prem or public cloud, but hey, this will be yet another one.
During my daytime work, we help our clients use the power of Python calculations in a data visualization tool called Tableau, which requires a small middleware that actually evaluates the Python code outside the standard platform. This middleware is called TabPy, and this story is about how to deploy it in Kubernetes with the minimum configuration for scaling and hardening.
To give you an overview, this is what you will learn:
- How to build a Dockerfile for a Python-based application using Alpine Linux
- Basic Docker hardening tips
- How to deploy containers in Kubernetes using Services and Deployments in AWS EKS (this should work in other cloud providers too)
- How to set up pod network security with Calico
- How to set up autoscaling with metrics-server
I have a few assumptions to start: you have access to some Kubernetes-based cloud service (Amazon EKS, Google GKE or an on-premise cluster). If this is not the case, I would suggest checking out AWS EKS’s getting started guide first. You will also need basic Unix scripting and Docker experience.
All source code from this post is on GitHub: starschema/k8s-tabpy
Great, let’s get started.
Building a Dockerfile for TabPy
First things first, we need a Dockerfile that runs our TabPy server with all the potential Python modules we might leverage from our application. These Python packages need some pre-initialization, as does the TabPy service itself.
The outline of the Dockerfile will look something like this:
FROM alpine:3.11
MAINTAINER Tamas Foldi <tfoldi@starschema.net>

COPY tabpy.conf requirements.txt ./

ENV PACKAGES="\
    < things we actually need> \
    "
ENV BUILD_PACKAGES="\
    < packages required only to build the python packages> \
    "

RUN apk add --no-cache $PACKAGES \
  && apk add --no-cache --virtual build-deps $BUILD_PACKAGES \
  && rm -rf /var/cache/apk/* \
  && adduser -h /tabpy -D -u 1000 tabpy \
  && pip3 install --upgrade pip \
  && pip3 install --no-cache-dir -r requirements.txt \
  && su tabpy -c "python3 -m textblob.download_corpora lite && python3 -m nltk.downloader vader_lexicon" \
  && su -c "tabpy --config ./tabpy.conf & (sleep 1 && tabpy-deploy-models) && killall tabpy" \
  && apk del build-deps

USER 1000:1000
EXPOSE 9004
CMD [ "tabpy", "--config=./tabpy.conf" ]
https://github.com/starschema/k8s-tabpy/blob/master/Dockerfile
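To build and smoke-test the image locally before pushing it anywhere, something like this should do (the image tag is just an example):

$ docker build -t tabpy:alpine .
$ docker run --rm -p 9004:9004 tabpy:alpine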
Let’s see what we do and why. We start from Alpine Linux for various reasons:
- Alpine uses musl, libressl and busybox as its base. This makes it lightweight and secure compared to mainstream distros like RedHat/CentOS/Ubuntu.
- The base Docker image’s compressed size is 2.6MB (!!!!), which still includes a full-featured package manager.
- All binaries are compiled as Position Independent Executables (PIE) with stack smashing protection. Again, more secure than others.
The sequence to build and configure TabPy is the following:
1. Install the packages that we want to add to the final image, like python3 or openblas (see the example lists right after these steps).
2. Install the build dependencies: packages we need only to build our Python source packages. The command apk add --no-cache --virtual build-deps $BUILD_PACKAGES creates a virtual package definition named build-deps, so that these packages can be uninstalled later in the Docker build.
3. Install the actual application and its dependencies using a requirements.txt file.
4. Create the user tabpy to run the service inside the container. Remember, never run container apps as root.
5. Initialize the packages. Execute everything that requires additional downloads (like the NLTK sentiment database), as the running container will have no internet access.
6. Uninstall all the build dependencies (our virtual package) we installed in step #2.
7. Define the UID we want to use to run the container.
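To make steps #1 and #2 concrete, here is a hypothetical pair of package lists for a typical scientific Python stack. These exact package names are my own illustration, not the repo’s actual lists; pick whatever your requirements.txt really needs:

# example only -- substitute your actual runtime and build packages
ENV PACKAGES="\
    python3 \
    py3-pip \
    openblas \
    libstdc++ \
    "
ENV BUILD_PACKAGES="\
    python3-dev \
    gcc \
    g++ \
    musl-dev \
    openblas-dev \
    make \
    "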
What is missing here is the SSL configuration and user authentication. SSL can be configured at the TabPy level along with basic authentication (a config sketch follows below); however, in our case we will configure SSL on the edge. In case you are required to encrypt all in-cluster communication, it is preferable to configure SSL both in the container and on the edge.
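For reference, TabPy-level SSL and basic authentication boil down to a handful of settings in tabpy.conf. The following is a minimal sketch; the file paths are my assumptions, so verify the parameter names against the TabPy documentation for your version:

[TabPy]
TABPY_PORT = 9004
# serve HTTPS instead of plain HTTP
TABPY_TRANSFER_PROTOCOL = https
TABPY_CERTIFICATE_FILE = /tabpy/certs/server.crt
TABPY_KEY_FILE = /tabpy/certs/server.key
# basic auth: password file managed with the tabpy-user utility
TABPY_PWD_FILE = /tabpy/auth/password-file.txt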
Now let’s take a closer look at security.
Hardening a container
In the previous step, we took a few measures to ensure a secure environment for our application:
- Use a secure, PIE-compiled OS (like Alpine)
- Avoid OpenSSL for security
- Create a system user to run the service inside the container
- Explicitly define the UID
- Prepare the container for network lockdown: download every runtime resource in advance
- Remove all build dependencies. No headers, compilers or development libs should be in the deployed container. Same goes for bash; we just don’t need a full-power shell in our container.
- Remove writable folders from the container where possible (TabPy needs one writable directory)
In addition to these steps, it is a best practice to set up SELinux domains for our application, similar to httpd_t. This could give additional security, enforcing rules like “no execution of child processes” or “cannot read files from specific folders” — even if the classic Unix permissions allow it. SELinux is out of the scope of this post, but I highly encourage you to use it.
Deploy containers to Kubernetes
In a minimal setup, we need a Service and a Deployment resource to start our pods (a pod is the combination of containers, storage, IP, etc.). The Service is responsible for making our application discoverable internally and externally, while the Deployment describes what containers we need and in what setup.
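The actual manifests live in the repo linked above; as a rough sketch, the Service boils down to something like the following (the label selector here is an assumption for illustration, the real file may differ):

apiVersion: v1
kind: Service
metadata:
  name: tabpy
  namespace: tabpy
spec:
  type: LoadBalancer      # expose the service through the cloud provider's load balancer
  selector:
    app: tabpy            # route traffic to pods carrying this label
  ports:
    - port: 9004          # TabPy's default port
      targetPort: 9004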
To add the Service, just type:
kubectl apply -f https://raw.githubusercontent.com/starschema/k8s-tabpy/master/tabpy-svc.yaml
For the Deployment:
kubectl apply -f https://raw.githubusercontent.com/starschema/k8s-tabpy/master/tabpy-pod.yaml
Make sure you have the following sections in your Deployment file:
spec:
  hostNetwork: false                     # pod-level field: deny accessing the host's network
  containers:
    - # ... name, image, ports, etc.
      securityContext:                   # container-level security context
        runAsUser: 1000                  # make sure we execute things as non-root
        allowPrivilegeEscalation: false  # do not allow suid
        privileged: false                # no privileged containers
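If you want to go one step further with the same idea, you can also drop all Linux capabilities; TabPy only listens on an unprivileged port, so it should not need any. This is an optional extension, not part of the repo’s manifest:

      securityContext:
        runAsUser: 1000
        allowPrivilegeEscalation: false
        privileged: false
        capabilities:
          drop: ["ALL"]                  # no Linux capabilities for the TabPy process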
Now we should see something like:
$ kubectl get pods,services -n tabpy
NAME READY STATUS RESTARTS AGE
pod/tabpy-deployment-58d6f864f9-45j2m 1/1 Running 0 14h
pod/tabpy-deployment-58d6f864f9-cplvd   1/1     Running   0          14h

NAME            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
service/tabpy   LoadBalancer   10.100.55.81   hostname      9004:30474/TCP   2d
Now we can test our service:
$ curl http://hostname:9004/info
{"description": "", "creation_time": "0", "state_path": "/tabpy", "server_version": "1.0.0", "name": "TabPy Server", "versions": {"v1": {"features": {}}}}
All looks good.
Network Security for Pods
In most cases, we do not need any outgoing network connections from our pods other than intra-namespace connections. In our specific case, we do not need any outgoing connections at all.
To deny network connections, we need the Calico network policy engine installed on our Kubernetes cluster. In case we don’t have it, we can simply install it with:
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/calico.yaml
Our NetworkPolicy is fairly easy; we simply deny all Egress connections in the namespace where our pods are deployed:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-egress
namespace: tabpy
spec:
podSelector:
matchLabels: {}
policyTypes:
- Egress
Now, there are no outgoing connections from pods — not even DNS requests:
$ kubectl exec -n tabpy -ti tabpy-deployment-58d6f864f9-45j2m sh
/ $ nc -v google.com 80
nc: bad address 'google.com'
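Our case needs no egress at all, but if your pods must talk to each other, a slightly relaxed policy can allow egress inside the namespace only. A sketch, not part of the repo:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-intra-namespace-egress
  namespace: tabpy
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector: {}   # an empty podSelector matches every pod in this namespace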
We are done with basic security and hardening, so we can proceed with scaling.
Set Up Autoscaling with metrics-server
What is autoscaling and why do we need it? The idea here is to provide automatic horizontal scaling based on resource consumption. If the average CPU consumption of our containers goes above 70%, we might need to start new containers to handle the load. If the load goes down, we should scale our services down.
In order to scale our services horizontally, we need to get CPU and memory information from our containers. While in the past heapster was enough, these days (Kubernetes >1.11) we need metrics-server to be installed. If you do not have metrics-server, the easiest way to get it is with curl and jq:
DOWNLOAD_URL=$(curl -Ls "https://api.github.com/repos/kubernetes-sigs/metrics-server/releases/latest" | jq -r .tarball_url)
DOWNLOAD_VERSION=$(grep -o '[^/v]*$' <<< $DOWNLOAD_URL)
curl -Ls $DOWNLOAD_URL -o metrics-server-$DOWNLOAD_VERSION.tar.gz
mkdir metrics-server-$DOWNLOAD_VERSION
tar -xzf metrics-server-$DOWNLOAD_VERSION.tar.gz --directory metrics-server-$DOWNLOAD_VERSION --strip-components 1
kubectl apply -f metrics-server-$DOWNLOAD_VERSION/deploy/1.8+/
If all looks good, we should see something like:
$ kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           9m1s
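To verify that metrics are actually flowing, kubectl top should return live numbers after a minute or two (your pod names and values will differ):

$ kubectl top nodes
$ kubectl top pods -n tabpy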
Now it’s time to define the horizontal scaling rules, e.g. if the CPU usage is more than 70%, then scale up to a maximum of ten containers. I kept the minimum at two to have some basic high availability during worker node crashes or rolling upgrades:
$ kubectl autoscale deployment tabpy-deployment --cpu-percent=70 --min=2 --max=10
horizontalpodautoscaler.autoscaling/tabpy-deployment autoscaled
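The imperative kubectl autoscale command is convenient, but if you keep your cluster state in version control, the equivalent declarative resource looks roughly like this (namespace assumed to be tabpy, matching the rest of this post):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: tabpy-deployment
  namespace: tabpy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tabpy-deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70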
To check the results:
$ kubectl describe horizontalpodautoscalers.autoscaling/tabpy-deployment
Name: tabpy-deployment
Namespace: tabpy-dev
Labels: <none>
Annotations: <none>
CreationTimestamp: Sun, 23 Feb 2020 13:10:11 -0500
Reference: Deployment/tabpy-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 1% (1m) / 70%
Min replicas: 2
Max replicas: 10
Deployment pods: 2 current / 2 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
This looks good: we see the current usage (1%) and the threshold (70%) to scale. If the load reaches 70%, the autoscaler will increase the number of pods to reduce the load. When the load decreases, the autoscaler reduces the number of pods back to the desired state.
Unknown CPU/No metrics known for pod
In case you see unknown CPU usage and your metrics-server emits a no metrics known for pod error, just make sure you have resources/requests defined in your Deployment definition. The autoscaler computes utilization as a percentage of these requests, so without them there is nothing to measure against:
resources:
requests:
memory: "64Mi"
cpu: "100m"
Now that your horizontal scaling rules are in place and security is at an acceptable level, it seems you are ready to invite your users.
Conclusion
Kubernetes can be easy or complex, depending on how deeply you use it. Deploying our TabPy service, however, was fairly easy. If you have any issues, just drop a message and I’ll try to sort it out.