Starschema Blog

Deploying TabPy in Enterprise: Scaling and Hardening in Kubernetes

There are plenty of tutorials out in the wild about how to take an application, containerize it, and run it securely in your enterprise's on-prem or public cloud, but hey, this will be yet another one.

Image from Pixabay

In my day job, we help our clients use the power of Python calculations in a data visualization tool called Tableau, which requires a small middleware that actually evaluates the Python code outside the standard platform. This middleware is called TabPy, and this story is about how to deploy it in Kubernetes with the minimum configuration for scaling and hardening.

To give you an overview, this is what you will learn:

  • How to build a Dockerfile for a Python-based application using Alpine Linux
  • Basic Docker hardening tips
  • How to deploy containers in Kubernetes using Services and Deployments in AWS EKS (this should work with other cloud providers too)
  • How to set up pod network security with Calico
  • How to set up autoscaling with metrics-server

I have a few assumptions to start with, like that you have access to some Kubernetes-based cloud service (Amazon EKS, GKE, or an on-premise cluster). If this is not the case, I would suggest checking out AWS EKS's getting started guide first. You will also need basic Unix scripting and Docker experience.

All the source code from this post is on GitHub: starschema/k8s-tabpy

Great, let’s get started.

Building Dockerfile for TabPy

First things first, we need a Dockerfile that runs our TabPy server with all the Python modules we might want to leverage from our application. These Python packages need some pre-initialization, as does the TabPy service itself.

The outline of the Dockerfile will look something like this:

FROM alpine:3.11
MAINTAINER Tamas Foldi <>
COPY tabpy.conf requirements.txt ./
ENV PACKAGES="\
  < things we actually need>"
ENV BUILD_PACKAGES="\
  < packages required only to build the python packages>"
RUN apk add --no-cache $PACKAGES \
&& apk add --no-cache --virtual build-deps $BUILD_PACKAGES \
&& rm -rf /var/cache/apk/* \
&& adduser -h /tabpy -D -u 1000 tabpy \
&& pip3 install --upgrade pip \
&& pip3 install --no-cache-dir -r requirements.txt \
&& su tabpy -c "python3 -m textblob.download_corpora lite && python3 -m nltk.downloader vader_lexicon" \
&& su tabpy -c "tabpy --config ./tabpy.conf & (sleep 1 && tabpy-deploy-models) && killall tabpy" \
&& apk del build-deps
USER 1000:1000
CMD [ "tabpy", "--config=./tabpy.conf" ]

Let's see what we do and why. We start from Alpine Linux for several reasons:

  • Alpine uses musl, libressl and busybox as its base. This makes it lightweight and secure compared to mainstream distros like Red Hat/CentOS/Ubuntu.
  • The compressed base Docker image is 2.6MB (!), and it still includes a full-featured package manager
  • All binaries are compiled as Position Independent Executables (PIE) with stack smashing protection. Again, more secure than others.

The sequence to build and configure TabPy is the following:

  1. Install packages that we want to add to the final image like python3 or openblas
  2. Install build dependencies, those packages we need only to build our python source packages. The command apk add --no-cache --virtual build-deps $BUILD_PACKAGES creates a virtual package definition for build-deps to uninstall these packages later in the docker build.
  3. Install the actual application and its dependencies using a requirements.txt file
  4. Create user tabpy to run the service inside the container. Remember, never run container apps as root.
  5. Initialize the packages. Execute everything that requires additional downloads (like NLTK sentiment database) as the running container will have no internet access.
  6. Uninstall all build dependencies (our virtual package) we installed in step #2
  7. Define the UID we want to use to run the container
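
Step 3 above relies on a requirements.txt. As a sketch, based only on what the Dockerfile initializes (TabPy itself, plus textblob and nltk, whose corpora we download at build time), it might contain something like this; the exact list and any version pins are assumptions:

```text
tabpy
textblob
nltk
```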

What is missing here is the SSL configuration and user authentication. SSL can be configured at the TabPy level along with basic authentication. However, in our case, we will terminate SSL at the edge. If you are required to encrypt all in-cluster communication, it is preferable to configure SSL both in the container and at the edge.

Now let's take a closer look at security.

Hardening a container

In the previous step, we took several measures to ensure a secure environment for our application:

  • Use a secure, PIE-compiled OS (like Alpine)
  • Avoid OpenSSL (Alpine ships LibreSSL instead)
  • Create a system user to run the service inside the container
  • Explicitly define the UID
  • Prepare the container for network lockdown: download every runtime resource in advance
  • Remove all build dependencies. No headers, compilers or development libs should be in the deployed container. The same goes for bash; we just don't need a full-powered shell in our container.
  • Remove writable folders from the container where possible (TabPy needs a writable directory)

In addition to these steps, it is a best practice to set up SELinux domains for our application, similar to httpd_t. This could give additional security, enforcing rules like “no execution of child processes” or “cannot read files from specific folders” — even if the classic Unix permissions allow it. SELinux is out of the scope of this post, but I highly encourage you to use it.

Deploy containers to Kubernetes

In a minimal setup, we need a Service and a Deployment resource to start our pods (a pod is the combination of containers, storage, IP, etc.). The Service is responsible for making our application discoverable internally and externally, while the Deployment describes what containers we need and in what setup.
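
As a sketch, a Service manifest for TabPy might look like this; the app: tabpy selector label is an assumption, while the LoadBalancer type and port 9004 match what we use in this post:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: tabpy
  namespace: tabpy
spec:
  type: LoadBalancer
  selector:
    app: tabpy        # assumed pod label; must match the Deployment's template labels
  ports:
  - port: 9004        # TabPy's default port
    targetPort: 9004
```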

To add the Service, just type:

kubectl apply -f

For Deployment:

kubectl apply -f

Make sure you have the following sections in your Deployment file:

runAsUser: 1000 # make sure we execute things as non-root
allowPrivilegeEscalation: false # do not allow suid
privileged: false # no privileged containers
hostNetwork: false # deny accessing the host's network
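
Putting these settings in context, a minimal Deployment sketch could look like the following; the image reference and the app: tabpy label are assumptions, everything else follows the settings above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tabpy-deployment
  namespace: tabpy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tabpy
  template:
    metadata:
      labels:
        app: tabpy
    spec:
      hostNetwork: false                   # deny accessing the host's network
      containers:
      - name: tabpy
        image: your-registry/tabpy:latest  # assumed image reference
        ports:
        - containerPort: 9004
        securityContext:
          runAsUser: 1000                  # make sure we execute things as non-root
          allowPrivilegeEscalation: false  # do not allow suid
          privileged: false                # no privileged containers
```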

Now we should see something like:

$ kubectl get pods,services -n tabpy
pod/tabpy-deployment-58d6f864f9-45j2m 1/1 Running 0 14h
pod/tabpy-deployment-58d6f864f9-cplvd 1/1 Running 0 14h
service/tabpy LoadBalancer hostname 9004:30474/TCP 2d

Now we can test our service:

$ curl http://hostname:9004/info
{"description": "", "creation_time": "0", "state_path": "/tabpy", "server_version": "1.0.0", "name": "TabPy Server", "versions": {"v1": {"features": {}}}}

All looks good.

Network Security for Pods

In most cases, we do not need any outgoing network connections from our pods other than intra-namespace connections. In our specific case, we do not need any outgoing connections at all.

To deny network connections, we need the Calico network policy engine installed on our Kubernetes cluster. If we don't have it, we can simply install it with:

kubectl apply -f

Our NetworkPolicy is fairly simple: we disable all egress connections in the namespace where our pods are deployed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: tabpy
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Egress

Now, there are no outgoing connections from pods — not even DNS requests:

$ kubectl exec -n tabpy -ti  tabpy-deployment-58d6f864f9-45j2m sh 
/ $ nc -v 80
nc: bad address ''

We are done with basic security and hardening, so we can proceed with scaling.

Setup Autoscaling with metrics-server

What is autoscaling and why do we need it? The idea is to provide automatic horizontal scaling based on resource consumption. If the average CPU consumption of our containers goes above 70%, we might need to start new containers to handle the load. If the load goes down, we should scale our services back down.
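
Conceptually, the horizontal pod autoscaler's calculation can be sketched like this (a simplification with illustrative values; the real controller also applies a tolerance and stabilization windows that we ignore here):

```shell
# desiredReplicas = ceil(currentReplicas * currentUsage / targetUsage),
# clamped to the [min, max] replica range
desired_replicas() {
  local current=$1 usage=$2 target=$3 min=$4 max=$5
  # integer ceiling division
  local desired=$(( (current * usage + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

# 2 pods averaging 140% of their CPU request, 70% target: prints 4
desired_replicas 2 140 70 2 10
```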

In order to scale our services horizontally, we need to get CPU and memory information from our containers. While in the past heapster was enough, these days (Kubernetes > 1.11) we need metrics-server installed. If you do not have metrics-server, the easiest way to get it is with curl and jq:

DOWNLOAD_URL=$(curl -Ls "" | jq -r .tarball_url)
DOWNLOAD_VERSION=$(grep -o '[^/v]*$' <<< $DOWNLOAD_URL)
curl -Ls $DOWNLOAD_URL -o metrics-server-$DOWNLOAD_VERSION.tar.gz
mkdir metrics-server-$DOWNLOAD_VERSION
tar -xzf metrics-server-$DOWNLOAD_VERSION.tar.gz --directory metrics-server-$DOWNLOAD_VERSION --strip-components 1
kubectl apply -f metrics-server-$DOWNLOAD_VERSION/deploy/1.8+/

If all looks good, we should see something like:

$ kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           9m1s

Now it's time to define the horizontal scaling rule: if CPU usage is above 70%, scale up to a maximum of ten containers. I kept the minimum at two to have some basic high availability during worker node crashes or rolling upgrades:

$ kubectl autoscale deployment tabpy-deployment --cpu-percent=70 --min=2 --max=10 
horizontalpodautoscaler.autoscaling/tabpy-deployment autoscaled
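
The same rule can also be kept in version control as a declarative manifest instead of the imperative command; a sketch using the autoscaling/v1 API:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: tabpy-deployment
  namespace: tabpy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tabpy-deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```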

To check the results:

$ kubectl describe horizontalpodautoscalers.autoscaling/tabpy-deployment                                                                             
Name: tabpy-deployment
Namespace: tabpy-dev
Labels: <none>
Annotations: <none>
CreationTimestamp: Sun, 23 Feb 2020 13:10:11 -0500
Reference: Deployment/tabpy-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 1% (1m) / 70%
Min replicas: 2
Max replicas: 10
Deployment pods: 2 current / 2 desired
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range

This looks good: we see the current usage (1%) and the threshold (70%) for scaling. If the load reaches 70%, the autoscaler will increase the number of pods to reduce the load. When the load decreases, the autoscaler reduces the number of pods back to the desired state.

Unknown CPU/No metrics known for pod

If you see unknown CPU usage and your metrics-server emits a no metrics known for pod error, just make sure you have resources/requests defined in your Deployment definition:

resources:
  requests:
    memory: "64Mi"
    cpu: "100m"

Now your horizontal scaling rules are in place and security is at an acceptable level; it seems you are ready to invite your users.


Kubernetes can be easy or complex, depending on how deeply you use it. For deploying our TabPy service, however, things were fairly easy. If you have any issues, just drop a message and I'll try to sort it out.




Tamas Foldi

Tamas is co-founder and CTO of data services firm Starschema, where he leads the Starschema technical team to deliver results for the most innovative enterprises.
