Deploying TabPy in Enterprise: Scaling and Hardening in Kubernetes
There are so many tutorials out in the wild about how to take an application, containerize it, and run it securely in your enterprise’s on-prem or public cloud, but hey, this will be yet another one.
During my daytime work, we help our clients use the power of Python calculations in a data visualization tool called Tableau, which requires a small middleware that actually evaluates the Python code outside the standard platform. This middleware is called TabPy, and this story is about how to deploy it in Kubernetes with the minimum configuration for scaling and hardening.
To give you an overview, this is what you will learn:
- How to build a Dockerfile for a Python-based application using Alpine Linux
- Basic Docker hardening tips
- How to deploy containers in Kubernetes using Services and Deployments in AWS EKS (this should work in other cloud providers too)
- How to set up pod network security with Calico
- How to set up autoscaling with metrics-server
I have a few assumptions to start: you have access to some Kubernetes-based cloud service (Amazon EKS, Google GKE or an on-premise cluster). If this is not the case, I would suggest checking out AWS EKS’s getting started guide first. You will also need basic Unix scripting and Docker experience.
All source code from this post is on GitHub: starschema/k8s-tabpy
Great, let’s get started.
Building a Dockerfile for TabPy
First things first, we need a Dockerfile that runs our TabPy server with all the potential Python modules we might leverage from our application. These Python packages need some pre-initialization, as does the TabPy service itself.
The outline of the Dockerfile will look something like this:
FROM alpine:3.11
MAINTAINER Tamas Foldi <tfoldi@starschema.net>

COPY tabpy.conf requirements.txt ./

ENV PACKAGES="\
    < things we actually need> \
    "
ENV BUILD_PACKAGES="\
    < packages required only to build the python packages> \
    "

RUN apk add --no-cache $PACKAGES \
  && apk add --no-cache --virtual build-deps $BUILD_PACKAGES \
  && rm -rf /var/cache/apk/* \
  && adduser -h /tabpy -D -u 1000 tabpy \
  && pip3 install --upgrade pip \
  && pip3 install --no-cache-dir -r requirements.txt \
  && su tabpy -c "python3 -m textblob.download_corpora lite && python3 -m nltk.downloader vader_lexicon" \
  && su -c "tabpy --config ./tabpy.conf & (sleep 1 && tabpy-deploy-models) && killall tabpy" \
  && apk del build-deps

USER 1000:1000
EXPOSE 9004
CMD [ "tabpy", "--config=./tabpy.conf" ]
https://github.com/starschema/k8s-tabpy/blob/master/Dockerfile
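To build and smoke-test the image locally before pushing it anywhere, something like this should do (the image tag is just an example):

$ docker build -t tabpy:alpine .
$ docker run --rm -p 9004:9004 tabpy:alpine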
Let’s see what we do and why. We start from Alpine Linux for various reasons:
- Alpine uses musl, libressl and busybox as its base. This makes it lightweight and secure compared to mainstream distros like RedHat/CentOS/Ubuntu.
- The base Docker image’s compressed size is 2.6MB (!!!!), which still includes a full-featured package manager.
- All binaries are compiled as Position Independent Executables (PIE) with stack smashing protection. Again, more secure than others.
The sequence to build and configure TabPy is the following:
1. Install the packages that we want to add to the final image, like python3 or openblas (see the example lists right after these steps).
2. Install the build dependencies: packages we need only to build our Python source packages. The command apk add --no-cache --virtual build-deps $BUILD_PACKAGES creates a virtual package definition named build-deps, so that these packages can be uninstalled later in the Docker build.
3. Install the actual application and its dependencies using a requirements.txt file.
4. Create the user tabpy to run the service inside the container. Remember, never run container apps as root.
5. Initialize the packages. Execute everything that requires additional downloads (like the NLTK sentiment database), as the running container will have no internet access.
6. Uninstall all the build dependencies (our virtual package) we installed in step #2.
7. Define the UID we want to use to run the container.
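To make steps #1 and #2 concrete, here is a hypothetical pair of package lists for a typical scientific Python stack. These exact package names are my own illustration, not the repo’s actual lists; pick whatever your requirements.txt really needs:

# example only -- substitute your actual runtime and build packages
ENV PACKAGES="\
    python3 \
    py3-pip \
    openblas \
    libstdc++ \
    "
ENV BUILD_PACKAGES="\
    python3-dev \
    gcc \
    g++ \
    musl-dev \
    openblas-dev \
    make \
    "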
What is missing here is the SSL configuration and user authentication. SSL can be configured at the TabPy level along with basic authentication (a config sketch follows below); however, in our case we will configure SSL on the edge. In case you are required to encrypt all in-cluster communication, it is preferable to configure SSL both in the container and on the edge.
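For reference, TabPy-level SSL and basic authentication boil down to a handful of settings in tabpy.conf. The following is a minimal sketch; the file paths are my assumptions, so verify the parameter names against the TabPy documentation for your version:

[TabPy]
TABPY_PORT = 9004
# serve HTTPS instead of plain HTTP
TABPY_TRANSFER_PROTOCOL = https
TABPY_CERTIFICATE_FILE = /tabpy/certs/server.crt
TABPY_KEY_FILE = /tabpy/certs/server.key
# basic auth: password file managed with the tabpy-user utility
TABPY_PWD_FILE = /tabpy/auth/password-file.txt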
Now let’s take a closer look at security.
Hardening a container
In the previous step, we took a few measures to ensure a secure environment for our application:
- Use a secure, PIE-compiled OS (like Alpine)
- Avoid OpenSSL for security
- Create a system user to run the service inside the container
- Explicitly define the UID
- Prepare the container for network lockdown: download every runtime resource in advance
- Remove all build dependencies. No headers, compilers or development libs should be in the deployed container. Same goes for bash; we just don’t need a full-power shell in our container.
- Remove writable folders from the container where possible (TabPy needs one writable directory)
In addition to these steps, it is a best practice to set up SELinux domains for our application, similar to httpd_t. This could give additional security, enforcing rules like “no execution of child processes” or “cannot read files from specific folders” — even if the classic Unix permissions allow it. SELinux is out of the scope of this post, but I highly encourage you to use it.
Deploy containers to Kubernetes
In a minimal setup, we need a Service and a Deployment resource to start our pods (a pod is the combination of containers, storage, IP, etc.). The Service is responsible for making our application discoverable internally and externally, while the Deployment describes what containers we need and in what setup.
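The actual manifests live in the repo linked above; as a rough sketch, the Service boils down to something like the following (the label selector here is an assumption for illustration, the real file may differ):

apiVersion: v1
kind: Service
metadata:
  name: tabpy
  namespace: tabpy
spec:
  type: LoadBalancer      # expose the service through the cloud provider's load balancer
  selector:
    app: tabpy            # route traffic to pods carrying this label
  ports:
    - port: 9004          # TabPy's default port
      targetPort: 9004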
To add the Service, just type:
kubectl apply -f https://raw.githubusercontent.com/starschema/k8s-tabpy/master/tabpy-svc.yaml
For the Deployment:
kubectl apply -f https://raw.githubusercontent.com/starschema/k8s-tabpy/master/tabpy-pod.yaml
Make sure you have the following sections in your Deployment file:
spec:
  hostNetwork: false                     # pod-level field: deny accessing the host's network
  containers:
    - # ... name, image, ports, etc.
      securityContext:                   # container-level security context
        runAsUser: 1000                  # make sure we execute things as non-root
        allowPrivilegeEscalation: false  # do not allow suid
        privileged: false                # no privileged containers
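If you want to go one step further with the same idea, you can also drop all Linux capabilities; TabPy only listens on an unprivileged port, so it should not need any. This is an optional extension, not part of the repo’s manifest:

      securityContext:
        runAsUser: 1000
        allowPrivilegeEscalation: false
        privileged: false
        capabilities:
          drop: ["ALL"]                  # no Linux capabilities for the TabPy process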
Now we should see something like:
$ kubectl get pods,services -n tabpy
NAME READY STATUS RESTARTS AGE
pod/tabpy-deployment-58d6f864f9-45j2m 1/1 Running 0 14h
pod/tabpy-deployment-58d6f864f9-cplvd   1/1     Running   0          14h

NAME            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
service/tabpy   LoadBalancer   10.100.55.81   hostname      9004:30474/TCP   2d
Now we can test our service:
$ curl http://hostname:9004/info
{"description": "", "creation_time": "0", "state_path": "/tabpy", "server_version": "1.0.0", "name": "TabPy Server", "versions": {"v1": {"features": {}}}}
All looks good.
Network Security for Pods
In most cases, we do not need any outgoing network connections from our pods other than intra-namespace connections. In our specific case, we do not need any outgoing connections at all.
To deny network connections, we need the Calico network policy engine installed on our Kubernetes cluster. In case we don’t have it, we can simply install it with:
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/calico.yaml
Our NetworkPolicy is fairly easy; we simply deny all Egress connections in the namespace where our pods are deployed:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-egress
namespace: tabpy
spec:
podSelector:
matchLabels: {}
policyTypes:
- Egress
Now, there are no outgoing connections from pods — not even DNS requests:
$ kubectl exec -n tabpy -ti tabpy-deployment-58d6f864f9-45j2m sh
/ $ nc -v google.com 80
nc: bad address 'google.com'
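Our case needs no egress at all, but if your pods must talk to each other, a slightly relaxed policy can allow egress inside the namespace only. A sketch, not part of the repo:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-intra-namespace-egress
  namespace: tabpy
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector: {}   # an empty podSelector matches every pod in this namespace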
We are done with basic security and hardening, so we can proceed with scaling.
Set Up Autoscaling with metrics-server
What is autoscaling and why do we need it? The idea here is to provide automatic horizontal scaling based on resource consumption. If the average CPU consumption of our containers goes above 70%, we might need to start new containers to handle the load. If the load goes down, we should scale our services down.
In order to scale our services horizontally, we need to get CPU and memory information from our containers. While in the past heapster was enough, these days (Kubernetes >1.11) we need metrics-server to be installed. If you do not have metrics-server, the easiest way to get it is with curl and jq:
DOWNLOAD_URL=$(curl -Ls "https://api.github.com/repos/kubernetes-sigs/metrics-server/releases/latest" | jq -r .tarball_url)
DOWNLOAD_VERSION=$(grep -o '[^/v]*$' <<< $DOWNLOAD_URL)
curl -Ls $DOWNLOAD_URL -o metrics-server-$DOWNLOAD_VERSION.tar.gz
mkdir metrics-server-$DOWNLOAD_VERSION
tar -xzf metrics-server-$DOWNLOAD_VERSION.tar.gz --directory metrics-server-$DOWNLOAD_VERSION --strip-components 1
kubectl apply -f metrics-server-$DOWNLOAD_VERSION/deploy/1.8+/
If all looks good, we should see something like:
$ kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           9m1s
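To verify that metrics are actually flowing, kubectl top should return live numbers after a minute or two (your pod names and values will differ):

$ kubectl top nodes
$ kubectl top pods -n tabpy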
Now it’s time to define the horizontal scaling rules, e.g. if the CPU usage is more than 70%, then scale up to a maximum of ten containers. I kept the minimum at two to have some basic high availability during worker node crashes or rolling upgrades:
$ kubectl autoscale deployment tabpy-deployment --cpu-percent=70 --min=2 --max=10
horizontalpodautoscaler.autoscaling/tabpy-deployment autoscaled
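The imperative kubectl autoscale command is convenient, but if you keep your cluster state in version control, the equivalent declarative resource looks roughly like this (namespace assumed to be tabpy, matching the rest of this post):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: tabpy-deployment
  namespace: tabpy
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tabpy-deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70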
To check the results:
$ kubectl describe horizontalpodautoscalers.autoscaling/tabpy-deployment
Name: tabpy-deployment
Namespace: tabpy-dev
Labels: <none>
Annotations: <none>
CreationTimestamp: Sun, 23 Feb 2020 13:10:11 -0500
Reference: Deployment/tabpy-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 1% (1m) / 70%
Min replicas: 2
Max replicas: 10
Deployment pods: 2 current / 2 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
This looks good: we see the current usage (1%) and the threshold (70%) to scale. If the load reaches 70%, the autoscaler will increase the number of pods to reduce the load. When the load decreases, the autoscaler reduces the number of pods back to the desired state.
Unknown CPU/No metrics known for pod
In case you see unknown CPU usage and your metrics-server emits a no metrics known for pod error, just make sure you have resources/requests defined in your Deployment definition. The autoscaler computes utilization as a percentage of these requests, so without them there is nothing to measure against:
resources:
requests:
memory: "64Mi"
cpu: "100m"
Now that your horizontal scaling rules are in place and security is at an acceptable level, it seems you are ready to invite your users.
Conclusion
Kubernetes can be easy or complex, depending on how deeply you use it. Deploying our TabPy service, however, was fairly easy. If you have any issues, just drop a message and I’ll try to sort it out.