Tableau announced back in 2021 that you can run their server on Kubernetes. Now that it’s been a few years, let’s take a look at what it takes to set up.
The Docker image
Tableau Server in a Container ships as a tarball download, which includes shell scripts that give you the ability to create Tableau Server Docker container images in your local environment.
Okay, so we have to build the Docker image ourselves, presumably using the included Dockerfile. Not a big deal; a single Dockerfile is usually very simple.
➜ ~ tree Downloads/tableau-server-container-setup-tool-2022.3.1
Downloads/tableau-server-container-setup-tool-2022.3.1
├── EULA
│ ├── COPYRIGHTS.rtf
│ ├── Commercial_EULA.txt
│ └── NOTICES.txt
├── README.txt
├── build-image
├── build-upgrade-image
├── build-utils
├── configure-container-host
├── customer-files
│ └── setup-script
├── image
│ ├── Dockerfile
│ ├── docker
│ │ ├── alive-check
│ │ ├── config
│ │ │ ├── bootstrap
│ │ │ ├── config.json
│ │ │ ├── tabsvc.yml
│ │ │ └── tsm-commands
│ │ ├── install-process-manager
│ │ ├── rpasswd
│ │ ├── run-tableau-server
│ │ ├── server-ready-check
│ │ ├── single-service
│ │ │ ├── clone_artifact_modifier
│ │ │ ├── independent-backgrounder-additional-config.yml.tmpl
│ │ │ ├── init
│ │ │ └── tsm_docker_utils
│ │ │   ├── README
│ │ │   ├── after-install-common
│ │ │   ├── after_install_service
│ │ │   ├── configure_service.bash
│ │ │   ├── graceful_shutdown.bash
│ │ │   ├── initialize_notify_dir
│ │ │   ├── install_service.bash
│ │ │   ├── run_container
│ │ │   ├── shared_utils.bash
│ │ │   └── status_check.sh
│ │ ├── stack-traces-from-coredumps
│ │ ├── start-process-manager
│ │ └── upgrade
│ │   └── upgrade-tableau-server
│ ├── independent.dockerfile
│ ├── init
│ │ ├── fake-chrpath-1.0-1.x86_64.rpm
│ │ └── setup_default_environment.bash
│ ├── shared_ind.dockerfile
│ └── upgrade.dockerfile
└── reg-info.json
👀 A bit more than I was expecting, but let’s continue. Exact setup steps can be found here. I’ll summarize the setup for this blog post, but if you intend to actually build the image you should read the link in detail.
Tableau may not ship with the drivers we need to connect to our data sources, so after checking Tableau’s driver page, put the driver installation commands in customer-files/setup-script. The running container won’t have admin rights by default, so you may want to install other tools ahead of time as well. Remember to pass -y to the package manager so it installs without prompting if you run this in CI!
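For illustration, here’s a minimal sketch of what customer-files/setup-script could look like. Several things in it are my assumptions rather than anything from Tableau’s docs: an RPM-based base image with yum, the script running as root during the build, and /opt/tableau/tableau_driver/jdbc as the JDBC driver directory (check the driver page for your connector). The DRY_RUN switch is a hypothetical convenience for reviewing the commands in CI, and the example package and URL are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of customer-files/setup-script. Assumptions (not from Tableau's docs):
#   - the base image is RPM-based and has yum
#   - this script runs as root while the image is being built
#   - JDBC drivers live in /opt/tableau/tableau_driver/jdbc
set -euo pipefail

# DRY_RUN=1 prints each command instead of running it (hypothetical convenience).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# -y answers every prompt, so a CI build never hangs waiting for input.
install_packages() {
  run yum install -y "$@"
}

# Download a JDBC driver jar into the (assumed) driver directory.
install_jdbc_driver() {
  local url="$1"
  local dest_dir="/opt/tableau/tableau_driver/jdbc"
  run mkdir -p "$dest_dir"
  run curl --fail --location --output "$dest_dir/$(basename "$url")" "$url"
}

# Example calls (placeholders: swap in the packages and driver you need):
#   install_packages unzip
#   install_jdbc_driver "https://example.com/drivers/your-driver.jar"
```

Running it once with DRY_RUN=1 in CI lets you eyeball exactly what will be installed before committing to a multi-gigabyte build.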
You will also need to fill in the registration file (reg-info.json), which is straightforward.
After that, download the Tableau Server rpm, which will be copied into the image.
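Two things can sink the build before it gets going, as we’re about to find out: the scripts use declare -A, which needs bash 4 or newer, and the Tableau rpms are x86_64-only. A small preflight sketch like this (my own hypothetical helper functions, not part of Tableau’s tool) checks both up front:

```shell
#!/usr/bin/env bash
# Preflight checks before running build-image (my own sketch, not part of Tableau's tool).
# Assumptions: the build scripts need bash >= 4 because they use `declare -A`,
# and the Tableau rpms are x86_64-only, so the build host should be x86_64.

# Succeeds if the given bash (default: first `bash` on PATH) is version 4 or newer.
check_bash_version() {
  local bash_bin="${1:-bash}"
  local major
  major="$("$bash_bin" -c 'echo "${BASH_VERSINFO[0]}"')" || return 1
  if (( major < 4 )); then
    echo "error: $bash_bin is bash $major.x, but declare -A needs bash >= 4" >&2
    return 1
  fi
  echo "ok: bash $major"
}

# Succeeds if the architecture (default: this machine's, from uname -m) is x86_64.
check_build_arch() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)
      echo "ok: $arch" ;;
    *)
      echo "error: host is $arch, but the Tableau rpms are x86_64; build on an x86_64 machine" >&2
      return 1 ;;
  esac
}
```

For example: check_bash_version && check_build_arch && bash build-image --accepteula -i ../tableau-server-2022-3-2.x86_64.rpm. If the scripts’ shebang points at /bin/bash (an assumption I haven’t verified), check_bash_version /bin/bash will still fail on a Mac even after installing a newer bash elsewhere on the PATH, which would explain why bash build-image works where ./build-image doesn’t.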
Now let’s build it:
$ ./build-image --accepteula -i ../tableau-server-2022-3-2.x86_64.rpm
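./build-utils: line 53: declare: -A: invalid option
Oops. This is a common macOS error, and easily fixed: macOS ships with bash 3.2, while declare -A (associative arrays) needs bash 4 or newer, so install a newer bash (for example via Homebrew). You may need to run bash build-image instead of ./build-image after you update bash. Let’s run this again: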
$ ./build-image --accepteula -i ../tableau-server-2022-3-2.x86_64.rpm
#12 25.27 Error:
#12 25.27 Problem: conflicting requests
#12 25.27 - package fake-chrpath-1.0-1.x86_64 does not have a compatible architecture
This is probably because I’m running this on an arm64 machine, while these rpms are built for x86_64. On an x86_64 machine the build works. Now that we have the image, let’s check the size:
$ docker image ls | grep -i tableau
tableau 9.12GB
Wat. (・_・;)
Best practice is to slim the image down as much as possible, and I’m talking down to an ideal of ~5MB with distroless images. It’s okay if you don’t shrink that far, of course, but nearly 10 gigabytes is… something else. Generally speaking, the larger the image, the larger the attack surface for vulnerabilities. A security scan against this thing is not pretty, although to be fair, they rarely are. The other downside is that the image is so large you may need to increase the disk size of your Kubernetes nodes; a few cached versions of it could fill up their storage.
Let’s dive into why this image is so large. Literally, let’s dive into it via https://github.com/wagoodman/dive 😉, which shows how much each layer contributes to the image size:
- The base OS is ‘only’ 204MB.
- A Tableau setup script installing dependencies adds 174MB.
- Copying the rpm file into the image adds ~3 gigs.
- Installing the rpm adds a whopping ~6 gigs.
- Other scripts add a small amount of space.
The rpm file is left over after installation and presumably no longer needed, so if Tableau used one of the tricks in https://stackoverflow.com/questions/26306059 they might be able to save ~3 gigs. (Simply deleting it in a later step wouldn’t help, because the layer created by copying it in still holds the full file.) Switching to a smaller OS or minimizing dependencies could also save a bit of space. But I digress.
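As a sketch of one trick along these lines (not necessarily one from that Stack Overflow thread), assuming BuildKit is available: bind-mount the rpm only for the RUN step that installs it, so it never lands in a layer. The hypothetical function below just prints an illustrative Dockerfile; the base image and rpm file name are placeholders you pass in, and it skips everything else Tableau’s real Dockerfile does, so treat it as a demonstration of the layer technique rather than a drop-in replacement.

```shell
#!/usr/bin/env bash
# Print an illustrative Dockerfile that installs an rpm without keeping it in any layer.
# Hypothetical helper; usage: slim_dockerfile <base-image> <rpm-path-in-build-context>
# The base image and rpm name are placeholders; Tableau's real Dockerfile does much more.
slim_dockerfile() {
  local base_image="$1" rpm="$2"
  local name
  name="$(basename "$rpm")"
  cat <<EOF
# syntax=docker/dockerfile:1
FROM ${base_image}
# The rpm is bind-mounted from the build context for this RUN step only (BuildKit),
# so unlike COPY it never becomes part of an image layer.
RUN --mount=type=bind,source=${rpm},target=/tmp/${name} \\
    yum install -y /tmp/${name} && yum clean all
EOF
}
```

With the rpm sitting in the build context, something like slim_dockerfile <base-image> tableau-server.rpm | DOCKER_BUILDKIT=1 docker build -t tableau-slim -f - . would build it; the ~3 gig copy layer simply never exists.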
The Kubernetes Deploy
Now that we have the image, let’s deploy it via this repo. The first thing we see there is “support level: community supported.” Not good, but Tableau on Docker is supported, and Kubernetes runs on Docker†, so you can make your case to the support agent and maybe you’ll get lucky. If you think you’ll skip the support agent by posting a GitHub issue, don’t bother: the repo is not actively maintained or monitored.
The repo doesn’t have a Helm chart for us to use, but it does have a YAML file. Unfortunately, it’s not deployable out of the box. The CPU and memory settings are far too low, which in my experience leads to a very slow install that eventually fails. You’ll need to raise them to around 16 CPUs and 64Gi of memory. Unless you have a beefy computer, this isn’t something I’d bother trying to run locally. I would also increase the EBS volume to a few hundred gigs, as Tableau will quickly fill that space with log files. Finally, they chose a Deployment rather than a StatefulSet, which is odd given that Tableau is a stateful service that relies on an EBS volume. Indeed, their own README recommends using a StatefulSet.
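Here’s a minimal sketch of the StatefulSet route with the bumped resources, written as a shell function that prints the manifest so it can be piped into kubectl apply -f -. The names (tableau, data), port 8080, the /var/opt/tableau mount path, and the 200Gi default volume size are my assumptions, not values from Tableau’s repo, and the license and admin-credential environment variables the container needs are left out; copy those from the repo’s YAML.

```shell
#!/usr/bin/env bash
# Print a headless Service plus a single-replica StatefulSet for Tableau Server.
# Hypothetical helper; usage: render_tableau_statefulset <image> [storage-size]
# Assumptions: port 8080, data mounted at /var/opt/tableau, 200Gi default volume.
# License and admin-credential env vars are deliberately omitted.
render_tableau_statefulset() {
  local image="$1" storage="${2:-200Gi}"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: tableau
spec:
  clusterIP: None            # headless: gives the pod a stable DNS name
  selector:
    app: tableau
  ports:
    - port: 8080
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: tableau
spec:
  serviceName: tableau
  replicas: 1
  selector:
    matchLabels:
      app: tableau
  template:
    metadata:
      labels:
        app: tableau
    spec:
      containers:
        - name: tableau
          image: ${image}
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "16"
              memory: 64Gi
          volumeMounts:
            - name: data
              mountPath: /var/opt/tableau
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: ${storage}
EOF
}
```

For example: render_tableau_statefulset my-registry/tableau:2022.3 | kubectl apply -f - (the image name is a placeholder). A single-replica StatefulSet named tableau gives its pod the stable name tableau-0, which also keeps the hostname stable across restarts, and once the pod is up you can check on it with kubectl exec tableau-0 -- tsm status -v.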
Now you can deploy it. The pod takes a long time to start, but once it does you can exec into it and run tsm status -v to see the status of the service. It reports back on 20 or so different microservices.
Oh no
This explains why the image is that big: it’s a natural consequence of trying to pack an entire business architecture into a single container. It also makes the system a pain to debug. Each service is its own knowledge domain, with its own log file and peculiarities. Because each service writes to its own log files rather than stdout, kubectl logs pod/tableau will not show you everything. You have to either exec into the pod and filter through the logs with your Unix skills, or set up a forwarder for all the log files. You will also want to set up log rotation to keep the logs from eating all the space on the EBS volume. You can probably guess by now that running multiple services in a single container is an antipattern, much like having a large image; in some ways they are one and the same problem. If you split the services into one per container, you would naturally get smaller images and all logs going directly to stdout.
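If you go the exec-and-grep route, a couple of small helpers make it less painful. This is a sketch meant to run inside the pod; the log root /var/opt/tableau/tableau_server/data/tabsvc/logs (one subdirectory per service) is my assumption about the container’s layout, so pass a different directory if yours differs. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Log triage helpers, meant to be run inside the Tableau pod
# (for example after: kubectl exec -it tableau-0 -- bash).
# Assumption: logs live under this root, one subdirectory per service.
LOG_ROOT_DEFAULT="/var/opt/tableau/tableau_server/data/tabsvc/logs"

# Print file:line:text for each PATTERN match in *.log files
# modified within the last MINUTES minutes (default 60).
grep_recent_logs() {
  local pattern="$1" minutes="${2:-60}" root="${3:-$LOG_ROOT_DEFAULT}"
  find "$root" -type f -name '*.log' -mmin "-${minutes}" \
    -exec grep -Hn -- "$pattern" {} +
}

# Disk usage in KiB per service directory, largest first,
# to spot which service is filling the volume.
log_usage_by_service() {
  local root="${1:-$LOG_ROOT_DEFAULT}"
  du -sk "$root"/*/ 2>/dev/null | sort -rn
}
```

For example, grep_recent_logs ERROR 30 shows errors from the last half hour across every service, and log_usage_by_service tells you which service is eating the volume before you decide where rotation matters most.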
At this point you may be thinking: “Okay, so it’s a bit complicated, but back in the good ol’ days I used to cd around the server and grep through logs all the time. What’s the big deal?” From a classic sysadmin perspective you are correct; this is not that bad. My frustration came from having gotten so used to the wonderful world of out-of-the-box Helm charts and simple ephemeral containers that I forgot my days of manually managing EC2 servers. If you come into Tableau prepared and with the right perspective, I’m sure your experience will be better.
Maintenance
Once the service is deployed it’s relatively stable. Changing the configuration is a pain, because certain options require a server restart, which takes about 10 minutes. We also ran into a memory leak, which you can work around naively by restarting, or take to Tableau support. My experience with Tableau support was a mixed bag: expect them to call you back anywhere from a day to a month later. For backups you can either use the tsm utility to back up to a different file storage location, or back up the associated EBS volume using your preferred method.
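The config and backup chores can be scripted from outside the pod. A minimal sketch, assuming the pod is named tableau-0 (override with TABLEAU_POD) and that kubectl exec works against it; KUBECTL=echo prints the commands instead of running them. As I understand Tableau’s tsm documentation, tsm configuration set only stages a change, tsm pending-changes apply applies it (restarting when needed), and a tsm backup doesn’t include server settings, hence the settings export alongside it. Check the flags against your Tableau version; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Routine tsm chores run from outside the pod (a sketch; pod name and paths are assumptions).
# Set KUBECTL=echo to print the commands instead of running them.
TABLEAU_POD="${TABLEAU_POD:-tableau-0}"

tsm_exec() {
  "${KUBECTL:-kubectl}" exec "$TABLEAU_POD" -- tsm "$@"
}

# Stage a configuration change, then apply it. Some keys trigger a restart
# (around 10 minutes), so schedule this outside working hours.
set_config() {
  local key="$1" value="$2"
  tsm_exec configuration set -k "$key" -v "$value" &&
    tsm_exec pending-changes apply --ignore-prompt
}

# Back up repository and file store data (-d appends the date to the file name),
# plus an export of the server settings, which the backup does not include.
backup_tableau() {
  local name="${1:-tableau-backup}"
  tsm_exec maintenance backup -f "$name" -d
  tsm_exec settings export -f "/tmp/${name}-settings.json"
}
```

For example, run KUBECTL=echo backup_tableau nightly to preview, then backup_tableau nightly for real. Both files stay inside the pod, so copy them out (kubectl cp works) or make sure they land on a volume you already snapshot.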
To upgrade, you need to build a special upgrade image (the setup tool includes a build-upgrade-image script for this) and run it. Note that the hostname must remain the same. It’s a pain compared to the dead-simple upgrade of ephemeral containers, but it works.
Conclusion
Deploying Tableau on Kubernetes is not fun, but it’s doable. For small to medium organizations, Tableau recommends its hosted solution, and for good reason.
Footnotes
† Technically, Kubernetes can use any CRI runtime.
