Mining your Tableau logs with Apache Drill

Tamas Foldi
Aug 23, 2019 · 7 min read
Image for post
Image for post

Tableau Server and Desktop logs each and every action you perform. The log data is a gold mine for people eager to understand what is happening under the hood and why. However, there is no easy way to query multiple log files and filter for the relevant information only. Tools like LogShark require Windows installation and complex infrastructure and they are not even real-time. Using command line magic to parse logs (they’re in JSON format) with tools like jq isn’t really convenient as the different entries are distributed across multiple log files. So how are professionals are doing this? Let me explain.

Tableau Log file format

Exciting things are stored in Tableau’s JSON-based logging format, called native logs. Desktop, Server, Prep, and its processes dealing with data or visualizations use this format. A typical log line looks like:

With the tool jq we can quickly visualize this structure in a more human-readable way:

As we can see, all log entries are JSON objects separated by new lines. All entries have ts (timestamp), pid (process id), tid (thread id inside the process), sev (severity), req (request identified), sess (Server vizql session or Desktop session), site (site in Server), user (user in Server) and log keys (k) and (v) values. In some circumstances, we have ctx for contextual information (like who called a specific service) and a for “ART Logging” which logs CPU and memory counters. These last two examples deserve a separate post, so I’ll focus on the basic JSON format.

For us, keys and values hold the information we need: what type of operation (key) produced what log message (value). The log message itself is also a nested JSON object; different keys provide different value objects in various structures.

Both Tableau Server and Desktop store information in multiple files, adding complexity. Multiple Desktop instances use multiple log files like log.txt and log_1.txt.Hyper logs its operations to hyper logs and yes, to multiple files. Complexity increases when we want to see the specific hyper log entries for a specific Desktop or Server session. We need to switch between multiple logs quickly without ingesting them in real-time into additional databases like Splunk or Logstash.

Apache Drill — SQL for everything

Drill is a “Schema-free SQL Query Engine for Hadoop, NoSQL, and Cloud Storage” and SQL engine that allows us to SQL query everything, loosely defined JSON structures in particular. Its installation is fairly easy on every operating system as it is written in pure java. Let’s see how can we use it on Tableau Desktop and Server.

Apache Drill on OSX

The installation is straightforward, we just need homebrew to install drill:

We just need to start the standalone version:

First, we need to tell Drill where the logfiles are. Drill comes with a configuration UI by default. We can graphically set the storage configuration from there.

  1. Navigate to http://localhost:8047/storage
  2. Click on DFS’s “UPDATE” button
  3. Add the following section to the configuration JSON object:
Image for post
Image for post
This how it should look like after adding Tableau Desktop’s Log directory as “tableau” workspace.

We should also set two system settings at http://localhost:8047/options as well:

Enable union type will allow Drill to dynamically change data type for the v column while disabling number conversions speed things up. Finally, let’s try it:

Image for post
Image for post
Executing our first query on the log.txt flawlessly.

We can also go inside the v JSON object and directly query its attributes. Also, selecting records from multiple files is easy. Here, we select the ten longest-running queries using tabprotocol.

Now comes the fun part. I want to see all tabproto logs related to that to one of my bigquery session:

Image for post
Image for post
Drill from vizql logs to tabproto by process id and protocol id

Without ingesting anything, defining a schema or any hard work, we were able to drill from one log file to another, by using protocol id and Desktop process id. Anyone who has spent time with Tableau log forensics will appreciate this method.

Tableau Server and Drill

Things get serious when performing the same exercise on Tableau Server; more processes, more directories, and obviously more logs.

First, download the Drill package from https://drill.apache.org/download/ and make sure you have JDK installed on the server.

This time, instead of using the UI to add a storage configuration we will register our own plugin as tableau in conf/storage-plugins-override.conf.

Don’t forget to set up union and all text mode in conf/drill-override.conf :

We are ready with the configuration. After starting the Drill shell, we can immediately switch to this new workspace:

How about reading some vizql logs? Works like a charm.

Image for post
Image for post
We can query every Tableau Server logs — no ingestion required, everything is in real-time

Let’s check it, like which Dashboard open event required the most CPU resource?

First, we should find the k value for end bootstrap (boostrap is vizql command for opening a visualization):

Please note how easy it is to aggregate values inside logs — again, without loading it to any kind of database. Now select the top CPU usage across sessions:

Well, it seems my top workbooks require about 1300ms user CPU time. But what are these sessions? Yes, you can simply select for this vizql_session_id in the logs and see what exactly happened, what were the most CPU intensive steps in rendering:

From here, we can make assumptions like we have tons of data queries and zones to render.

Tableau Server Cluster with Distributed Drill Service

Well, this deserves a different post but here is the short version. Drill has a distributed service called Drillbit that can run queries across multiple nodes, slurping logfiles from all the workers. It does not require anything other than drill and zookeeper. Usually, I use Tableau’s own zookeeper as a coordination service for Drill.

Summary

Using SQL to traverse between logs and parse nested JSONs is just simply cool. With tools like Apache Drill or Dremio and small overhead, administrators and developers can easily dig into the Tableau Logs — on their Desktop and on their Servers.

If you have any questions or comments, or interested in Tableau Server forensics or performance optimization, just drop a line.

Starschema Blog

Data contains intelligence that can change the world — we…

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store