Dealing with ‘big’ data in Tableau

Manas Ranjan Kar · Published in NLP Wave
2 min read · Oct 15, 2015

‘Big Data’ might be one of the most abused terms in the analytics industry. Many experts write about it, but I personally have no clue what the whole brouhaha is about. I have tried to keep my definition of big data simple: voluminous data from multiple sources; the tools change, but the techniques to analyse them rarely do.

Recently, one of our clients handed us a huge data file in CSV format, all 13.5 million rows of it. While this may not qualify as big data, it was our first encounter with data of this size. This is where the problems started.

Now, the tool of choice to handle and analyse this data was always Tableau. But because of the limitations of the MS Jet engine used to connect to CSV files, we were unable to process decimal values correctly, leading to erroneous calculations and hypotheses.

Some sleepless nights later, we figured it out. We decided to load our data into a database and connect Tableau to it via its ODBC connectors. We chose MySQL for the purpose.
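The load itself can also be scripted rather than done by hand. Here is a minimal sketch of the idea: stream the CSV in batches so the whole file never sits in memory. For illustration and testability it targets SQLite from Python's standard library; against MySQL you would swap in a connector (e.g. mysql-connector-python, where the placeholder is `%s` instead of `?`). The table and column names are hypothetical.

```python
import csv
import sqlite3  # stand-in for a MySQL connection, purely for illustration


def load_csv(db, csv_path, table, batch_size=10_000):
    """Stream a large CSV into a database table in batches,
    so the whole file never has to fit in memory."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)                        # first row = column names
        placeholders = ", ".join("?" * len(header))  # '?' for sqlite3; MySQL connectors use '%s'
        sql = f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})"
        batch, total = [], 0
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                db.executemany(sql, batch)
                total += len(batch)
                batch.clear()
        if batch:                                    # flush the final partial batch
            db.executemany(sql, batch)
            total += len(batch)
    db.commit()
    return total
```

Batched `executemany` inserts are much faster than row-by-row inserts for files of this size, which is what makes a 13.5-million-row load tolerable.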

Here is a brief outline of the steps we followed:

  • Install MySQL and its components from the MSI installer available on its website. The version used in this exercise was the 64-bit MySQL Community Server 5.6.17. You can download the latest version at http://dev.mysql.com/downloads/mysql/
  • Configure the server parameters: hostname (localhost), username (mysql), port (3306) and password. Also add a user with global privileges during the penultimate stage of the installation.
  • Install the latest version of HeidiSQL, a relatively simple, no-code MySQL client for manipulating MySQL databases and importing CSV files. The version used was HeidiSQL 8.3. You can download the latest version at http://www.heidisql.com/download.php
  • Create a new connection in HeidiSQL to the MySQL database using the server parameters above.
  • Select a database and right-click to create a new table. Add the new columns and their properties using the Add button. The column names and properties need to match the CSV file you wish to import.
  • Once this is done, save the table. Go to Tools > Import CSV, then choose the file path and the encoding (UTF-8). The dialog box should look something like this:
[Image: HeidiSQL CSV import dialog]
  • Depending on the size of the data, it may take some time to import and load it into the database. In this case it took nearly 25 minutes to convert and load 6.2 GB of data.
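Keeping the column definitions in sync with the CSV by hand is tedious for wide files. The following sketch shows how that step could be automated: sample a few rows and guess a MySQL column type for each column. The type rules here are deliberately crude assumptions for illustration (not HeidiSQL's actual logic), but note the use of DOUBLE, which preserves the decimal values that tripped up the Jet engine.

```python
import csv


def guess_type(values):
    """Very crude MySQL type inference from sample string values (an assumption,
    not a real library's inference logic)."""
    try:
        # All samples parse as numbers with no fractional part -> integer column.
        if all(float(v) == int(float(v)) for v in values if v != ""):
            return "BIGINT"
        return "DOUBLE"  # numeric with decimals; keeps fractional values intact
    except ValueError:
        return "VARCHAR(255)"  # anything non-numeric falls back to text


def create_table_ddl(csv_path, table, sample_rows=100):
    """Build a CREATE TABLE statement whose columns mirror the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        samples = [row for _, row in zip(range(sample_rows), reader)]
    cols = []
    for i, name in enumerate(header):
        values = [row[i] for row in samples if i < len(row)]
        cols.append(f"`{name}` {guess_type(values)}")
    return f"CREATE TABLE `{table}` ({', '.join(cols)});"
```

Running the generated statement in HeidiSQL (or any MySQL client) gives you a table whose column names are guaranteed to match the CSV header, so the import step cannot fail on a name mismatch.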

Open Tableau Desktop Professional, connect to the data using the MySQL parameters and load the table from the given database. You may want to import all the data into a Tableau extract for faster processing. Done!

Problem solved. The learning from the experience? Develop more robust ETL processes for the future. But hey, working for a startup is surely fun and rewarding!
