Building real-time data visualization of data in Couchbase Analytics with Tableau
By Sachin Smotra
In this piece, I will cover how to set up a Couchbase Analytics cluster in under 5 clicks and create a real-time visualization dashboard with Tableau.
Couchbase Server 6.0 is a hybrid NoSQL database that supports operational and analytical workloads. Couchbase Analytics in Couchbase Server 6.0 brings “NoETL for NoSQL” and enables users to run ad-hoc analytical queries on JSON data in their natural form — without the need for transformation or schema design — by leveraging a massively parallel processing (MPP) query engine.
Every enterprise has already invested in a visualization tool and a critical need is for them to leverage existing investments and not have to re-train their business reporting and dashboarding teams.
I’ve always believed that the proof is in the proverbial pudding when it comes to analyzing and visualizing JSON data in real time and will spend the next section showcasing how Couchbase solves this challenge.
Demo scenario
I recently had the pleasure of creating a product demo inspired by one of the early adopters of Couchbase Analytics.
The scenario is built around a fictitious company managing a fleet of cars. Each car sends operational data to its mothership. The operations team is monitoring the incoming data being sent by cars in transit with system alerts configured for any faults in critical indicator readings . If any of the indicators were providing faulty readings, the challenge was to analyze the incoming data without impacting the ability of the cars to send data. The icing on the cake was to use the Tableau visualization tool to create a dashboard that provided a real-time view of the situation on the ground.
Data model
As a part of the demo I created a JSON data model to represent real-time information being sent by each car on the road. As is the case with most JSON data models, nested objects and arrays are very much a part of the data being sent back to the car. For example, I have modelled the tire pressure readings as an array — if the fictitious car company started manufacturing bikes with 2 wheels or trucks with 6 wheels, the data model would not need any changes.
Try it out in 5 clicks: Setting up a cluster with Couchbase Analytics
If you are new to Couchbase, you can download Couchbase Server 6.0 and try this on your own. You can choose to install Couchbase on single machine or install a cluster. The instructions below are for adding a new node to a cluster. If you are running everything on a single machine, please ensure that data and analytics services are running on the node.
A subset of the dataset used in the demo is available for download. You will need to download and extract the demo dataset.
Use the “cbimport” utility to import this dataset in your own Couchbase 6.0 cluster. The command to do this on the Mac is:
/Applications/Couchbase\ Server.app/Contents/Resources/couchbase-core/bin/cbimport json -c <cluster_host_name_or_IP> -u <username> -p <password> -b cars -f lines -d file://<unzipped_list.json_file> -g “#UUID#”
For your environment, please refer to the cbimport docs.
Now that you’ve got the readings from the cars on the road available in the operational cluster, let’s add an analytics node to the cluster to start exploring and analyzing the readings being sent in real time. You’ll need to login to Couchbase admin console to follow the steps below.
- Click on “Add Server” on the top right corner of your screen.
- You’ll need to provide the details in the dialog as follows:
- Hostname/IP Address
- Username
- Choose the Analytics service ( “Option + click” on the mac will save you a bunch of clicks)
3. Click on “Add Server” button
4. Choose the amount of memory to assign to the Couchbase Analytics node
5. Finally click “Rebalance”
In 5 clicks, you have added a brand new service Couchbase Analytics to your cluster. Now let’s make the operational data available for analytics by creating a shadow dataset.
create dataset on cars where `type`=”telematics”
connect link Local
By running the above statements in the Couchbase Analytics workbench, you have now created a shadow dataset for data analysis and exploration.
Data exploration
Let’s start exploring the data. If I were the operations manager, I would like to know if the problem is widespread and if it affects more than one type of car.
Total # of cars with TPMS ON
select * from cars
where TPMS=”ON”
limit 1000
Types of cars having with this condition
select ModelType, count(*) as count from cars
where TPMS=”ON”
group by ModelType
Sample result
[
{
“count”: 1052,
“ModelType”: “Compact car”
},
{
“count”: 8640,
“ModelType”: “Convertible”
},
{
“count”: 1098,
“ModelType”: “Coupe”
},
{
“count”: 1106,
“ModelType”: “Hybrid”
}
]
If you are a SQL developer, the queries above should be familiar. Couchbase Server 6.0 enables analytics teams to bring their existing SQL skills to the schema-less and nested world of JSON data. Learn more about N1QL and SQL++.
Now let’s rule out false positive — there may be a situation where the TPMS indicator may be sending a faulty reading but the actual tire pressure values might be ok. The actual tire pressure readings are being sent as a JSON array — let’s check if they are actually low.
Are there any false positives?
select * from cars
where TPMS=”ON”
AND (EVERY tp in cars.TirePressure SATISFIES tp > 30)
limit 1000
The above query returns those values where the TPMS indicator is “ON” but the actual tire pressure is above 30 psi which is the safe limit. In case you didn’t notice, the analytics engine is working off the same JSON arrays modelled in the application- there is no transformation of data which is analyzed in its natural JSON form. #NoETLforNoQL
SQL driver setup
Now that you’ve started exploring the data, let’s visualize it in a real-time dashboard. Most BI tools deployed in the enterprise require a SQL interface; Couchbase works with CData to provide a SQL interface. To try this out, you can download the CData driver.
In this blog post, I will create an ODBC connection to connect with Couchbase Analytics. I am using a Mac and used the ODBC Manager tool to define a System DSN with the following properties:
Creating a real-time visualization in Tableau
Let me now walk through the steps of connecting Tableau with Couchbase Analytics.
- Open Tableau desktop application and choose “Connect to Other Database (ODBC)”.
- Choose the option to connect using a DSN (data source name).
- Choose the DSN created in the previous step.
- Click the “Sign In” button and you will navigate to the Tableau workbook interface.
5. On the left side of the screen, select “CData” as the database.
6. Click on the “Select Schema” dropdown and click on the search icon and choose Couchbase.
7. Click on the search icon in the table section
8. Create a workbook and choose the dimension — model type and measure — count of distinct VIN to create a simple graph
9. You can now enhance the basic dashboard to change the visualization and add filtering capabilities.
Achieving real-time operational analytics is a business imperative, yet in doing so organizations face such obstacles as:
- provisioning data in legacy data architectures taking weeks, or even months;
- a lack of skills required to modernize within their traditional IT department
- difficulty building business cases for modernization in the absence of a fast and direct return on investment
- limited insight due to the complexity of custom reporting and lack of operational dashboarding.
Couchbase Analytics addresses these concerns and makes it really easy to run hybrid operational and analytical workloads in a single Couchbase cluster. The hybrid architecture in Couchbase enables real-time analysis of JSON data generated by operational applications without the heavy lifting of data lakes, data warehouses, and complex ETL processes.
Sachin Smotra’s career spans more than 15 years building software products across various domains including Java Enterprise software, DRM Solutions for mobile games and web conferencing. As Director Product Management at Couchbase, he is a hands-on product leader responsible for Couchbase Analytics, Mobile & IOT product lines including evangelizing the product strategy and vision with customers, partners, developers and analysts.