Couchbase
Published in

Couchbase

Simple dataviz with N1QL and Google Sheets.

By: Keshav Murthy

Do whatever it takes to present the data to aid analysis and thinking. — Edward Tufte

How do you create graphs like these if you don’t already have some ready-made cool dataviz tool?

You can run queries to wrangle the data to get the results. Often you want to present the data visually in pie charts, bubble charts, histograms and line graphs. If you have Tableau, Cognos, etc it’s easier to visualize. Otherwise, you’ll have to copy the results into google sheets or excel. Cut and paste of the results to google sheets is difficult, due to formatting, header-row issues, etc. However, Couchbase Query Workbench makes it easy. Next to the query results, we have a copy icon that copies the output into a tabular form that you can paste into spreadsheets such as Google Sheets, Excel, etc. This facility is available in both query workbench and analytics query workbench.

Let’s look at some sample data and charts.

  1. Calculate the types of documents and their count.

Here’s the query on the travel-sample dataset shipped with Couchbase. Click on the image to see the GIF animation.

SELECT type,
COUNT(1) typecount
FROM `travel-sample`
GROUP BY type
ORDER BY typecount

type typecount
“airline” 187
“hotel” 917
“airport” 1968
“landmark” 4495
“route” 24024

Here’s the piechart created for the data. This can be easily customized for color, data display, various labels, and legends.

Click on the image to see the GIF animation on how to get the results to google sheets to get the charts in a second.

2. Task: Find the top 10 cities with the most number of hotels.

SELECT country,
city,
COUNT(1) hotelcount
FROM `travel-sample`
WHERE type = ‘hotel’
GROUP BY country,
city
ORDER BY hotelcount DESC
LIMIT 10

Results:

city country hotelcount
“San Francisco” “United States” 132
“London” “United Kingdom” 67
“Paris” “France” 64
“San Diego” “United States” 48
“Birmingham” “United Kingdom” 36
“Los Angeles” “United States” 35
“United Kingdom” 23
“Santa Monica” “United States” 14
“Malibu” “United States” 12
“Edinburgh” “United Kingdom” 10

The only customization to the histogram here is to add the data labels to show the actual number of hotels.

2. Task: Find the top 5 cities with the most number of hotels with “gardens” in the review

SELECT city,
COUNT(1) hotelcount
FROM `travel-sample`
WHERE type = ‘hotel’
AND ANY r IN reviews SATISFIES search(r.content, “garden”) END
GROUP BY city
ORDER BY hotelcount DESC
LIMIT 5;

city hotelcount
“Paris” 10
5
“San Francisco” 5
“San Diego” 4
“Edinburgh” 3

Simply choose the “Doughnut piechart” and add the labels.

6. Bubble charts require you to create the percentage value for each row. The window function RATIO_TO_REPORT() help you to do that easily. Once you have your query, creating a bubble

SELECT country, city,
SUM(ARRAY_SUM(reviews[*].ratings[*].Location)) / COUNT(1) city_avglocation,
SUM(ARRAY_SUM(reviews[*].ratings[*].Service)) / COUNT(1) city_avgservice,
RATIO_TO_REPORT(COUNT(1)) OVER(partition by country) * 100 ratio_val,
(TO_STR(ROUND(RATIO_TO_REPORT(COUNT(1)) OVER(partition by country) * 100, 0)) || “%” ) AS ratio_percent
FROM `travel-sample`
WHERE type = ‘hotel’
group by country, city
order by ratio_val desc, city_avgservice desc, city_avgvalue desc
LIMIT 10

city city_avglocation city_avgservice country ratio_percent ratio_val
“Paris” 13.75 15.421875 “France” “46%” 45.714285714285715
“San Francisco” 15.583333333333334 16.71969696969697 “United States” “37%” 36.56509695290859
“London” 11.567164179104477 15.746268656716419 “United Kingdom” “16%” 16.105769230769234
“San Diego” 15.583333333333334 16.729166666666668 “United States” “13%” 13.29639889196676
“Los Angeles” 11.971428571428572 12.628571428571428 “United States” “10%” 9.695290858725762
“Birmingham” 14.666666666666666 18.083333333333332 “United Kingdom” “9%” 8.653846153846153
“Avignon” 10.375 13.375 “France” “6%” 5.714285714285714
16.91304347826087 17.782608695652176 “United Kingdom” “6%” 5.528846153846153
“Chamonix-Mont-Blanc” 18 29.142857142857142 “France” “5%” 5
“Nice” 11.571428571428571 15 “France” “5%” 5

Bubble chart showing avg rating by location, etc. The size of the bubble shows the %ge of hotels represented within the respective country.

This animated GIF shows how to create this chart.

7. Creating geo charts is pretty easy in google charts since it recognizes the countries and cities. You don’t need to deal with latlongs.

Select country, count(1) num_hotels
from `travel-sample`
where type = ‘airline’

country num_hotels
“United States” 127
“United Kingdom” 39
“France” 21

With the data above, simply paste into a google sheet and then create a geo chart.

8. You can also create a geo chart with markers (proportional bubbles), again, using the RATIO_TO_REPORT() window functions.

SELECT country,
COUNT(1) AS num_hotels,
(TOSTR(ROUND(RATIO_TO_REPORT(COUNT(1)) OVER () * 100,0)) || “%”) AS hotels_percent
FROM `travel-sample`
WHERE type = ‘hotel’
GROUP BY country;

country hotels_percent num_hotels
“United Kingdom” “45%” 416
“France” “15%” 140
“United States” “39%” 361

You can also customize the charts to the region, for example, the United States.

Finally, the query to generate the first graphic of this blog:

SELECT country, city,
SUM(ARRAY_SUM(reviews[*].ratings[*].Location)) / COUNT(1) city_avglocation,
SUM(ARRAY_SUM(reviews[*].ratings[*].Service)) / COUNT(1) city_avgservice,
SUM(ARRAY_SUM(reviews[*].ratings[*].Overall)) / COUNT(1) city_avgoverall,
SUM(ARRAY_SUM(reviews[*].ratings[*].Rooms)) / COUNT(1) city_avgrooms,
SUM(ARRAY_SUM(reviews[*].ratings[*].`Value`)) / COUNT(1) city_avgvalue
FROM `travel-sample`
WHERE type = ‘hotel’
group by country, city
order by city_avgservice desc, city_avgvalue desc
LIMIT 50

Keshav Murthy is a Vice President at Couchbase R&D. Previously, he was at MapR, IBM, Informix, Sybase, with more than 20 years of experience in database design & development. He lead the SQL and NoSQL R&D team at IBM Informix. He has received two President’s Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelors degree in Computer Science and Engineering from the University of Mysore, India, holds ten US patents.

--

--

--

The world’s best open source database for building scalable, high performance web, mobile & IoT applications. www.couchbase.com

Recommended from Medium

How To Do Data Science In The Julia Language

Feature Engineering and Linear Regression

Comparative Judgement is a better predictor of marks than marking!

Visualizing data through time with Shiny

READ/DOWNLOAD=) Real-World Data Mining: Applied Bu

Impact of Machine Learning and GDPR on the value of your Data

Building domain understanding into Search

IBM Capstone Project: Taspas Restaurant in Madrid

The Battle of Neighborhoods: Madrid, Spain

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Couchbase

Couchbase

The world’s best open source database for building scalable, high performance web, mobile & IoT applications. www.couchbase.com

More from Medium

PL/SQL-Triggers

On Self-Goal Settings…

Project Water W-1.