Visualizing Airline Data with Cube.js, BigQuery and Chart.js

Published in

Geek Culture

7 min readOct 26, 2022

Visualizing or Representing your data comes out as a very important item to add to your project’s priority list since we get so much more clarity on the information in the form of graphs, charts, etc, with it, not only does it help us interpret large chunks of data but also helps us analyse the trends and patterns through data.

Because with great power, comes great responsibility! We’re going to make our lives easier by leveraging some amazing tools out there that’d reduce our development time to mere minutes.

In this tutorial we’re going to look at:

How we can set our own custom data on Google’s BigQuery without any additional cost
Learn how to connect Cube.js to provide us with the backend support between BigQuery and our frontend application
Add Chart.js as a dependency to build beautiful Visualizations on the Web front

What is Google BigQuery?

BigQuery is an enterprise data warehouse that solves this problem by enabling super-fast SQL queries using the processing power of Google’s infrastructure. It is Serverless meaning storing your data cheaper and scaling it faster. BigQuery can handle a lot of data very fast and at a low cost. Other than that it also has built-in integrations that make building a data lake in BigQuery simple, fast, and cost-effective.

Centralizing your data to allow for auto-integrations with Google Cloud’s machine-learning tools for advanced data science reports. It’s also a one-click integration with Data Studio means visualizing processed tables is simple and fast. ETL solutions like DataFlow and DataProc take the overhead out of data transformation.

BigQuery also has a free usage tier: you get up to 1 TB of processed data per month and some free credits to spend on Google Cloud during the first 90 days.

Setting up the Airline Analysis Data on BigQuery

Although BigQuery provides us with tons of free public datasets without any restrictions to the Google Cloud users, we are going to use our own dataset in this tutorial.

To get started, visit the Google Cloud Console and create an account if you don’t have one already. Post account creation, visit the New Project creation page and create a new project in which we’ll have the Airline Analysis data set.

Once you have finished the basic setup, it is recommended to create a service account to restrict your API’s access to run workloads and add authentication. While creating the service account, you’ll need to specify the roles for the same, the only two roles needed for read-access to the dataset are BigQuery Data Viewer and BigQuery Job User. Once this is done, the final step is to add an authentication key to this newly created user. Click on the ... button under Actions to the right and select Manage keys. Add a new JSON key from this section and it’d be automatically downloaded in your local system. Make sure you keep it safe and don’t lose it.

Now that you have done all the configurations needed to create and read the BigQuery dataset, it's time to actually download the Airline dataset from here.

Once downloaded, visit the BigQuery Dashboard and click on your project ID from the explorer. On the expanded section to the right, you would see an option to Create Dataset. Select that option and provide the Dataset ID to whatever you want to name the dataset, preferably call it airline to recognize it better, leave the other settings as default, tweak it only if needed.

You’ll be able to see your newly created empty dataset in the explorer section, focus on the new empty dataset just created and click on the + icon to add tables to it. Upon clicking you’d be able to see the Create table option, choose Upload as a preference from the Source dropdown and upload the CSV airline dataset that you downloaded. Cross-check if the project name and dataset name are set to your newly created project or not. Give the table a name and finish table creation by clicking on Create table , the blue button at the bottom.

And that’s it, your data is all set.

Bootstrap your project with Cube.js

Now that we have our data set in BigQuery we’d need to be able to use that data, that’d mean direct interaction with BigQuery and SQL queries. There’s nothing wrong with SQL; it’s a great domain-specific language, but having SQL queries all over your codebase smells like a leaky abstraction — your application layers will know about column names and data types in your database.

To improve upon this, we’ll use Cube.js as an open-source analytical API platform of our choice. Cube.js provides an abstraction called a “semantic layer,” or a “data schema,” which encapsulates database-specific things, generates SQL queries for you, and lets you use high-level, domain-specific identifiers to work with data.

Navigate to your project directory and bootstrap your project with Cube.js to install all the necessary drivers and dependencies for analytics as well as to connect the database

npx cubejs-cli create airline-analysis -d bigquery

Cube.js utilizes the Cube.js CLI to create new projects with the -d flag to connect the respective database. Here is a list of supported databases by cube.js. While bootstrapping it also creates a .env file that contains the project configuration including database type, Project id, Project environment, etc that is required by the backend.

Here is how the .env file looks like for our BigQuery project.

CUBEJS_DB_BQ_PROJECT_ID=PROJECT_ID
CUBEJS_DB_BQ_KEY_FILE=key.json
CUBEJS_DEV_MODE=true
CUBEJS_DB_TYPE=bigquery
CUBEJS_API_SECRET=SECRET

Substitute your PROJECT_ID with Google Cloud Platform’s newly created Project ID, you find it upon clicking on the Project Switching button at the top of the page. Also, copy the service account key that was previously downloaded in your system into your project folder and rename it key.json for easy understanding.

Run Cube.js backend service locally, type the following in your console (Make sure you are navigated to the root project directory)

npm run dev

You’ll notice a message specify that the server is running on localhost:4000 by default. Go ahead and open it to see your dataset as well as Cube.js Developer Playground.

Define the Data Schema

The data schema is a high-level domain-specific description of your data, we’d need to define our data schema first in order to explore the data. With Data schemas you won't need to write SQL queries, rather rely on the Cube.js query generation engine, making it much easier to play with data.

Our data schema would be present inside the schema folder as schema/Airline.js

Do change the sql: 'SELECT * FROM <Your Dataset>.<Table Name>' to your own Dataset and table name if you have changed it manually.

In this schema, measures are the numerical values to be calculated, while dimensions are the attributes for which the measures are calculated. Dimensions can have string or time type. BigQuery functions like CAST and TIMESTAMP can also be used in measures and dimensions.

Playing around with the Data Visualization

Considering your server is already running, you can visit localhost:4000 to check out the Cube.js Developer Playground where you’d be able to find the Build tab. This tab constitutes some amazing data selection options on top of which visualizations would be visible, simply click on Measure and select a metric to measure your data on and choose as many Dimensions you want, you should be able to see/filter your own visualizations with all the different options present.

Let’s generate our frontend application now with Cube.js template as a base. Simply navigate to the Dashboard tab and select Create custom application with React and Ant Design, and Chart.js as our Charting library. It will take a few minutes to create the dashboard-app folder and install all the basic dependencies. Once it is finished we’ll add our own Charts and customize our frontend application.

Navigate into the dashboard-app folder and start the frontend server using the following command:

npm start

You’ll be able to see the server running in localhost:3000

Our application will have the following structure

App will be the master controller of the application
ChartRenderer will be responsible for rendering the different charts
DashboardPage will be the main UI page that’ll have all the chart data (This page will call ChartRenderer to render the different type of Charts)

App.js

The App component will be the master controller and would look like this, we’re using the CubeProvider to provide the cube.js API throughout the application as to query/use the data anywhere.

Dashboard Page

For this example application, I have divided the Dashboard into Grids to display Line , Area , Pie Graph and Table. Thus we can use Ant Design’s Grid to achieve this.

Rendering Charts

The ChartRenderer component is responsible for rendering your charts, in this example, I’m going to show you the code for rendering only the line chart type to keep it simple and easy to understand.

The WhichQueryRenderer utility finds out the chart type and returns the particular Query with the appropriate measures and dimensions . It internally uses the renderChart method which renders the Line chart from Chart.js. We pass it labels and datasets as a parameter from the resultSet derived from the query.

And that’s it, we’re ready to check our first Line Chart in our dashboard.

Make sure you remove the other chart types and their grids from the DashboardPage component.

We’ll use the same Chart Renderer component to query more chart types and supply the chart to our DashboardPage component which will render them in the UI.

The full source code is available on GitHub.

Conclusion

If I missed out on any point or you want to discuss something feel free to leave a comment down below, I’d hop in ASAP. 🌟

Lastly, Thank you for making it this far down into the article and showing your interest in React. You are amazing and keep making a positive difference every day. Peace out. ✌🏼