Analyse data with Dremio and GoodData’s Cloud-Native analytics platform

Martin Svadlenka
GoodData Developers
Feb 4, 2022

If you are interested in GoodData.CN, please contact us. Alternatively, sign up for a trial version of GoodData Cloud: https://www.gooddata.com/trial/.

GoodData is a cloud-native analytics platform that enables you to build and run scalable real-time analytics — with built-in elasticity reacting to data volume and user traffic — in modern environments such as public, private, and hybrid clouds.

In this tutorial, I will show you how to deploy GoodData.CN Community Edition, connect it to Dremio Community Edition on your premises, and start building analytics on top of multiple data sources. The tutorial uses fictional Swiss car insurance data as its sample dataset.

Installation

First, you need to install GoodData.CN Community Edition and Dremio Community Edition.

To do so, choose one of the following two options.

Option 1. Deploy GoodData.CN & Dremio with Docker Compose (recommended)

Read the “deployment” section in the following GoodData-Dremio integration documentation, and proceed with installation via the given script.

The advantage of this option is that it also installs drivers for databases that are not included in the Dremio image by default.

Option 2. Deploy GoodData.CN Community Edition & Dremio Community Edition separately

Deploy both images separately:

GoodData.CN

Dremio

Datasets Set-Up in Dremio

Log into the installed Dremio Community Edition, which is available at http://localhost:9047/ by default. To log in, you first need to create an account in Dremio. To start working with data, you then need to create your workspace.

Once your workspace is created, you can either start working with the sample sources (CSV-based sample data) or create a new source by connecting to a database. For the purposes of this tutorial, we will use the following data structure, based on fictional car insurance data from Switzerland. You can find the required data sets in the following repository.

As a first step, all JSON files need to be loaded into Dremio as a data source.

Once all the files are loaded, open each one of them and proceed with the following steps:

Change the data types of columns to correspond with the column’s purpose. Fields to be changed:

Claims

  1. Claim_date to Date type
  2. Claim_amount to Float type

Coverage

  1. Coverage_created_date to Date type
  2. Coverage_cancelled_date to Date type
  3. Coverage_annual_premium to Float type
  4. Coverage_lifetime to Integer type
  5. Coverage_risk_score to Float type
  6. Coverage_deductible to Float type
  7. Coverage_accident_probability to Float type

Region

  1. Region_crime_rate to Float type
  2. Region_safety_scale to Float type

Save the changed data tables as new data sets in the workspace you created after logging into Dremio.
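The type changes above are made in the Dremio UI, but they can also be thought of as a SELECT statement with one CAST per changed column. The following Python snippet is a minimal sketch that only builds such a statement as a string. The SQL type names (DATE, FLOAT, INTEGER), the source path, and the Claim_id column are assumptions for illustration, and casting to DATE assumes the source values are in a format Dremio can parse.

```python
from typing import Dict, List

# Columns and target types taken from the lists above.
# The SQL type names are assumptions about how the UI types map to SQL.
TYPE_CHANGES: Dict[str, Dict[str, str]] = {
    "Claims": {
        "Claim_date": "DATE",
        "Claim_amount": "FLOAT",
    },
    "Coverage": {
        "Coverage_created_date": "DATE",
        "Coverage_cancelled_date": "DATE",
        "Coverage_annual_premium": "FLOAT",
        "Coverage_lifetime": "INTEGER",
        "Coverage_risk_score": "FLOAT",
        "Coverage_deductible": "FLOAT",
        "Coverage_accident_probability": "FLOAT",
    },
    "Region": {
        "Region_crime_rate": "FLOAT",
        "Region_safety_scale": "FLOAT",
    },
}


def build_cast_select(source: str, columns: List[str], casts: Dict[str, str]) -> str:
    """Build a SELECT that casts the listed columns and passes the rest through."""
    select_items = []
    for column in columns:
        quoted = f'"{column}"'
        if column in casts:
            select_items.append(f"CAST({quoted} AS {casts[column]}) AS {quoted}")
        else:
            select_items.append(quoted)
    return "SELECT " + ", ".join(select_items) + f" FROM {source}"


# Hypothetical source path and column list; adjust to your own Dremio source.
claims_sql = build_cast_select(
    '"my-source"."claims.json"',
    ["Claim_id", "Claim_date", "Claim_amount"],
    TYPE_CHANGES["Claims"],
)
print(claims_sql)
```

The resulting query could be pasted into the Dremio SQL editor and saved as a new data set, which mirrors what the UI steps do column by column.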

Similarly, any other JSON data sources can be combined into datasets so that GoodData.CN can read them as available data sources.

Initial Setup of GoodData.CN

Once Dremio data sets have been prepared, log into GoodData.CN with the following credentials:

  • email: demo@example.com
  • password: demo123

The default URL for GoodData.CN after a default installation is http://localhost:3000/

If you need help with installation or other basic steps in GoodData.CN, I highly recommend following the Getting started tutorial.

After logging in, the first step is to create a workspace. A workspace represents an endpoint or tenant environment where analytical objects (dashboards, insights, the semantic model, and metrics) are located.

Once the workspace is created, your local Dremio has to be registered as a new data source in GoodData.CN. This is done through the API: send a request with the relevant headers and body to the following URL.

URL: http://localhost:3000/api/entities/dataSources

Headers:

Request body:
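As a hedged sketch of what such a request could look like, the Python snippet below builds (but does not send) a POST request to the URL above. Only the URL comes from this tutorial. The content type, the bearer-token header, the JSON field names, the data source id, and the JDBC URL are assumptions about the GoodData.CN entities API and the Dremio JDBC driver, so check them against the documentation for your version before use.

```python
import json
import urllib.request

# URL taken from the tutorial; everything else below is an assumption.
API_URL = "http://localhost:3000/api/entities/dataSources"


def build_data_source_request(api_token: str, dremio_user: str,
                              dremio_password: str) -> urllib.request.Request:
    """Build a POST request that would register Dremio as a data source."""
    payload = {
        "data": {
            "id": "dremio-demo-data",      # hypothetical data source id
            "type": "dataSource",
            "attributes": {
                "name": "dremio-demo-data",
                "type": "DREMIO",
                # Assumed JDBC URL; the host depends on how Dremio is deployed.
                "url": "jdbc:dremio:direct=localhost:31010",
                "schema": "",
                "username": dremio_user,
                "password": dremio_password,
            },
        }
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.gooddata.api+json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


# Placeholder credentials; replace them with your own values.
request = build_data_source_request("<api-token>", "<dremio-user>", "<dremio-password>")
# Sending is left out on purpose so the sketch has no side effects:
# urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to print and inspect the headers and body before anything reaches the server.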

Build a Semantic Model on Top of Dremio Datasets

As a next step, prepare the semantic model on top of the Dremio datasets. A logical data model (LDM) is an abstract view of your data in GoodData Cloud Native (GoodData.CN). The LDM is a set of logical objects (datasets and Date datasets) and their relationships, which represent the data objects (and their relationships) in your database through the physical data model (PDM). The LDM is based on the PDM and is tied to the workspace. Within the workspace hierarchy, the LDM of a parent workspace is shared with its child workspaces.

Navigate to the modeler by clicking on “Connect data” in the workspace you have created.

In the modeler, you will find the previously registered Dremio-demo-data data source.

Scan the registered data source with the following settings. Note: in the “SCHEMAS” combobox of the scan dialog, you must pick the workspace you already prepared in Dremio.

After scanning the registered Dremio data source, the modeler loads the datasets you prepared in Dremio onto the canvas. Wherever a Dremio dataset uses the Date or DateTime data type, the GoodData.CN modeler also sets up a Date dataset in order to provide date dimensions in analytics objects.

The datasets need to be connected to each other and have their primary keys set as follows. You can create a connection by dragging and dropping an arrow from one dataset to another. Once the connections are in place, set the primary key for each dataset: navigate to the dataset, click “More..” → “Set primary key”, and choose the column that defines the primary key, as follows:

  • Product — Product ID
  • Region — Region ID
  • Car — Car ID
  • Claim — Claim ID
  • Coverage — Coverage ID
  • Customer — Customer ID
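A primary key is only meaningful if its values are unique within the dataset. Before setting the keys listed above, a quick check such as the following Python sketch can confirm that. The records and the Region_id column name are hypothetical examples, not taken from the sample data.

```python
from collections import Counter
from typing import Any, Dict, List


def duplicate_keys(records: List[Dict[str, Any]], key_column: str) -> List[Any]:
    """Return the key values that occur more than once, sorted."""
    counts = Counter(record[key_column] for record in records)
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical records standing in for rows of the Region dataset.
regions = [
    {"Region_id": 1, "Region_crime_rate": 0.12},
    {"Region_id": 2, "Region_crime_rate": 0.30},
    {"Region_id": 3, "Region_crime_rate": 0.07},
]

duplicates = duplicate_keys(regions, "Region_id")
if duplicates:
    print("Not usable as a primary key, duplicated values:", duplicates)
else:
    print("Region_id is unique and can serve as the primary key")
```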

Once set, publish your data model. To learn more about how to build semantic data models, please visit GoodData University — Understanding logical data model.

Build Your First Analytics on Top of Dremio Datasets

Once a semantic model is created and published, you can start building your first analytical objects.

Go to the tab “Analyze” and create your first Insight by dragging and dropping data fields from the left panel to insight parameters (Measures, View by, Stack by …).

In this example, we will create a bar chart that shows the total claim amount made via car insurance, sliced by car make. Proceed as shown in the picture, give the insight a name, and click Save.

Build Your First Calculated Metric on Top of the Semantic Model

As a next step, we will create a calculated metric on top of the semantic model that was created by scanning the Dremio datasets.

First, go to the “Metrics” tab and click on “Create metric”. This takes you to a metric editor where you can write metrics in MAQL, the GoodData query language. To learn more about MAQL, visit our online course Getting started with MAQL at GoodData University.

We will create a premium revenue metric as an example. This calculated metric represents total revenue made from premium coverages during 1 year on top of all coverages. This can then be analyzed from customer, car, or even product perspectives.

Create the premium revenue metric in the editor:

Once a metric is created, it can be used to create analytical objects (insights) just like any other data field. Metrics are also fully composable, which means that you can calculate one metric by using other metrics in its calculation formula (its MAQL expression).

So, for example, use the premium revenue metric in a bar chart, slice it by coverage status, and store it in a new Insight called “Premium revenue by Coverage status”.

Insights can be organised into dashboards in the “Dashboards” tab.

Organise Insights into a Dashboard

Now that you have created two insights, let’s add them to a single overview dashboard. Navigate to the “Dashboards” tab in the top panel and click “Create dashboard”. Name your new dashboard (e.g., “Overview”) and drag and drop the insights from the left panel into the dashboard layout.

Also, do not forget that the dashboard time dimension filter is set by default to show data for the last month. You need to change it to all time to see the whole range of the data.

Enjoy your first dashboard built on top of data from a Dremio lakehouse!

Please share any feedback or questions on our community forum or community Slack channel.
