Finding answers with Metabase

Thaynara Santos
BTG Pactual Developers
6 min readJun 10, 2020

--

With insights, questions and dashboards

Yes, at the moment, I am the worse choosing these medium cover images, It’s a challenge not to use abstract art, so for this article I chose the metabase home page, I don’t know… I find it inviting.

Let’s go to what matters!

What is it?

Metabase is an open source data analysis tool with quick configuration, making it possible to create customized visualizations, where the proposal is to be very intuitive to serve the majority of users, it has an active community on GitHub, the documentation is very complete and is possible to connect with several databases (BigQuery, Druid, Google Analytics, H2, MongoDB, MySQL, PostgreSQL, Presto, Amazon Redshift, Snowflake, Spark SQL, SQLite, SQL Server).

How we use?

Environment

There are several options for installing metabase, but we use it with docker.

To make a test on your local machine is very simple, you only need to execute the following command:

docker run -d -p 3000:3000 --name metabase metabase/metabase

And access: http://localhost:3000/setup

If the database you are going to connect to the metabase is at your localhost, you need to execute the same command, but use the net=host parameter, with that the metabase stops looking at the localhost inside the container and starts looking at the localhost of your host.

docker run -d -p 3000:3000 --net=host --name metabase metabase/metabase

To avoid lose the data you have already configured (information from the metabase itself and not from your application, such as user registration, permissions, configuration of dashboards, etc.), if the container is turned off, it’s possible to point the metabase to a volume on your machine or use a database. In our case, we need to share the metabase information, so we chose to deploy using the database storage option, which can be used through the following docker-compose:

version: '2'
services:
metabase-app:
image: metabase/metabase
ports:
- "3000:3000"
environment:
- MB_DB_DBNAME=???
- MB_DB_HOST=???
- MB_DB_PASS=???
- MB_DB_PORT=???
- MB_DB_TYPE=???
- MB_DB_USER=???
- JAVA_TOOL_OPTIONS=???

The JAVA_TOOL_OPTIONS parameter helps to limit the memory consumption of the application according to the memory consumption allowed for the container (more information here).

Setup

After installation the configuration is very simple, you only need to create your user account, which will be used as an administrator, fill in the information to connect with the chosen database (remember to fill in the host and the port, even if it appears, because they are only suggestions), choose how your data will be synchronized and that’s it, you can start using the metabase.

The metabase has the option to always synchronize, so as not to burden the application it may be better to connect the metabase to a read-only basis or disable this setting and synchronize only when necessary.

Home

This will be your home page, if you have not connected to any database in the configuration step, the metabase provides a sample dataset.

X-rays

They are insigts generated by the metabase itself for your tables, it helps you to have an idea of ​​what could be used in your dashboards, it is a cool tool to do an exploratory analysis of the data.

Sample Dataset

I can’t use examples of real data from our systems so that is another example using the spotify dataset that was added by the database connection configuration.

Ask a question

There are three options for building views with some differences between them, but all have options for filtering, summarizing, sorting, graphics for responses, downloading and alerts. With them it’s possible to create visualizations using one or more tables that can be used in the creation of other questions or dashboards.

Simple question: It can be used by users who have no knowledge of SQL.

Custom question: In this type the user already needs to have some knowledge about joins between tables, but it’s still very intuitive.

(chart example)

Native query: Here the user needs to know how to create queries using SQL.

(chart example)

Dashboards

It works as an aggregator of questions from a given context, you can create the empty dashboard and then add the questions created through Ask a Question or those generated through X-Ray, when a question is created the metabase itself asks if you want to add to a dasboard.

To create a dashboard:

The visualization of the dashboard is very similar to that of X-Ray, the difference is that the dashboard is created by you with what you think is relevant, being possible to add static or dynamic filters.

Admin

The metabase has an admin session, where it’s possible to register users and user groups, connect with new databases, set permission for collections and databases, in addition to a troubleshooting session.

I’ve separated some basic resource tips:

  • It’s possible to save results of queries that take a long time to run, for this you need to enable it in: Admin> Settings> Caching.
  • If you add a table and it does not appear immediately and you know that everything is synchronized, perhaps it is hidden, if it is, you can change the visibility at: Admin> Data Model> (select the database)> (select the table) > VISIBILITY.
  • The metabase has a default for naming to make the table names more friendly, if you want to make your table names the way they are, you can change them at: Admin> General> FRIENDLY TABLE AND FIELD NAMES.

Why we use?

In our case, the main point was to find something that was quick to generate dashboards with metrics on the use of the system with a more managerial view, both for the heads of the areas and for the analysts.

There are other tools for data analysis like python with some libraries (pandas, matplotlib, seaborn, etc.), power bi, tableau, among others, but in this case the metabase was a great choice because:

  • It was possible to unite the two worlds, developers can write queries to generate the visualizations, but users also can do it with two more simplified interfaces for this, this created a very good interaction between the development team and the final users.
  • The speed that we were able to create and share dashboards was also a big point in favor and influenced to have feedbacks more quickly about things that could be improved in the processes.
  • It was possible to segment by access groups, which is also relevant when the system involves several different areas.

In general, it’s a very complete tool, quick and easy to use, which met our needs and we had good results using it, but which also has the potential to be used in various applications.

Here I gave an overview of the main features, but I advise you to explore the other resources that are also very interesting, such as segments and metrics, pulses, metabot and alerts.

If you have any doubt, suggestion or advice feel free to contact us :)

--

--