Looker Practices

Göktuğ Cengiz
Trendyol Tech
Published in
5 min readNov 16, 2022

Hello, I am Göktuğ from Trendyol and working as a DWH & BI Analyst. Today, We will talk about practices which we apply on our Looker Project.

But… What is Looker?
Looker is a business intelligence software and big data analytics platform that helps you explore, analyze and share real-time business analytics easily.

The working logic of Looker is briefly as follows.

  • You get the tables in the database you have connected under the name view.
  • If you have needs such as renaming, calculating new fields, you can extend the view in the base. On the other hand, even if we don’t use it, refinement is a good method to copy objects. You can adapt an existing view or explore without editing the LookML file that contains it.
  • In Looker, the data source is named Explore. An explore is created by joining one (or more) view or derived table, is a query whose results are used as if it were an actual table in the database. From Explore, looks (reports are called looks in Looker) and Dashboards by Looks are created.
  • Finally, Everything created in lookml is under model and model is under project. In short, sorting; It goes as view -> explore -> model -> project.

At Looker, our priority is always to optimize the end user experience. The issues we care about for this are; speed, easy and understandable use. Let’s examine step by step.

What can we do to provide a good self-service BI service to the user?

  • By using the Label parameter in dimension, the field name should be presented to the user clearly on the user interface. The use of duplicated field names should be avoided.
  • By using the description parameter in dimension, the information about what the field is should be clearly conveyed to the user. The description can be accessed by the user in both explores' and reports that the field uses.
  • To increase the readability of number values, use the following: value_format for one-off excel-style formatting, value_format_name for pre-configured out of the box formats, and named_value_format to make your excel-style formatting reusable.
dimension: training_view_count {
label: "Training View Count"
description: "The number of times the training was viewed."
type: number
sql: ${TABLE}.training_view_count ;;
value_format_name: decimal_0
}
  • You can define fields with many breakdowns (hour, minute, second, etc.) such as date fields as dimension groups in view, and specify the breakdown types with the timeframes parameter. Additionally, you don’t need to type date at the end of the value in the label parameter. Looker will bring this by default.
dimension_group: training_create {
label: "Training Create"
description: "The date the training was created."
type: time
timeframes: [year, quarter, month, month_num, month_name, week, day_of_week, hour_of_day, day_of_week_index, date, time, raw]
sql: ${TABLE}.training_create_date ;;
}
  • Using the set fields in view, you can hide fields that would be meaningless to the user (e.g primary key) not to display in the UI.
  • Ensure that all Derived Tables have primary keys defined. If you don’t define it, there may be a case of duplicating and all the data in aggregations will give incorrect results. To exemplify, when 2 tables with one to many cardinality are joined, fanout problem may be encountered if primary key is not defined. Let’s consider the invoice type and invoice relationship. The invoice amount can occur in more than one record. The user who wants to see the total invoice amount will get wrong value due to duplicating. The solution to this is to define a primary key in the derived table or to apply symmetric aggregation.
dimension: pk_invoice_id {
primary_key:yes
hidden: yes
type: number
sql: ${invoice_id} ;;
}
  • If you have a standard not to change the field name from the table in the database in the base view (because this table can be used in different places), you can create a dimension with the same name in the extend view and change the name only with the label parameter.
dimension: type_name {
label: "Training Type"
}
  • When importing a table in Looker, all of them are defined as dimensions by default. We can convert them to measure in the extend view. We can rename the conversion process using the label parameter. For the user to easily understand the process here;
    - Total can be added to the name of the field where the sum operation is performed.
    - Number of can be added to the name of the field where the count operation is performed.
    - A percent sign can be added to the end of the field name if it is to be converted to a percentage view.
  • Users use the pivot feature when creating a look in Looker. However, the pivot property is handled only by measure in Looker. If there is a dimension that users will use as a pivot, we need to convert it to measure in the extend view. While converting, the type parameter should contain the value of list, and the list_field parameter should contain the name of the dimension to be converted.
measure: is_returned {
label: "Is Returned"
description: "It is the information of whether there is a return or not."
type: list
list_field: returned
}
  • You may have hundreds of explorers and it may be difficult for users who want to report to find the source they want. Using the group_label parameter in Explore, we can group explorers to where they belong. Use the label parameter for a descriptive name for this explorer, and the description parameter to specify what data it holds. If explore consists of multiple extend views, use the view_label parameter to put the data in them under a heading.
explore: training  {
label: "Training"
description: "It is the data source that contains the training data."
group_label: "Education Platform"
view_label: "Training"
persist_for: "0 seconds"
}

We need to increase performance so that Look and Dashboards can be opened quickly to the user. What are the best practices we can use to improve performance?

  • By using suggestions parameter in extend view you can hard-code a list of possible values instead and improve performance.
dimension: colors {
type: string
sql: ${TABLE}.colors ;;
suggestions: ["red", "yellow", "blue"]
}

By using suggest_dimensionin extend view, you can make Looker query an alternative dimension for the suggestion values. If that dimension is defined in a different Explore, suggest_explore tells Looker where to find it. In this case, include both the view name where the dimension is defined and the dimension name in the format view_name.field_name. In addition, ensure that the view is joined to the Explore that is specified in the suggest_explore parameter. If the dimension is defined in the current view, suggest_explore is not required.

dimension: event_user_name {
type: string
sql: ${TABLE}.event_user_name ;;
suggest_explore: user
suggest_dimension: user.name
}
  • Use datagroupsto align generation of Explore caching with underlying ETL processes. Datagroups can also be used to trigger deliveries of dashboards, legacy dashboards, or Looks to ensure that up-to-date data is sent to recipients.
datagroup: datagroup_name {
max_cache_age: "24 hours"
sql_trigger: SELECT max(id) FROM my_tablename ;;
interval_trigger: "12 hours"
label: "desired label"
description: "description string"
}
  • Define the relationship parameter for all joins in Explore. This will ensure that metrics aggregate properly within Looker. Joining many to one from the most granular level typically provides the best query performance.

In conclusion, I would like to mention about looker practices in this article. It is very large topic. Moreover, it is not possible to complete all things with just one article. I hope that you can discover new efficient methods with this article.

Thanks for reading!

--

--