Ro’s Looker Code Generation System: Basic LookML View Generation

John Mastro
Feb 20 · 5 min read

(article #3 in series)

For background on our Ro Looker Code Generation System, please see our intro article.

TL;DR

Views in Looker are often based closely on underlying database tables, with predictable relationships between database columns and view dimensions. This means we can automatically generate large portions of views.

Background

In our last post about our system for generating LookML, we gave a high-level overview of our motivation and goals. This time, we’ll go into a bit more detail about the basic functionality we use to generate a Looker view corresponding to a database table.

Looker actually has something similar to this built in — Create View From Table — but it’s not exactly what we want. Create View From Table is a one-time thing: it produces a LookML file which you then maintain as LookML going forward. We want a process that we can configure and re-run at any time to pull in new columns without needing to copy over our changes and additions. As we’ll get into a bit in this post, and more in later posts, we also leverage this process to add several other important features. However, today we’ll focus on the basics of generating and customizing a view.

Say you’ve got an orders tables with columns like:

And you want to expose that table as a Looker view. There’s a good chance that you’ll have a dimension for each of the columns, and that the type of each column will map to a particular type for the dimension.

Our goal is to leverage this to make view creation and maintenance faster, more expressive, and less error prone, so that we can react quickly and reliably to changes in the business and the data model.

For example, a basic orders.view.lkml might look like this:

view: orders {

  sql_table_name: my.orders ;;

  dimension: id {
    type: number
    sql: ${TABLE}.id ;;
    primary_key: yes
  }

  dimension: account_id {
    type: number
    sql: ${TABLE}.account_id ;;
  }

  dimension: ordered {
    type: time
    sql: ${TABLE}.ordered ;;
    timeframes: [...]
  }

  dimension: shipped {
    type: time
    sql: ${TABLE}.shipped ;;
    timeframes: [...]
  }

  dimension: days_between_ordered_and_shipped {
    type: number
    sql: datediff(days, ${ordered}, ${created}) ;;
  }

  dimension: amount {
    type: number
    sql: ${TABLE}.amount ;;
    value_format_name: usd
  }

  dimension: amount_off {
    type: number
    sql: ${TABLE}.amount_off ;;
    value_format_name: usd
  }

}

Most of which is a straightforward reflection of the table the view is based on. The dimension days_between_ordered_and_shipped is a notable exception because we’re using an expression that references two other dimensions.

In our tool for generating LookML, we use a query like the following to inspect the relevant table, returning a list of columns and their data types:

select
  column_name,
  data_type
from
  information_schema.columns
where
  table_schema = 'my'
  and table_name = 'orders'
order by
  ordinal_position

Then we use that information to emit appropriate LookML, creating a dimension for every column. Of course, it’s not quite that simple, because information_schema.columns won’t tell you which column is the primary key¹ or that amount and amount_off should be formatted as USD. So, we don’t want to make this totally automated, just mostly.

Configuration

Therefore, we use YAML configuration files to specify the aspects that can’t be determined automatically, or those which we want to override.

For the simple example above, this might look like:

source: table
table_schema: my
table_name: orders

dimensions:
  id:
    primary_key: yes
  amount:
    value_format_name: usd
  days_between_ordered_and_shipped:
    type: number
    sql: datediff(days, ${created}, ${shipped})

Note that we we only explicitly specified type for days_between_ordered_and_shipped, since its sql isn’t a column reference for which we can look up a type. We were able to refer to created and shipped dimensions in the expression, even though they’re not specified in the YAML file, because they’ll be injected automatically.

We’ll also want some measures, so we can add:

# ... same as above

measures:
  total_amount:
    type: sum
    sql: ${amount}
    value_format_name: usd
  total_amount_off:
    # ...
  average_days_between_ordered_and_shipped:
    type: average
    sql: ${days_between_ordered_and_shipped}
  max_days_between_ordered_and_shipped:
    # ...

We’ll also want some measures, so we can add:

# ... same as abovemeasures:
  total_amount:
    type: sum
    sql: ${amount}
    value_format_name: usd
  total_amount_off:
    # ...
  average_days_between_ordered_and_shipped:
    type: average
    sql: ${days_between_ordered_and_shipped}
  max_days_between_ordered_and_shipped:
    # ...

Group labels

We use group labels pretty extensively to give our views more organization and make them more discoverable. To avoid having to specify them per dimension or measure, our YAML configuration format actually adds an extra level of hierarchy to the above examples, so that the group can be specified once and applies to all the dimensions or measures below it.

Therefore, our view file for orders might actually look like:

source: table
table_schema: my
table_name: ordersdimensions:
  IDENTIFIERS:
    id:
      primary_key: yes
    account_id:
  WHEN:
    created:
    ordered:
  FINANCIAL:
    amount:
      value_format_name: usd
    amount_off:
      value_format_name: usdmeasures:
  FINANCIAL:
    total_amount:
      # ...
    average_amount:
      # ...
  WHEN:
    average_days_between_ordered_and_shipped:
      # ...
    max_days_between_ordered_and_shipped:
      # ...

Permissions

As we mentioned in an earlier post and described in more detail on Looker Discourse, we grant permissions in Looker by field. (When we first adopted this system, Looker didn’t have any native support for it, though more recently Looker has begun moving in this direction.)

We do this by employing a very verbose pattern. A very simple dimension like:

dimension: order_id {
  group_label: "IDENTIFIERS"
  primary_key: yes
  type: number
  sql: ${TABLE}.id ;;
}

May instead become:

dimension: order_id_hidden {
  primary_key: yes
  hidden: yes
  group_label: "HIDDEN YOU SHOULD NOT SEE THIS!"
  type: number
  sql: ${TABLE}.id ;;
}dimension: order_id {
  group_label: "IDENTIFIERS"
  type: string
  sql:
    CASE WHEN {{ _user_attributes["permissions_operations_row_level"] }} = 1
    THEN ${order_id_hidden}::varchar
    ELSE
    MD5(${order_id_hidden})
    END ;;
  html:
    {% if _user_attributes["permissions_operations_row_level"] == 1 %}
    {{ value }}
    {% else %}
    [Insufficient Permissions]
    {% endif %} ;;
}

(The order_id_hidden dimension is for use in joins).

We make it very easy to apply permissions like this in our YAML configuration file. For the above, we would only need to write:

dimensions:
  IDENTIFIERS:
    order_id:
      primary_key: yes
      column: id
      perms: [join, ops_row]

If you imagine that repeated across many dimensions in many views that require different types of permissions, it’s clear that making this simpler and more concise can be a big win.

Summary

Our goal is to start from a baseline of “the view’s dimensions reflect the table” and then use configuration only where going beyond that, and we’re pretty happy with where we’ve landed.

When all is said and done, our YAML configuration files contain an order of magnitude fewer lines than the LookML view files we generate from them, which makes views easier to create, understand, and modify.

[1]: And primary keys in Redshift, our data warehouse, are often considered an anti-pattern, so many teams don’t use them at all.

Ro Data Team Blog

Ro Data Team Blog: data analytics, data engineering, data science

John Mastro

Written by

Ro Data Team Blog

Ro Data Team Blog: data analytics, data engineering, data science