Ro’s Looker Code Generation System: Basic LookML View Generation

(article #3 in series)

For background on our Ro Looker Code Generation System, please see our intro article.

TL;DR

Views in Looker are often based closely on underlying database tables, with
predictable relationships between database columns and view dimensions. This means we can automatically generate large portions of views.

Background

In our last post about our system for generating LookML, we gave a
high-level overview of our motivation and goals. This time, we’ll go into a bit
more detail about the basic functionality we use to generate a Looker view
corresponding to a database table.

Looker actually has something similar to this built in — Create View From
Table — but it’s not exactly what we want. Create View From Table is a one-time thing: it produces a LookML file which you then maintain as LookML going forward. We want a process that we can configure and re-run at any time to pull in new columns without needing to copy over our changes and additions. As we’ll get into a bit in this post, and more in later posts, we also leverage this process to add several other important features. However, today we’ll focus on the basics of generating and customizing a view.

Say you’ve got an orders tables with columns like:

And you want to expose that table as a Looker view. There’s a good chance that you’ll have a dimension for each of the columns, and that the type of each
column will map to a particular type for the dimension.

Our goal is to leverage this to make view creation and maintenance faster, more expressive, and less error prone, so that we can react quickly and reliably to changes in the business and the data model.

For example, a basic orders.view.lkml might look like this:

view: orders {

sql_table_name: my.orders ;;

dimension: id {
type: number
sql: ${TABLE}.id ;;
primary_key: yes
}

dimension: account_id {
type: number
sql: ${TABLE}.account_id ;;
}

dimension: ordered {
type: time
sql: ${TABLE}.ordered ;;
timeframes: [...]
}

dimension: shipped {
type: time
sql: ${TABLE}.shipped ;;
timeframes: [...]
}

dimension: days_between_ordered_and_shipped {
type: number
sql: datediff(days, ${ordered}, ${created}) ;;
}

dimension: amount {
type: number
sql: ${TABLE}.amount ;;
value_format_name: usd
}

dimension: amount_off {
type: number
sql: ${TABLE}.amount_off ;;
value_format_name: usd
}

}

Most of which is a straightforward reflection of the table the view is based on.
The dimension days_between_ordered_and_shipped is a notable exception because we’re using an expression that references two other dimensions.

In our tool for generating LookML, we use a query like the following to
inspect the relevant table, returning a list of columns and their data types:

select
column_name,
data_type
from
information_schema.columns
where
table_schema = 'my'
and table_name = 'orders'
order by
ordinal_position

Then we use that information to emit appropriate LookML, creating a dimension for every column. Of course, it’s not quite that simple, because
information_schema.columns won’t tell you which column is the primary key¹ or that amount and amount_off should be formatted as USD. So, we don’t want to make this totally automated, just mostly.

Configuration

Therefore, we use YAML configuration files to specify the aspects that can’t
be determined automatically, or those which we want to override.

For the simple example above, this might look like:

source: table
table_schema: my
table_name: orders

dimensions:
id:
primary_key: yes
amount:
value_format_name: usd
days_between_ordered_and_shipped:
type: number
sql: datediff(days, ${created}, ${shipped})

Note that we we only explicitly specified type for
days_between_ordered_and_shipped, since its sql isn’t a column reference for
which we can look up a type. We were able to refer to created and shipped
dimensions in the expression, even though they’re not specified in the YAML
file, because they’ll be injected automatically.

We’ll also want some measures, so we can add:

# ... same as above

measures:
total_amount:
type: sum
sql: ${amount}
value_format_name: usd
total_amount_off:
# ...
average_days_between_ordered_and_shipped:
type: average
sql: ${days_between_ordered_and_shipped}
max_days_between_ordered_and_shipped:
# ...

We’ll also want some measures, so we can add:

# ... same as above
measures:
total_amount:
type: sum
sql: ${amount}
value_format_name: usd
total_amount_off:
# ...
average_days_between_ordered_and_shipped:
type: average
sql: ${days_between_ordered_and_shipped}
max_days_between_ordered_and_shipped:
# ...

Group labels

We use group labels pretty extensively to give our views more organization and make them more discoverable. To avoid having to specify them per dimension or measure, our YAML configuration format actually adds an extra level of hierarchy to the above examples, so that the group can be specified once and applies to all the dimensions or measures below it.

Therefore, our view file for orders might actually look like:

source: table
table_schema: my
table_name: orders
dimensions:
IDENTIFIERS:
id:
primary_key: yes
account_id:
WHEN:
created:
ordered:
FINANCIAL:
amount:
value_format_name: usd
amount_off:
value_format_name: usd
measures:
FINANCIAL:
total_amount:
# ...
average_amount:
# ...
WHEN:
average_days_between_ordered_and_shipped:
# ...
max_days_between_ordered_and_shipped:
# ...

Permissions

As we mentioned in an earlier post and described in more detail on
Looker Discourse, we grant permissions in Looker by field. (When we first
adopted this system, Looker didn’t have any native support for it, though more recently Looker has begun moving in this direction.)

We do this by employing a very verbose pattern. A very simple dimension like:

dimension: order_id {
group_label: "IDENTIFIERS"
primary_key: yes
type: number
sql: ${TABLE}.id ;;
}

May instead become:

dimension: order_id_hidden {
primary_key: yes
hidden: yes
group_label: "HIDDEN YOU SHOULD NOT SEE THIS!"
type: number
sql: ${TABLE}.id ;;
}
dimension: order_id {
group_label: "IDENTIFIERS"
type: string
sql:
CASE WHEN {{ _user_attributes["permissions_operations_row_level"] }} = 1
THEN ${order_id_hidden}::varchar
ELSE
MD5(${order_id_hidden})
END ;;
html:
{% if _user_attributes["permissions_operations_row_level"] == 1 %}
{{ value }}
{% else %}
[Insufficient Permissions]
{% endif %} ;;
}

(The order_id_hidden dimension is for use in joins).

We make it very easy to apply permissions like this in our YAML configuration
file. For the above, we would only need to write:

dimensions:
IDENTIFIERS:
order_id:
primary_key: yes
column: id
perms: [join, ops_row]

If you imagine that repeated across many dimensions in many views that require different types of permissions, it’s clear that making this simpler and more concise can be a big win.

Summary

Our goal is to start from a baseline of “the view’s dimensions reflect the
table” and then use configuration only where going beyond that, and we’re pretty happy with where we’ve landed.

When all is said and done, our YAML configuration files contain an order of
magnitude fewer lines than the LookML view files we generate from them, which makes views easier to create, understand, and modify.

[1]: And primary keys in Redshift, our data warehouse, are often considered an anti-pattern, so many teams don’t use them at all.