2019 LookML Open-Source State of the Union

Carl Anderson
Nov 6 · 13 min read

As Looker’s annual JOIN event was being put together for 2019, something was different from years past. Whereas in 2018, open-source LookML software was near non-existent, in the course of a year, a number of open source projects had sprung up, signalling the growth of a nascent open-source community around Looker and LookML development in particular.

With this growth in open-source projects, and little in the way to organize and discover them, we saw a need to put together a comprehensive survey. We presented this overview at JOIN, and now bring it to you in the first (of hopefully many such) LookML Open-Source State of the Union reports.

We’re going to focus on LookML specifically, rather than all Looker-related projects [1], which means that most of these will be of direct benefit to LookML developers and analysts. After surveying the landscape, we’ve categorized projects into five broad functions, though some projects offer functionality in more than one category:

  • Parsers: Interpret LookML text into a more usable/structured format for other software.
  • Linters: Evaluate LookML against a set of rules, primarily to automate feedback to developers making LookML changes.
  • Visualizers & Informational: Provide alternate presentations of LookML contents for informational or navigational purposes.
  • Generators: Automate the writing of LookML from some other process or format.
  • Testers: Evaluate LookML within the context of a connection, Looker instance, or data, primarily to automate feedback to developers making LookML changes, or, to monitor for breaking external changes.
  • Syntax Highlighters: Add a bit of color to your LookML in various text editors.

Parsers

Since parsers are an important foundational component for working with a language, like LookML, and are used by many projects, we’ll cover them first.

With the introduction of LookML, Looker created a brand new syntax. Thus, at the time, there were no parsers for it. Fast forward a couple of years, and there are now two independent actively maintained open-source implementations of LookML parsers.

Both have similar feature sets, with the main difference being convenience of use from either Node.js or Python. In case you’re not using either of these, both libraries also allow a command-line invocation that exports a JSON representation of the LookML.

node-lookml-parser

First published in December 2017 and open sourced in August of 2018, this was likely the first open-source LookML project, and enables a few other projects, including one linter that we cover below.

In addition to basic LookML parsing, node-lookml-parser also provides the following advanced features:

  • It will assemble the contents of an entire model by following include statements that reference other files or file patterns.
  • It allows embedding of additional data into LookML via specially marked comment tags.

lkml [Parser]

lkml is a fast, pure Python parser with no external dependencies that parses LookML files into a well-structured, nested dictionary. It is also powers multiple other projects, including two linters and a visualizer that we cover below.

In addition to basic LookML parsing, lkml also provides the following advanced features:

  • It boasts a thorough test suite, having been tested on a large corpus of public LookML (160k lines).
  • It also serializes back to LookML — more on this in our Generators section below.

Linters

Linters are tools that check code against a set of stylistic or other programming rules to enforce standards, formatting styles, code smells, and other development best practices. Here, we cover three LookML-centric projects.

lookmlint

lookmlint is a Python-based linter with a command line interface that checks LookML against nine different rules:

  1. Label issues: creates better looking labels when acronyms and abbreviations are within the dimensions name. For instance, you can specify that “usd” should render as “(USD)”.
  2. Raw SQL in joins: checks that joins reference dimensions rather than underlying database fields.
  3. Unused includes: identify views that are not referenced by a model but not by explores in that model.
  4. Unused views: views that are simply not referenced by explores.
  5. Views without a primary key.
  6. Duplicate view labels: find cases when two joins in an exploration end up with the same label.
  7. Missing view SQL definitions: Find any views that do not have a sql_table_name or derived_table value set.
  8. Semicolons in derived table: find any derived table SQL expressions that contain a rogue semicolon, which will throw errors at query time.
  9. Mismatched view names: find any views where the view name does not match the view filename.

The tool uses the lkml parser under the hood to parse the files.

The repo is labelled by the author as “alpha” (v0.2.1 at time of writing). While there isn’t an interface to add new rules, they could be added as new functions and referenced in the CLI file.

Look at Me Sideways

Look at Me Sideways (LAMS) is a LookML style guide and linter authored by two engineers from Looker’s Customer Success team. The style guide comes with over a dozen (sometimes opinionated) rules, ranging from explore-level rules, to field-level rules, to SQL-level rules. The linter checks most (but not all) of them.

In addition to the built-in rules, which can be selectively disabled, LAMS also allows LookML developers to specify custom rules directly in their LookML codebase, using a concise but flexible expression syntax.

LAMS provides feedback to developers via markdown files that can be committed back to the LookML project to be shown to developers directly in Looker’s IDE. The project repository includes a docker configuration for a CI server that can run LAMS automatically on each commit. It also includes a suggested Github configuration to prevent an unapproved commit being merged to master.

As a quick way to get familiarized with the tool, the project provides a video overview:

lookml-tools [Linter]

lookml-tools is a suite of 3 tools: an updater, a grapher, and a linter. Leveraging the lkml parser, the linter component currently implements 10 rules, divided into 3 classes of rules:

  • Field level rules: these are rules that apply to an individual dimension, dimension group, or measure. For instance, a rule might check that a dimension contains a description
  • File Level Rules: these are rules that apply to a LookML files as a whole. For instance, a rule might check that a file specifies a data source.
  • Other rules: these are repository level rules. For instance, a rule might check whether a view is orphaned and not referenced by any other files

Field-level rules and file-level rules each implement an interface (backed by an abstract class) that handles most of the boiler plate code and make it easy to add additional rules. As such, many rules are just a few simple lines of code.

Summary of the lookml-tools linter rules

Visualizers & Informational

A corpus of LookML can quickly grow so that it is hard to see the “forest for the trees.” How do these models, explores, and views relate to each other? Is there dead code that can be removed? In this category, we cover two projects designed to tackle such issues.

lookml-tools [Grapher]

As well as a linter and updater, lookml-tools include a grapher component. Written in Python and leveraging pygraphviz and the lkml parser, the lookml-tools grapher is a tool that parses a corpus of LookML files (typically a whole repo but is configurable) and generates a “network diagram” (directed acyclic graph) showing the relationship among models, explores, and views. For instance, in the diagram below, there are 3 models (in blue), connected to 13 explores (green) and which then feed into 13 views (purple). The tool will also highlight orphans (orange), i.e, explores not referenced by a model, or views not referenced by an explore.

The tool is configured with a JSON file that allows one to modify the size of the nodes, size and orientation of node text, as well as absolute image size to accommodate larger, more complex repos, and hence diagrams.

Example output of the lookml-tools grapher tool that shows the relationship between models (blue), explores (green), and views (purple). Orphans are shown in orange.

The tool also has functionality to create an animated gif to see the state and growth of the repo over time:

An animated GIF, generated by the lookml-tools grapher tool, showing the development of a repo over time.

Henry

LookML developers have to tread a fine line: they need to provide users with a rich set of dimensions and measures that empower their users to ask questions of the data. However, too many dimensions and measures and the UI becomes cluttered, adding friction.

How do you identify the clutter? Henry is a tool to identify unused bloat in LookML. It leverages the Looker API to examine the content in model and explores and compares that against the query logs to identify what was actually run over some time period. Thus, it can identify unused models, explores, and fields.

It has 3 components:

  • Pulse: general health checks on a Looker instance, such as checking connections, version, and legacy features being used.
  • Analyze: for projects, it checks git setup and provides some summary stats. For models, it checks for unused explores (as well as provide other useful stats). For explores, it checks for unused fields, unused joins, presence of description, whether fields are hidden etc.
  • Vacuum: specifying some time period and threshold number of queries, vacuum will identify explores that are under-utilized.

Honorable Mentions

These are two projects that are more early stage but show some promise

  • model_diagram_web_app: this Python script, written by Marcell Babai of Looker, provides a web UI that displays the relationships between tables. Given API credentials, it will list models and explores that you can use as the basis of your visualization, then will display a multi-level tree view (screenshot below) based on the relationships between joins in the selected explore. The code is available on Github, but without any license.
Example output from model_diagram_web_app
  • Looker_Explore_ERD_Generator: this tool, also written in Python, generates a PDF of an entity relationship diagram (ERD) for a given explore. Code is available on Github without a license.
An example entity relationship diagram (ERD) from Looker_Explore_ERD_Generator

Generators

Tools that parse, lint, and analyze all start from LookML. The natural complement is generators, or tools that output LookML. These tools can generate brand new LookML, or can update existing LookML files.

This includes both general-purpose generators, which can generate arbitrary LookML and act as an interface for programmatic use, or use-case-specific generators which help developers automate a specific task, such as maintaining field descriptions from an outside source, or generating view files from specific database schemas.

lkml [Generator]

In addition to its LookML parser, described above, lkml also includes a general-purpose LookML generator. It accepts a Python data structure representing any arbitrary LookML. Since the accepted data structure corresponds to the output of its LookML parser, it works naturally for a parse → mutate → write-back workflow.

Importantly, lkml does not (currently) validate the input dictionary. As such, the best use case is to take the results of lkml.load (i.e., lkml’s function to parse a LookML file) — which will be well-structured — modifying it, and then dumping the result back to a LookML file.

lookml-tools [Updater]

The third component of lookml-tools (in addition to a linter and grapher) is an updater. The updater automates certain types of LookML changes. It does so by operating on LookML text directly, without any intermediate parser-based representation, so it is well suited to changes that seek to control or preserve whitespace and comments.

Like lkml’s generator functionality, lookml-tools is also designed to modify LookML. However, it takes a fundamentally different approach. Rather than parsing the file into an object, modifying that, and then dumping it back out, lookml-tools simplifies modifies the LookML file directly. That is, it searches the content of the LookML files (assuming that it is a valid LookML file) to find an item of interest, such as a dimension with a given name, and then adds or modifies attributes of that item, and writes out the file.

The motivation behind this was to build a tool that was able to add or modify attributes, such as dimension and measure descriptions, but making the minimal changes to the file — no other indentation, whitespace, or other formatting changes. The result would be an automated pull request in which just the modified lines would be changed and thus would be obvious to a human pull request reviewer.

The updater comes with functionality to read from a list of official, sanctioned descriptions for dimensions, dimensions groups, and measures (those could come from a CSV or a database table in a data catalog), check for the presence of an existing description in the LookML, modifying or adding as appropriate, and then creating an automated pull request for the data team to approve. As such, this provides a fully automated sync from a source of truth all the way to a clear, clean pull request. (This post provides additional details and motivation.)

The lookml-tools updater tools take master reference definitions (in this example, from a data catalog), cross references them with descriptions in the LookML, and updates or injects the correct definition into LookML, creating a pull request for review.

lookml-gen

lookml-gen is a general-purpose LookML generator, written in Python. For instance, dimensions are explicitly added into a View object and then generate_lookml is called to generate the LookML.

There is currently no support for explores and, at time of writing, only a subset of the attributes of view fields are supported, but seem easy to add, given interest.

Honorable Mentions

  • JSON to LookML: This Python script, written by Leigha Jarett of Looker, generates and updates LookML based on JSON data within a column in your database. Given Looker API credentials, it will instantiate a helper view, use it to query a sample of data from your database, and then generate a LookML view with dimensions based on the structure of the sampled JSON data. The code is available on Github, but without any license.
  • EAV generator: This browser-based tool, written by Fabio Beltramini of Looker, generates LookML based on Entity-Attribute-Value data within your database. Given Looker API credentials, and inputs via a web UI, it will generate a helper view, use it to query attribute data from your database, and then generate a set of LookML views and joins to create the necessary dimensions, and potentially a set of LookML models for multi-tenant EAV datasets. The tool has a guide posted on Looker’s community forum.
  • Ro LookML Generator — Ro wrote an article about the automation they’ve built around their highly-permissioned EAV-based data model, including fully automated generation of their entire LookML model.
  • FABS — Frame.ai wrote an article about their automated system for generating models tenant-by-tenant for their embedded analytics solution. It keeps their LookML in sync with the data models that their customers define in their application.

Testers

Database schema are never really static. Sources changes. Business rules change. It is all too easy to make a change, such as changing a field or table name, and miss some downstream impact. Now you have a broken SQL query or view. Testers proactively test that LookML is valid and working. As such, these are great additions to a continuous integration (CI) workflow.

Spectacles

As database schema change, they can introduce errors into Looker models via breaking SQL queries. Importantly, however, these are run time errors. You won’t necessarily know about them until an angry users tells you that a view is broken. To proactively check against such changes, Dylan and Josh built spectacles (formerly Fonz). Leveraging the Looker API, spectacles will run queries against each explore to expose any breaks to catch errors before they are deployed.

Syntax Highlighters

To round out our review of tools, we have syntax highlighters. While it can be a seemingly small change, a splash of color can make a huge difference in the development experience. And, while Looker’s built-in editor already has syntax highlighting, some developers will appreciate the opportunity to use their text editor of choice, color and all.

VS Code

Although this syntax highlighter is still marked as pre v1.0, it does come with a few advanced features:

  • Highlighting within SQL blocks
  • Auto-completion of field and view names

Sublime

Syntax highlighting with VSCode (left) and sublime (right).

vim

update 2019–11–17 [thanks Ryan Tuck!]:

Summary

As one can see from the list above, there is now a rich set of open-source tools spanning the whole development chain. They cover generating LookML, validating it, modifying it, testing it, and even visualizing and identifying orphaned or otherwise dead code. All of these are designed to produce a higher quality, more easily understood and maintained code base, and to automate tasks. Together, they enhance the velocity and productivity of a LookML developer team.

We are also pleased to see that not only are there a number of tools and projects, but that projects are leveraging the work of others. For instance, a number of Python projects listed are making use of the lkml parser. That allows developers to focus on new functionality, and the specific problem they are trying to solve. As the LookML open source ecosystem continues to grow and develop, we look forward to seeing the synergies and collaboration that will continue to arise.

The obvious next question, at least to us, is what is missing? What aspects of LookML development, or development best practices not currently being adopted by LookML developers, are not addressed by these tools? Did we miss any tools that you are using or developing? Do you have anything to contribute? If you have some internal tool that you think the community might benefit from, consider open sourcing it. Ask your CTO today. The Looker community will thank you.

— Carl Anderson is Sr. Director, Data Science at WW (the new Weight Watchers)

— Fabio Beltramini is Customer Success at Looker

Footnotes

[1] One source for additional Looker open-source projects, beyond LookML-focused projects that we’ve covered here, is https://looker-open-source.github.io/ . However, these are just Looker-authored projects, and don’t capture the growing community-initiated projects.

    Carl Anderson

    Written by

    Sr. Director of Data Science, WW (formerly Weight Watchers), NYC. Author of "Creating a Data-Driven Organization" (2015, O'Reilly). Web: carlanderson.ai

    Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
    Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
    Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade