Releasing lookml-tools: better Looker code, user experience, and data governance
At WW, we leverage Looker as our primary business intelligence tool. It helps us understand our business and enhance our member experience. A core feature of Looker is LookML, a flexible markup language that defines data sources and how they are transformed and joined, as well as the dimensions (the data filters, such as country or time) and the measures (the aggregates, such as totals and averages) that are exposed to the user.
In this post, we are pleased to announce lookml-tools, a new toolkit that our customer intelligence engineering team is open-sourcing to help the Looker community — especially LookML developers — write cleaner, more consistent code, deliver a better end user experience, and enhance data quality and governance.
The library contains three tools:
- Linter: a tool that checks LookML against a suite of coding and styling rules.
- Grapher: a tool that generates a “network diagram” of the model-explore-view relationships and highlights any orphaned files.
- Updater: a tool that takes a master source of metric definitions, compares it with what is present (or missing) in the LookML, and, where necessary, injects or updates the correct definitions in the LookML. Finally, it creates a GitHub pull request. As such, definitions flow from the master source to the end user in a controlled and consistent manner, without anyone having to manually sync descriptions among multiple systems.
Let’s dig into each of these components.
To provide a consistent Looker user experience, we developed a LookML style guide. It provides rules for user-facing aspects, such as naming conventions and how to ensure that users can navigate and explore the data effectively, as well as best practices under the hood. This benefits both end users and the engineering and analytics staff who develop the LookML models, views, and dashboards.
How do we evaluate LookML and flag any infractions? We developed a sweet and simple Python linter that runs these checks periodically and writes its findings (a report of which LookML dimensions, measures, or files passed or failed each rule) back into Looker itself, where they can be measured, dashboarded, and alerted on. We implemented 11 of the style guide's rules (the declutter rule is difficult to define and alert on programmatically), and we made sure the framework is easy to extend with new rules. In fact, most rules are just 3–5 lines of code that evaluate whether a chunk of LookML is relevant to the rule and whether it passes. As such, we hope that the community will contribute to a larger suite of rules that we can all use.
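To illustrate the shape of such a rule, here is a minimal sketch. The class name, the dictionary shape of the parsed LookML, and the `lint` driver are all assumptions for this example, not the actual lookml-tools API:

```python
# Hypothetical sketch of a linter rule in the spirit of lookml-tools.
# The parsed-LookML dict shape and class names are assumptions for
# illustration, not the library's actual API.

class DescriptionRule:
    """Every dimension and measure should carry a description."""

    def applies_to(self, chunk: dict) -> bool:
        # This rule is only relevant for dimensions and measures.
        return chunk.get("type") in ("dimension", "measure")

    def run(self, chunk: dict) -> bool:
        # Pass only if a non-empty description is present.
        return bool(chunk.get("description", "").strip())


def lint(chunks, rules):
    """Return (chunk name, rule name, passed) findings for relevant rules."""
    findings = []
    for chunk in chunks:
        for rule in rules:
            if rule.applies_to(chunk):
                findings.append((chunk["name"], type(rule).__name__, rule.run(chunk)))
    return findings
```

A rule, then, is just the relevance check plus the pass/fail check; the surrounding framework handles iteration and reporting.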
As developers write LookML code, they create different, interrelated files: models that define data sources, views that expose the dimensions and measures for some data source, and explores that group views into a logical unit, often around a data source. For instance, perhaps the model defines a Google Analytics data source, one view focuses on session data while another covers referrals, and the explore file groups the sessions and referrals views into a single GA-focused unit.
When multiple developers are building out the LookML, especially if they are new to a data source or to LookML development, it is easy to test out ideas and lose the forest for the trees: it becomes hard to keep track of the bigger picture and the relationships among these files. This is where the grapher comes in. It parses the LookML files and produces a network diagram showing those relationships.
In the example below, we can see the relationships among the models (blue), the explores (green), and the views (purple), as well as code reuse. You can also spot a single orange dot at the top. This is an orphan file, a view that is not referenced by any explore and thus represents dead code that can be removed from the repository.
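Conceptually, the orphan check reduces to set arithmetic over the parsed relationships. A minimal sketch, where the data shapes are assumptions (the real grapher parses them out of the LookML files and also draws the full diagram):

```python
def find_orphans(views, explores):
    """Return views that no explore references, i.e. dead code.

    `views` is a set of view names; `explores` maps each explore name
    to the set of view names it joins. Both shapes are assumptions
    made for this sketch.
    """
    referenced = set()
    for joined_views in explores.values():
        referenced |= joined_views
    return views - referenced
```

For example, a view left over from an abandoned experiment would appear in `views` but in no explore's join set, and so would be flagged.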
Data is higher quality when it is defined. When Looker users mouse over a dimension or measure, a description of that term pops up and helps them understand what the metric means, but only if a description is included in the LookML. We can check for that with our linter. But where should those definitions originate?
While developers could add those descriptions to the LookML manually, they would inevitably drift out of sync with other systems. Data is higher quality when it is consistently defined across systems, and many companies have multiple business intelligence tools. Thus, we wanted a system that takes definitions from a master source and injects or updates them in the LookML automatically, creating a pull request for the LookML repository admins to review. This is the updater component of lookml-tools. It solves both the syncing and the consistency problem, enhancing data governance.
In our case, we use a data catalog as the master source, and the updater queries that master database to obtain the list of definitions. We maintain a mapping table from those master definitions to individual dimensions and measures, and the updater then parses the LookML repo. You don't have to use a data catalog; you can simply maintain the master definitions as a CSV file, and lookml-tools can use that. The key is that you free developers from a tedious but important task and delegate it to a machine: the code runs periodically in Docker.
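To make the flow concrete, here is a sketch of the CSV-driven variant: read the master definitions, compare them against the descriptions currently in the LookML, and report which fields need injecting or updating. The column names and data shapes are illustrative assumptions, not the tool's actual interface:

```python
import csv
import io


def plan_updates(master_csv_text, lookml_fields):
    """Compare master definitions against LookML fields.

    Returns a list of (field_name, new_description) updates to apply.
    `master_csv_text` is assumed to have columns `field,definition`;
    `lookml_fields` maps field name -> current description (or None
    if the field has no description yet).
    """
    updates = []
    for row in csv.DictReader(io.StringIO(master_csv_text)):
        field, definition = row["field"], row["definition"]
        if field in lookml_fields and lookml_fields[field] != definition:
            updates.append((field, definition))
    return updates
```

A fully missing description and a stale description are handled the same way: both differ from the master definition and so both land in the update plan, which the real tool would then turn into a pull request.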
We are excited to release lookml-tools:
- Source: https://github.com/ww-tech/lookml-tools
- Docs: https://ww-tech.github.io/lookml-tools/
- PyPI: https://pypi.org/project/lookml-tools/
We hope that others find this useful, and we would love feedback, suggestions, and hopefully contributions. Create a pull request!