In a large organization with many segmented departments, data governance becomes a difficult task to handle. Looker is a great tool to help with data governance because it offers a rigid data model, but many teams need some flexibility with their data model and require the ability to redefine or add to certain parts of the data model. Enter Hub and Spoke.
Hub and Spoke architectures have been prevalent in many industries for quite some time, and the phrase hub and spoke can be interpreted differently depending on the type of technology used and the type of business using it. However, the basic principle remains the same. The hub controls a central piece of information, which is propagated out to the different spokes. The spokes can use that information and augment it any way they want for their own purpose, but cannot push changes to that information upstream back to the hub, nor laterally to other spokes.
In Looker, we can use a feature called project import to implement a hub and spoke architecture between different Looker projects. This enables one team to define a central, immutable repository of code that gets propagated to the other development teams. Those teams in turn can extend the files created in the hub project, changing or adding to them for their own use.
Setting up the Hub and Spoke architecture is not straightforward. My goal is to help demystify the hub and spoke strategy and help you determine if this is the right approach for your instance.
Step 1: Create your hub project
This project should impose the most stringent level of access control that Looker offers, as we only want a small group of developers to be changing this code. Therefore it may come as a bit of a surprise that all Looker developers will need to have access to this project. This is because they will need to pull changes down from the hub when they are in development mode. Since development mode is an instance-wide toggle in Looker, they won’t be able to be in dev mode in one project but not in another.
So, how do we lock down the hub project? We’ll need to enforce the restriction in GitHub (or another Git provider). While all Looker developers may have dev access to the hub project, only the select few who have access to the Git repo will be able to open a pull request in GitHub and actually merge their code into production.
In Looker, assuming you have already set up your Git integration, all you need to do is navigate to your project settings in the hub and enable the “Require merge requests” option. With this setting enabled, all Looker developers will need to navigate to GitHub in order to merge their changes into the production branch. Only users with credentials to the repo in which the hub project lives will be able to do so.
Step 2: Build your hub project
The hub project is organized differently than normal Looker projects. Since Looker does not allow model files to be extended into other projects, you won’t need a model file with a ton of explores defined. Instead, you’ll build your explores in another file (or files), which will be included in the model files within each spoke project.
But before we touch on explores, let’s get to the most important part of your hub project — view files. These are your files that the spoke projects will inherit, which contain governed and standardized logic. Some tips here:
- Use the IDE Folders feature in Looker, especially if you have multiple connections throughout your spoke projects. Name each folder according to the database dialect used within each view file.
- Stick to a common format throughout each view file. Be aware that many developers will be referencing your views and likely using them as templates — use good coding practices as a baseline for everyone else.
- Be as descriptive as possible in your code. Add descriptions to measures, and make sure that your naming conventions make sense to a broad audience.
- Set up a formal review process to ensure that all dimensions and measures use logic that should be inherited everywhere. A measure that only one department uses should be defined in that team’s spoke project, not in the hub.
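To make the tips above concrete, here is a rough sketch of what a governed hub view might look like. The file path, table name, and field names are all hypothetical and only meant to illustrate consistent formatting and descriptions.

```lookml
# Hypothetical file in the hub project: views/bigquery/orders.view.lkml
view: orders {
  sql_table_name: analytics.orders ;;  # assumed table name

  dimension: order_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.order_id ;;
    description: "Unique identifier for each order."
  }

  dimension: user_id {
    type: number
    hidden: yes
    sql: ${TABLE}.user_id ;;
  }

  dimension_group: created {
    type: time
    timeframes: [date, week, month, year]
    sql: ${TABLE}.created_at ;;
    description: "When the order was placed."
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.sale_price ;;
    value_format_name: usd
    description: "Sum of sale price across all orders, before refunds."
  }
}
```

Every field that a business user can see carries a description, and the measure uses logic that should hold for every spoke. Anything department-specific would live in a spoke instead.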
If you want to surface a series of explores to your spoke projects, you’ll need to create one or more explore files. To create an explore file you can click the + button in your LookML IDE and select “Create generic LookML file”. Or, if you’re on an earlier version of Looker, you can simply create a blank view file and name it with the appropriate suffixes (file_name.explore.lkml). The suffixes will cause Looker to categorize this as a different type of file. The official Looker documentation covers explore files in more detail.
Some tips about the explore file(s):
- I would recommend calling it hub_explores.explore.lkml. Or, if there will be multiple connections defined across the spoke projects, you may need multiple explore files to account for different SQL dialects and database objects. I would recommend prefixing each file with the name of the connection or database dialect (e.g. bigquery_hub_explores.explore.lkml).
- Be sparing in the number of explores you create. There is not much need for creating single-view explores. Only define explores if there is logic in them that should be inherited everywhere. Otherwise it is better to let the spoke projects create the explores they need, with their own descriptions, labels, and group labels.
- Keep in mind that an explore shows up in the explore dropdown once for each model that includes it. If two spokes both include the same hub explore in their models, admins and developers with access to both projects will see that explore twice. This is why it is best to let the spokes create their own explores, allowing for differentiation.
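As a minimal sketch of a hub explore file, assuming the hub contains an orders view with a user_id dimension and a users view with an id dimension (the view names, file paths, and join logic here are hypothetical):

```lookml
# Hypothetical file in the hub project: bigquery_hub_explores.explore.lkml
# Include only the views this explore needs.
include: "/views/bigquery/orders.view.lkml"
include: "/views/bigquery/users.view.lkml"

explore: orders {
  description: "Orders joined to the users who placed them."

  # Governed join logic that every spoke should inherit
  join: users {
    type: left_outer
    relationship: many_to_one
    sql_on: ${orders.user_id} = ${users.id} ;;
  }
}
```

Note that there is no connection parameter here. The explore file is not a model, so it only becomes queryable once a spoke model includes it.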
Step 3: Set up the spokes
Each spoke project will need to inherit the governed code from the hub. To do this, we’ll take advantage of Looker’s project import feature, which is essential to the hub and spoke architecture. You’ll need to create a project manifest file in each spoke project. This file only requires a few lines of crucial code: the name of the project in which the file lives (the name of the spoke project), and the name of the project serving as the hub.
Here’s an example of a manifest file:
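A minimal sketch, assuming the spoke project is named finance_spoke and the hub project is named hub (both names are placeholders), and that both projects live on the same Looker instance:

```lookml
# manifest.lkml in the spoke project
project_name: "finance_spoke"

# Import the hub project from the same Looker instance.
# The value must match the hub's project name exactly.
local_dependency: {
  project: "hub"
}
```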
Each spoke will need a model file. The model file in each spoke should include the explore file(s) relevant to their business area, as well as the view files from the hub that they need to build those and other explores. The syntax for including files from the hub is covered in Looker’s project import documentation.
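Here is a rough sketch of what a spoke model file could look like. The connection name, folder layout, and file names are assumptions for illustration. The important part is the leading double slash followed by the hub project name, which tells Looker to look in the imported project rather than the current one.

```lookml
# Hypothetical model file in the finance spoke: finance.model.lkml
connection: "bigquery_finance"  # assumed connection name

# Governed views and explores from the hub
include: "//hub/views/bigquery/*.view.lkml"
include: "//hub/bigquery_hub_explores.explore.lkml"

# Views defined locally in the spoke
include: "/views/*.view.lkml"
```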
Once those files are included, the developers in each spoke will be able to extend code from the hub and augment/redefine it as they see fit.
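As an illustration of extending hub code in a spoke, the sketch below assumes the hub defines an orders view with a total_revenue measure of type sum, and that the underlying table has a refund_amount column. All of these names are hypothetical.

```lookml
# Hypothetical file in the finance spoke: views/finance_orders.view.lkml
# The hub view must be included in the file that extends it.
include: "//hub/views/bigquery/orders.view.lkml"

view: finance_orders {
  extends: [orders]

  # Redefine a hub measure for this spoke only.
  # The type is inherited from the hub definition.
  measure: total_revenue {
    description: "Sum of sale price, net of refunds."
    sql: ${TABLE}.sale_price - ${TABLE}.refund_amount ;;
  }

  # Add fields the hub does not define.
  dimension: is_refunded {
    type: yesno
    sql: ${TABLE}.refund_amount > 0 ;;
  }

  measure: refunded_order_count {
    type: count
    filters: [is_refunded: "yes"]
  }
}
```

The hub’s orders view is untouched by any of this. The spoke can then build its own explore on finance_orders in its model file, with its own label and group label.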
Step 4: Enable the spokes
Training is a big component of ensuring the hub and spoke method will be able to scale as you roll out the platform to more teams. LookML developers will need to understand how to extend the code from the hub into their own projects and use it without cluttering up the Looker instance. Here are some tips for keeping the architecture functioning with few hiccups:
- Document everything, and record a short video with simple steps for how to extend a file.
- Remember to include all necessary files in whatever file you’re working on. For example, a view from the hub needs to be included in the view file that extends it.
- Emphasize that spoke developers will need to regularly pull code down from the hub in order to stay up to date. They might not realize that since their project depends on the hub, they need to pull down the most recent version of the hub onto their dev branch while in development mode.
Step 5: Untangle your access controls
While it may seem complex, managing access to content and data is not terribly difficult in a hub and spoke architecture once you establish the building blocks. By using groups and roles, you can manage granular permissions and folder access with relatively low maintenance. Here’s how:
- Create the folder structure you need to facilitate your spokes. I would recommend one main subfolder for each spoke, with more subfolders within each spoke if necessary.
- Create groups that have access to each folder. One main group per spoke should do the trick. Assign each group to its corresponding folder within the shared folders.
- For each spoke, create roles that partition developers from dashboard builders from viewers. This will also require you to create a model_set for each spoke, and a permission_set for each type of user. Remember to always include the hub model in your model sets, as all developers will need access to the hub project. You can assign the same model set to your other roles within each spoke.
Here are some examples of what your permission and model sets might look like:
permission_set: developer - Contains developer permissions
permission_set: business_user - Contains business user permissions (can explore, create dashboards, etc.)
permission_set: viewer - Can only view dashboards, cannot explore
model_set: finance - Contains the hub model, along with any models in the finance spoke project
model_set: marketing - Contains the hub model, along with any models in the marketing spoke project
Here’s an example of what the subsequent roles might look like:
Role = finance_developer
- model_set = finance
- permission_set = developer
Role = marketing_developer
- model_set = marketing
- permission_set = developer
Role = finance_viewer
- model_set = finance
- permission_set = viewer
Role = marketing_business_user
- model_set = marketing
- permission_set = business_user
(Optional) Assign your roles to subgroups within each spoke. This allows a user that is added to a group to inherit a role associated with that group. Doing so will reduce the number of actions needed when a new user is created. If you create a group called “finance_developer” you can attach the finance developer role to that group.
You can add those subgroups to your main group for each spoke. For example, any user added to the finance_developer group will automatically gain access to the Finance folder, along with the model_set and permission_set defined in the role that is attached to that group. You’ve made it so that every new user only requires one group assignment to determine what they can access.
An additional note about access controls for multi-spoke developers: since development mode is instance-wide, a developer cannot be in dev mode in one project and in production mode in another. Therefore, developers will need dev access to any spokes they have view access to; otherwise they will not be able to view data from those spokes while in dev mode (toggling out of dev mode fixes the issue for them, but is not a great experience). If you need spoke developers to be able to view content across many projects, you can implement Git-level development access for each spoke, and give all developers the Looker role that allows them to develop in each spoke.
Step 6: Leverage the API
When managing a large scale Looker implementation, it becomes essential to familiarize yourself with the API and feel comfortable writing and executing simple scripts that make requests to the API.
Some examples of actions you’ll want to take regularly are:
- Cleanup: Move old dashboards to either the trash or a designated folder for old content. You can create a report based on the system_activity model that returns the dashboard_id for any dashboard that hasn’t been viewed in 6 months.
- Add users in bulk: You may need to onboard a new spoke all at once, and if there are many members of that spoke, it will become much easier to leverage a script that adds those users for you than to manually add them via the UI.
- Disable users: As you continue to roll out the Looker platform, you’ll inevitably have users drop off (either by leaving the company or changing roles). It will pay off to have a script built that can disable users for you, either from a simple file of user_ids generated from the system_activity model, or by your own logic defined in your script.
Some additional tips:
- You may have groups of users that need access to multiple spokes. You can easily create a role that includes multiple model_sets and assign it to a new group. Or, you can create a model_set that includes some or all models and assign it to a role/group.
- Looker offers SAML integration. I would recommend exploring authentication options to make granting initial access to users easier.
- Establish a process by which spoke developers can submit requests for getting a code change into the hub. It may be the case that a developer in a spoke project comes up with a better definition of a metric, or a more efficient way to write a derived table, and they should be able to voice their findings.
- You can create a generic project that all users have access to that will allow you to use the code from the hub and surface it via a set of explores. This project will function in the same way as a spoke (changes made to it will not be reflected in the hub), but it will allow the hub explores to surface without any spoke modifications. Feel free to enable the same Git access control as the hub to prevent other developers from making changes to the code in that project.
Hopefully this helps you make the decision about whether or not to go down the road of implementing a hub and spoke architecture. I have personally found it very scalable and flexible, but it requires a diligent setup and constant maintenance going forward.