Incremental LookML Validation in CI

Alex
Dandy Engineering, Product & Data Blog
5 min readNov 21, 2022

Validating LookML is a critical part of keeping a Looker project healthy when developing. In the web UI, this is fast and easy; even in a large project it only takes a few minutes to check for issues.

LookML validation in the web UI

However, validating the same code via the Python SDK can take over an hour to complete, if it completes at all.

>>> sdk.validate_project("production")

SDKError: HTTPSConnectionPool(host='cloud.looker.com', port=443):
Read timed out. (read timeout=3600)

The difference is due to how the validation works. When we run it from the web UI, only files that have been modified will be validated. When using the Python SDK, the entire project will be validated — even if only one character in one view has changed.

Validating code changes in CI before merging is standard best practice, but with 60+ minute runtimes this is not feasible. The guidelines to improve performance (reducing scope of includes, splitting project into multiple projects) increase the complexity of the project and make it harder for users to navigate. Ideally we would be able to use the same incremental validation available from the web UI in our own automated checks.

Reverse Engineering Looker’s Web UI

Since the web UI allows for incremental validation, we can reverse engineer the process to try recreate it in other workflows. Using Chrome’s network tools we can see a request to the /lint endpoint that runs when we click Validate LookML:

The response is a JSON object listing all the errors, their severity, and useful context:

We can also see the headers that were sent with the request. Running the same request with Python we are able to get the same result, and with some trial and error we can pare it down to only the required headers:

>>> url = f"{base_url}/api/internal/projects/{project_name}/lint"
>>> headers = {"x-csrf-token": "I1rOtBVcMzq3hZT3nrUK..."}
>>> cookies = {"rack.session": "DEkiDl9fRkxBU0hfXwY6BkVGewBJI..."}
>>>
>>> requests.get(url, headers=headers, cookies=cookies)

{"errors":[{"code":"","severity":"warning","kind":"syntax","message":"Measures..."

Now we know how to hit the API, we just need to figure out how to get these tokens. Using Chrome’s network tools again, we can see these are set as cookies when we log in:

Automating Incremental Validation

To get the credentials programmatically, we can use an automated browser to open the Looker login page, login, and return the token info:

def get_tokens():
driver = webdriver.Chrome(ChromeDriverManager().install())

base_url = os.environ.get('LOOKERSDK_BASE_URL')
login_email = os.environ.get('LOOKERSDK_LOGIN_EMAIL')
login_password = os.environ.get('LOOKERSDK_LOGIN_PASSWORD')

driver.get(base_url)

# log into Looker
email_form = driver.find_element(By.ID, "login-email")
email_form.send_keys(login_email)

passw_form = driver.find_element(By.ID, "login-password")
passw_form.send_keys(login_password)

driver.find_element(By.ID, "login-submit").click()

# create a dict with all current cookies
all_cookies=driver.get_cookies()
cookies_dict = {}
for cookie in all_cookies:
cookies_dict[cookie['name']] = cookie['value']

csrf_token = cookies_dict.get('CSRF-TOKEN')
rack_session = cookies_dict.get('rack.session')

# decode the returned token which is returned URL-encoded
csrf_token = unquote(csrf_token)

driver.quit()

return csrf_token, rack_session

Then, we can use those tokens to fill in the headers for a request we send using Python:

def lint_lookml(project_name, csrf_token, rack_session):

base_url = os.environ.get('LOOKERSDK_BASE_URL')

url = f"{base_url}/api/internal/projects/{project_name}/lint"
headers = {'x-csrf-token': csrf_token}
cookies = {'rack.session': rack_session}

res = requests.get(url, headers=headers, cookies=cookies)

print(f"Request response: {res.status_code} - {res.reason}")

res_dict = res.json()
errors = res_dict['errors']

print(f"Error count = {len(errors)}\n")

for sev in ["fatal", "warning", "info"]:
subset = [e for e in errors if e["severity"] == sev]
print(f"\n{len(subset)} {sev.upper()} issues found:")

for e in subset:
print(f" [model {e['model_id']}]\t File {e['file_path']}\t Line {e['line_number']}\t {e['message']}")

fatal_error_count = len([e for e in errors if e["severity"] == "fatal"])

return fatal_error_count

The only piece left to automate is entering dev mode and switching to the target branch. The Python SDK provides methods for this already, so we can use those:

def prepare_env(project_name, dev_branch_name):

sdk = looker_sdk.init40()

# enter development mode
res = sdk.update_session(looker_sdk.models.WriteApiSession(workspace_id="dev"))

# switch to the git branch we want to work with
res = sdk.update_git_branch(project_name, looker_sdk.models.WriteGitBranch(name=dev_branch_name))

# pull in any changes from remote
res = sdk.reset_project_to_remote(project_name)

Putting it all together, we add some argument handling and check for the Looker credentials we need before getting the tokens and hitting the API:

if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--project", type=str, required=True, help="String. The project to work in.")
parser.add_argument("--dev-branch", type=str, required=True, help="String. The git branch to validate.")

args = parser.parse_args()

# set/check envvars used by the Looker SDK
os.environ.setdefault("LOOKERSDK_BASE_URL", "https://your_subdomain.cloud.looker.com")

expected_envvars = [
"LOOKERSDK_BASE_URL"
, "LOOKERSDK_CLIENT_ID"
, "LOOKERSDK_CLIENT_SECRET"
, "LOOKERSDK_LOGIN_EMAIL"
, "LOOKERSDK_LOGIN_PASSWORD"
]

env = os.environ

for envvar in expected_envvars:
if envvar not in env:
print(f"Expected environment variable {envvar} is not set!")
raise RuntimeError

print("\n🏗 Setting up by entering dev mode and changing branch:")
prepare_env(args.project, args.dev_branch)

print("\n🔑 Authorizing Looker via browser:")
csrf_token, rack_session = get_tokens()

print("\n🕵️ Linting LookML:")
fatal_error_count = lint_lookml(args.project, csrf_token, rack_session)

if fatal_error_count == 0:
print(f"✅ No 'fatal' errors found")
else:
print(f"⛔️ {fatal_error_count} 'fatal' errors found. These should be fixed before deploying. Raising an error so CI fails...")
raise SyntaxError

We incorporate this into our CI workflow and require LookML validation to pass before merging. This ensures no breaking changes can accidentally make their way into production and take down our instance or break content.

Consider setting up a separate Looker user for CI so that it runs isolated from your own dev workspace. This also lets you restrict the permissions available in CI but will cost a dev license.

Overall, this workflow runs about 90% faster than using the SDK approach and has saved us from pushing errors to production multiple times. Most often we catch Inaccessible View errors where a view uses dimensions from another view but the explore does not join to that view. Another common problem is Unknown or Inaccessible Field errors which are usually caused by changes to the underlying dbt model.

--

--