SQLFluff with dbt

Edward Pasfield
Feefo Product Engineering
Sep 26, 2022
In case you do not believe how popular SQLFluff is becoming :) https://designingoverload.com/sqlfluff-1-0-0/

The speed at which you can develop SQL with data build tool (dbt) is one of its main selling points, along with the wide reach it gives your data warehouse in and across your dev teams. However, both of these benefits bring problems of their own. One of the main issues is consistency, both in your SQL and in the code base in general. SQLFluff is how we overcame this issue; it now lets as many people as possible add and contribute to our data warehouse in a consistent and manageable way. Another big bonus, and reason for using SQLFluff, is that it is Jinja compatible, which makes it integrate really well with dbt. Fortunately, it is also really quite easy to implement. Don’t believe me? Keep reading.

Use Case Background

A few quick bits of context on why we use dbt the way we do, which should help when it comes to understanding some of the implementation. If you use dbt in a different way, you might want to think about a variation on this set-up.

  • We use dbt CLI (mainly because it is free to use).
  • We configured dbt for BigQuery, as that is what our company uses for data analytics.
  • We provide a Docker image so our devs work inside a container, for ease of set-up and usability, and to make our CI/CD and onboarding easier.

Implementation

Now onto the more technical part of this article; it also highlights just how easy SQLFluff is to use.

Within our Git repository, there are three locations in which it was necessary to have SQLFluff configurations; these three areas are highlighted below:

Dev Container Config

First things first: set up the devcontainer.json and pass it the SQLFluff settings, so that linting is already configured for analytics engineers working and running inside the container. Examples of settings you can set in this file are the rules to include/exclude, the dialect (BigQuery, MySQL, PostgreSQL and more), when to run the linter and, quite importantly, the path to the SQLFluff configuration file within the dbt project.

Example Code:

// SQLFluff
"sqlfluff.config": "/workspace/data-transforms/.sqlfluff",
"sqlfluff.dialect": "bigquery",
"sqlfluff.excludeRules": ["L009"],
"sqlfluff.executablePath": "/usr/local/bin/sqlfluff",
"sqlfluff.ignoreLocalConfig": false,
"sqlfluff.format.enabled": true,
"sqlfluff.ignoreParsing": false,
"sqlfluff.linter.run": "onType",
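For context, these settings sit inside the settings block of a devcontainer.json, alongside the extension that surfaces SQLFluff inside the editor. A minimal sketch (the container name and extension ID here are assumptions, not our exact file):

```json
{
  "name": "data-transforms",
  // Assumed extension ID for the VS Code SQLFluff extension
  "extensions": ["dorzey.vscode-sqlfluff"],
  "settings": {
    "sqlfluff.config": "/workspace/data-transforms/.sqlfluff",
    "sqlfluff.dialect": "bigquery",
    "sqlfluff.linter.run": "onType"
  }
}
```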

dbt Project Configuration

Secondly, the dbt project configuration lives in the main SQLFluff project file, and this defines: which templater to use; where the templater’s (dbt) project directory is; and other information that helps the linter run, e.g. the runaway limit for repeated runs, capitalisation policies, and the dbt profile and target (both of which are dbt-specific parameters). Finally, you can define macros for the linter to use, which range from predefined dbt built-ins to your own custom macros and can be as simple as turning a single white space into a double.

[sqlfluff]
templater = dbt
dialect = bigquery
# Change if you get 'loop limit reached' error
runaway_limit = 20
extended_capitalisation_policy = pascal

[sqlfluff:templater:dbt]
project_dir = .
profiles_dir = /dbt
profile = feefo
target = dev

[sqlfluff:templater:jinja]
apply_dbt_builtins = True

[sqlfluff:templater:jinja:macros]
# Macros provided as builtins for dbt projects
dbt_ref = {% macro ref(model_ref) %}{{model_ref}}{% endmacro %}
dbt_source = {% macro source(source_name, table) %}{{source_name}}_{{table}}{% endmacro %}
dbt_config = {% macro config() %}{% for k in kwargs %}{% endfor %}{% endmacro %}
dbt_var = {% macro var(variable, default='') %}item{% endmacro %}
dbt_is_incremental = {% macro is_incremental() %}True{% endmacro %}

A useful nugget of info: you can define which folders/files SQLFluff should ignore in a .sqlfluffignore file. This is exceptionally useful when having to implement obscure dbt packages and macros.
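As a sketch of what that file can look like (these paths are hypothetical, not our actual ignore list), it follows the same pattern syntax as a .gitignore:

```
# Third-party dbt packages we don't control
dbt_packages/
# Compiled dbt artefacts
target/
# Legacy macros with obscure templating
macros/legacy/
```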

Installation Versions

Finally, and only because we are working in a dev container, are the SQLFluff requirements for Docker. These simply pin which version of SQLFluff and which version of the dbt templater to use. Both of ours are currently:

sqlfluff==0.13.0
sqlfluff-templater-dbt==0.13.0
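For context, a minimal sketch of how those pins might be installed into the image (the base image and file paths are assumptions, not our actual Dockerfile):

```dockerfile
FROM python:3.9-slim

# Install the pinned SQLFluff and dbt templater from the requirements file
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```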

Now that the container settings, the configuration and the versioning are all set up, the last thing to do is run and use it in practice.

Usage in Practice

How you use SQLFluff is up to you and can be tailored to whatever makes the most sense for your project and use case. We have implemented linting in two ways.

  1. onType within our container.
  2. On pre-commit hook.

onType within our Container

We have set it up so that when you are working inside the Docker container, linting happens onType. This means there is no need to run commands such as sqlfluff lint/fix; using it in practice is as easy as typing away as usual and letting it slowly change things for you automatically. This runs a smaller subset of the linting rules, covering things like capitalisation checks and spacing.
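As a rough sketch of what that feels like (the model name is hypothetical, and the exact fixes depend on your rule set), a line typed as:

```sql
select id,review_score from {{ ref('reviews') }}
```

would be nudged on type towards something like:

```sql
SELECT id, review_score FROM {{ ref('reviews') }}
```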

On pre-commit hook

This runs whenever a commit is made and applies the whole linting pack we define. It runs the sqlfluff-lint hook first (which identifies all of the formatting issues determined by our rule set), followed by the sqlfluff-fix hook (which fixes all of the highlighted issues). Because this is done in a pre-commit hook, it is only ever run against the files that have changed in the commit, meaning a smaller run time and a faster PR process than if we had to run the linting and fixing on the whole repo every time.
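As a sketch, a .pre-commit-config.yaml for this can use the hooks SQLFluff ships (the rev should match your pinned SQLFluff version, and the additional dependency makes the dbt templater available inside the hook’s environment):

```yaml
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 0.13.0
    hooks:
      - id: sqlfluff-lint
        additional_dependencies: ["sqlfluff-templater-dbt==0.13.0"]
      - id: sqlfluff-fix
        additional_dependencies: ["sqlfluff-templater-dbt==0.13.0"]
```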

Pros

  • Dialect-flexible and configurable SQL linter
  • Works with jinja templating and dbt
  • Auto-fixes most linting errors
  • Only runs against changed files in PR process

Cons

  • Cannot make reports based on the linting or the failed fixes
  • Only covers SQL based code

Summary

In summary, implementing and using SQLFluff as part of our newly built dbt warehouse allows more developers to create tables and queries, because we know they are adhering to our predefined style conventions, and it gives us a platform that is fast and easy to analyse. It also lets us keep up to date with whichever version we think best, and to define and enforce our company’s style conventions across teams with a wide variety of SQL knowledge and past best practices. As with most things, consistency is key!

If you are interested please give me and the company a follow to keep up to date with our ongoing project!
