Automating SQL code formatting with SQLFluff.

How we have configured and applied the SQLFluff tool to simplify the process of creating models in DBT.

Alice Thomaz
8 min readJul 23, 2023

During the creation of our project in DBT, we established several guidelines for SQL code development, including the standard formatting to be used. Initially, we faced some challenges in this process due to the absence of a specific standard in the market and resistance towards the time spent on formatting, as each person followed their own style based on personal preferences.

Our intention in defining a formatting standard for the new project was to maximize the readability and maintainability of our code, an area where our previous project fell short. During the mass migration of analytical SQL models, we realized that the lack of this standard made the process more cumbersome. However, we also didn’t want to compromise project deadlines because of it, so we decided to conduct a discovery period to identify the most suitable tool for our needs.

That’s how we found SQLFluff, a customizable SQL linter with a flexible dialect, which allowed us to automate day-to-day tasks and adapt it extensively to our reality. The configurations below were established in collaboration between the engineering and data analysis teams and can be adjusted according to your company’s reality. Although not all desired points were covered, we managed to address the majority, which already made this process much easier.

>> Configuration File

You will need to create a file called .sqlfluff in your project to define the main settings of the tool. We chose to customize it as much as possible according to our reality, but the tool itself already has a default standard that can be used. Below is the final version of our file as an example, and then I will explain each section of the file.

[sqlfluff]
templater = jinja
dialect = redshift
exclude_rules = AL04, AL05, AL06, AL07, AM01, AM02, AM03, AM04, AM07, CV01,
CV02, CV03, CV04, CV05, CV06, CV07, CV08, CV09, CV10, CV11,
LT03, LT05, LT07, LT08, LT12, LT13, RF01, RF02, RF03, RF04,
RF05, RF06, ST01, ST02, ST03, ST06, ST07, ST08, TQ01
ignore = templating
large_file_skip_byte_limit = 0
max_line_length = 0


[sqlfluff:layout:type:alias_expression]
spacing_before = align
align_within = select_clause
spacing_after = touch

[sqlfluff:indentation]
tab_space_size = 2
indent_unit = space
indented_joins = false
indented_using_on = true
allow_implicit_indents = true
indented_on_contents = false
indented_ctes = false

[sqlfluff:rules:aliasing.table]
aliasing.table = explicit

[sqlfluff:rules:aliasing.column]
aliasing.column = explicit

[sqlfluff:rules:aliasing.expression]
allow_scalar = True

[sqlfluff:rules:ambiguous.join]
fully_qualify_join_types = inner

[sqlfluff:rules:ambiguous.column_references]
group_by_and_order_by_style = consistent

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower
unquoted_identifiers_policy = all

[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = upper

[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.types]
extended_capitalisation_policy = upper

[sqlfluff:rules:jinja.padding]
single_space = true

[sqlfluff:rules:layout.spacing]
no_trailing_whitespace = true
extra_whitespace = false

[sqlfluff:rules:layout.commas]
line_position = trailing

[sqlfluff:rules:layout.functions]
no_space_after_function_name = true

[sqlfluff:rules:layout.select_targets]
wildcard_policy = single

[sqlfluff:rules:layout.set_operators]
set_operator_on_new_line = ['UNION', 'UNION ALL']

[sqlfluff:rules:structure.nested_case]

[sqlfluff:rules:structure.subquery]
forbid_subquery_in = join

Initial Configuration — [sqlfluff]

At the beginning of the project, we will establish the initial settings for the file:

  • templater: The tool currently offers some templates, such as raw, jinja, python, and placeholder. Although it is possible to use the dbt template with a different setup, Fluff recommends using jinja in this case.
  • dialect: It will be defined according to the database used. To check which dialects are currently supported, refer to: Dialects Reference.
  • exclude_rules: As some rules didn’t fit our requirements, we decided to disable them. In the next topic, we will explain the rules in more detail.
  • ignore: These rules are grouped into categories. In this case, we chose to ignore the templating because it doesn’t work well with the initial jinja configurations we defined in our DBT models.
  • large_file_skip_byte_limit: By default, Fluff only checks codes with up to 20,000 bytes. By setting the value to 0, we can disable this feature.
  • max_line_length: Following the same line, Fluff sets a maximum line length limit of 80 by default. We set the value to 0 to disable it.

Rules

The rules separate the formatting standards by theme. The tool already has some default configurations, but it is possible to customize them separately. Below are the standards we use, and for more details about each one, just refer to the official documentation: Rules Reference.

  • [sqlfluff:layout:type:alias_expression]: This rule is responsible for aligning all ‘AS’ for renamed columns in the SELECT statement.
  • [sqlfluff:indentation]: Here, you define the spacing used for indentation. By default, we work with a 2-space distance.
  • [sqlfluff:rules:aliasing.table]: We defined that when a table is renamed in the FROM clause, it needs to have an explicit ‘AS’. However, there’s also the ‘implicit’ option.
  • [sqlfluff:rules:aliasing.column]: We defined that when a column is renamed in the SELECT clause, it needs to have an explicit ‘AS’. However, there’s also the ‘implicit’ option.
  • [sqlfluff:rules:aliasing.expression]: We defined that column changes must be renamed, for example: SUM(events) AS total_events. However, this clause can be set to False.
  • [sqlfluff:rules:ambiguous.join]: When the JOIN is used without identifying whether it is LEFT, RIGHT, etc., it will be considered INNER. However, there are also the ‘outer’ and ‘both’ options.
  • [sqlfluff:rules:ambiguous.column_references]: When configuring GROUP BY and ORDER BY, the columns must be constants, meaning either all numbers or all names. It’s also possible to define the option for a single type: ‘implicit’ — all with numbers — or ‘explicit’ — always as names.
  • [sqlfluff:rules:capitalisation.keywords]: All keywords in uppercase, for example: SELECT, FROM, UNION. It’s also possible to define other options such as ‘consistent’, ‘lower’, ‘capitalise’, and choose to ignore some words with ‘ignore_words’.
  • [sqlfluff:rules:capitalisation.identifiers]: All column names in lowercase. You can also use ‘consistent’, ‘upper’, ‘pascal’, and ‘capitalise’.
  • [sqlfluff:rules:capitalisation.functions]: All functions in uppercase, for example: SUM, MAX, MIN. You can also define other options such as ‘consistent’, ‘lower’, ‘pascal’, ‘capitalise’, and choose to ignore some words with ‘ignore_words’.
  • [sqlfluff:rules:capitalisation.literals]: All fixed values in uppercase, for example: NULL, TRUE, FALSE.
  • [sqlfluff:rules:capitalisation.types]: All data types in uppercase, for example: INT, VARCHAR. You can also define other options such as ‘consistent’, ‘lower’, ‘pascal’, ‘capitalise’, and choose to ignore some words with ‘ignore_words’.
  • [sqlfluff:rules:jinja.padding]: Leave only 1 space between Jinja delimiters, for example: {{ ref(‘table’) }}.
  • [sqlfluff:rules:layout.spacing]: Handles the removal of trailing whitespace with the condition ‘no_trailing_whitespace = true’. Also, in case of extra whitespace, it removes them with the condition ‘extra_whitespace = false’.
  • [sqlfluff:rules:layout.commas]: Defines the position of the comma. Here, we use the comma on the right with ‘trailing’, but it can be on the left with ‘leading’.
  • [sqlfluff:rules:layout.functions]: Removes the space between the function and the parentheses, for example: SUM(column).
  • [sqlfluff:rules:layout.select_targets]: We defined that if only one column is being called in the CTE, we leave it on the same line as the SELECT statement, for multiple columns, we break the line.
    It’s also possible to define that this line should always be broken regardless of the number of columns, using the ‘multiple’ option instead of ‘single’.
  • [sqlfluff:rules:layout.set_operators]: The UNION and UNION ALL operators should be isolated on a new line, for example:
SELECT * FROM table
UNION
SELECT * FROM table2
  • [sqlfluff:rules:structure.nested_case]: Performing the treatment of the CASE WHEN function, for example:
CASE 
WHEN
ELSE CASE
WHEN

VVVV

CASE
WHEN
WHEN
END
  • [sqlfluff:rules:structure.subquery]: Remove the subquery from inside the JOIN and transforms it into two distinct CTEs, for example: JOIN (SELECT x, z FROM b). It is also possible to do this with a subquery within the FROM clause, using the options ‘from’ and ‘both’.

Overview of the code

Next, I present an example of a query before and after correction via the command line. The final result will depend on the settings you define according to the company’s standards.

>> How to use the tool

Installation

To install, you need to have pip and Python already installed in your environment. Then, execute the following command in the VSCode terminal. I recommend specifying the version to avoid conflicts with the configuration file, but this isn’t mandatory:

pip install "sqlfluff==2.1.2"

Running the Fix

The fix command is responsible for identifying inconsistencies and automatically correcting the code formatting. To use it, follow the steps below:

  • First, copy the path of the code. To do this, right-click on the file and select “Copy Relative Path”.
  • Next, open the terminal within VSCode and execute the following command, replacing “folder/model.sql” with the path copied in the previous step:
sqlfluff fix folder/model.sql
  • The command will list all the inconsistencies, indicating the line (e.g., L 11), and providing a description of the formatting issue.
  • Furthermore, you will be prompted to confirm if you want to apply the corrections. Type “Y” and press Enter to initiate the adjustments.

Analyze Formatting

The lint command is responsible for identifying and listing inconsistencies in the code formatting without making automatic corrections. To use it, follow the steps below:

  • First, copy the path of the code. To do this, right-click on the file and select “Copy Relative Path”.
  • Next, open the terminal within VSCode and execute the following command, replacing “folder/model.sql” with the path copied in the previous step:
sqlfluff lint folder/model.sql 
  • Overview:

— In case of failure: All inconsistencies found will be listed, indicating the line (e.g., L 21), and providing a description of the issue.

Attention: Some rules, such as the one requiring the renaming of altered columns, are displayed as warnings during lint and need to be adjusted manually. Meanwhile, other rules, like those related to spacing (LT01 and LT02), can be ignored.
For failures not corrected by the fix command, the recommendation is to go to the corresponding line and verify if the change is really necessary. Otherwise, it can be ignored as they are usually few exceptional cases.

In case of success:

SQLFluff played a crucial role in optimizing the development time for new models in DBT while also addressing internal complaints about the ideal formatting standard. Given that there is no established market standard for SQL code, Fluff’s ability to allow us to customize every aspect of this process was a significant differentiator.

However, the process of customizing the settings isn’t easy. The official documentation often omits information or presents incorrect conditions, so it is essential to be cautious when customizing the file. I recommend testing gradually as new clauses are added. Additionally, if you have more questions, suggestions, or comments about the content, feel free to reach out to me through Linkedin.

Versão em Português: https://medium.com/@alice_thomaz/44c017ff24a5

--

--