5 Reasons Why You Should Consider Using SQLFluff

How to help your team to automatically adhere to the organization’s standards, while focusing on technical and functional tasks.

Mario Simaldone
SDG Group
Published in
5 min readSep 25, 2024

--

  • Did you ever spend too much time trying to read a code that otherwise should have been easy to debug?
  • Did you ever send back a code while reviewing, with just a little feedback about uppercases, missing aliases, wrong indentation, and other style-related comments?
  • Do you want new team members to pick up the organization’s best practices faster and without the need of memorizing some (rarely updated) external documentation?
  • Do you want to give your project a consistent, well-defined, and yet customizable style across all developments?
  • Do you want a way to automate all of this easily in a Continuous Integration/Continuous Delivery environment?

If the answer to at least one of the previous questions is ‘“Yes”, then you might want to understand how SQLFluff could be beneficial for your project and how to integrate it into it.

We will present in this article some general concepts as well as the design solution for our project.

Introduction

SQLFluff is an open-source software that lints SQL code. This means that it checks for syntax errors, common pitfalls, and adherence to coding best practices. In other words, SQLFluff helps ensure the quality of your SQL code by automatically analyzing it for potential issues. It supports various SQL dialects, including Snowflake, MySQL, and PostgreSQL, making it a versatile choice for teams working with different database systems.

Let’s see it in action, using the most famous shop known to the data community: Jaffle Shop!

with

customer_orders as (
select * from {{ref('fct_customer_orders')}}
),
final as (
SELECT
customer_id
,COUNT(DISTINCT order_id)>1 is_repeat_buyer
from customer_orders
group by customer_id
)
select * from final

This code is correct and does its job, but for some readers, it may be too packed to be easily read, understood, and maintained. Let’s see what SQLFluff has to say about it.

# To try this on your windows machine, you can simply install SQLFluff
# and run it against any sql file. Note that the results will be different,
# because we are using customized configurations.
> pip install sqlfluff sqlfluff-templater-dbt
> sqlfluff lint [PATH TO FOLDER/FILE.sql] --dialect ansi
== [PATH TO FOLDER/FILE.sql] FAIL                                                                                       
L: 3 | P: 19 | JJ01 | Jinja tags should have a single whitespace on either
| side: {{ref('stg_customers')}}
| [jinja.padding]
L: 5 | P: 1 | LT08 | Blank line expected but not found after CTE closing
| bracket. [layout.cte_newline]
L: 6 | P: 5 | CP01 | Keywords must be lower case.
| [capitalisation.keywords]
L: 8 | P: 9 | LT04 | Found leading comma ','. Expected only trailing near
| line breaks. [layout.commas]
L: 8 | P: 10 | CP03 | Function names must be lower case.
| [capitalisation.functions]
L: 8 | P: 10 | LT01 | Expected single whitespace between comma ',' and
| function name identifier. [layout.spacing]
L: 8 | P: 16 | CP01 | Keywords must be lower case.
| [capitalisation.keywords]
L: 8 | P: 34 | LT01 | Expected single whitespace between the end bracket ')' and
| raw comparison operator '>'.
| [layout.spacing]
L: 8 | P: 35 | LT01 | Expected single whitespace between raw comparisons
| operator '>' and numeric literal.
| [layout.spacing]
L: 8 | P: 38 | AL02 | Implicit/explicit aliasing of columns.
| [aliasing.column]
L: 8 | P: 38 | CP01 | Keywords must be lower case.
| [capitalisation.keywords]
L: 12 | P: 1 | LT08 | Blank line expected but not found after CTE closing
| bracket. [layout.cte_newline]
All Finished 📜 🎉!

We can see that, according to how we configured the linter, different things are not compliant with what we defined to be a good SQL code. We could go back and edit our file manually, but SQLFluff has another tool for us: the fixer, which will edit the file directly for us.

> sqlfluff fix [PATH TO FOLDER/FILE.sql] --dialect ansi
with

customer_orders as (
select * from {{ ref('fct_customer_orders') }}
),

final as (
select
customer_id,
count(distinct order_id) > 1 as is_repeat_buyer
from customer_orders
group by customer_id
)

select * from final

We can see that every line that failed the linter, has been modified to be compliant with the defined rules. To learn more about the available rules and their possible configuration, official documentation can be found here.

Configuration and Integration

Now that we have SQLFluff up and running, we can add a configuration file to our project, so the linter knows what we consider good practice and what we don’t want to see in our files. The official documentation does a really good job of explaining how to set up a proper configuration file, so we will refer to that for examples you could use in your projects.

One thing it is worth pointing out is that, due to its nature, the configuration file needs to be always up to date with the latest standards and best practices, for the linter to capture the style we want to imprint in the project. For this reason, the same configuration file also acts as a piece of documentation that every developer can easily peek at, to know if a rule is currently enforced or not.

Back to our integration strategy and with the help of the official documentation about production deployment, we can see that it’s possible to use the pre-commit framework to configure a hook that will trigger the linter and/or the fixer (we currently use only the linter) automatically just before a commit is made. Take a moment to appreciate the two main benefits of this implementation:

  • once again, we have created a tool that helps developers understand what best practices are in our organization. Whenever someone tries to commit some changes, he will be prompted with the linter output, which, as we have seen, will redirect to the exact lines and points where the code is failing to adhere to the defined rules;
  • at the same time, we now have a tool that allows only readable, maintainable, and consistently styled code to be pushed to any branch.

To complete the overview of our integration process, the same pre-commit command is also executed with Jenkins in the CI/CD environment: whenever a pull request is raised, the pre-commit checks (where we included SQLFluff) are automatically performed only against the new or modified files. The PR is marked as failed if any check fails, saving resources and time that would otherwise have been spent to allow non-compliant code to make it to a higher environment.

Conclusion

SQLFluff is a lightweight tool that takes little effort to implement and only some thinking and discussion on what the desired configuration will be. It’s worth giving it a try to understand if it suits your project or not.

About the author

Mario Simaldone is an Analytics Engineer and Senior Consultant in the DATAOPS practice unit at SDG Group Italy. He has experience with ETL processes and technologies like DBT and Snowflake. https://www.linkedin.com/in/mario-simaldone-808aa2139/

--

--