Analytics Engineering 1–0–1: SQLFluff + dbt™

Paradime Labs
paradime.io
Published in
4 min readJul 28, 2024

First Published July 17, 2024

In the evolving landscape of data engineering and analytics, SQL code quality remains a critical concern. SQLFluff is a sophisticated SQL linting solution, addressing a longstanding gap in tools for SQL code analysis. It brings robust code styling and error detection capabilities to SQL, complementing e.g. dbt™ (data build tool).

Core Architecture and Functionality

SQLFluff’s architecture is built on a powerful parsing engine that converts SQL into an abstract syntax tree (AST), enabling deep, context-aware analysis of SQL structures. This approach allows for more comprehensive SQL linting than traditional regex-based methods.

Key Features Include

  • Dialect-specific parsing for major SQL flavors used in dbt™ projects
  • Custom rule creation using Python, extending linting capabilities
  • Integration with jinja templating, crucial for dbt™ SQL models

Parsing and Rule Application in SQL Linting

  1. Lexing and parsing: SQLFluff tokenizes the input SQL and constructs an AST, handling complex SQL constructs common in dbt™ transformations.
  2. Rule application: The linter traverses the AST, applying configured rules to each node, allowing for sophisticated checks of both syntax and style in SQL code.
  3. Error generation: When rule violations are detected, SQLFluff generates detailed error objects, aiding in the improvement of SQL and dbt™ model quality.

SQLFluff Configuration Template for dbt™

1[sqlfluff]
2templater = dbt
3dialect = snowflake
4runaway_limit = 10
5max_line_length = 80
6indent_unit = space
7
8[sqlfluff:indentation]
9tab_space_size = 4
10indent_unit = space
11
12[sqlfluff:layout:type:comma]
13line_position = trailing
14
15[sqlfluff:rules]
16allow_scalar = True
17single_table_references = consistent
18unquoted_identifiers_policy = all
19
20[sqlfluff:rules:L010]
21capitalisation_policy = consistent
22
23[sqlfluff:rules:L030]
24extended_capitalisation_policy = consistent
25
26[sqlfluff:templater:dbt]
27project_dir = .
28profiles_dir = ~/.dbt
29profile = default
30target = dev
31
32[sqlfluff:templater:jinja]
33apply_dbt_builtins = True

This configuration template sets up SQLFluff to work with dbt™ projects. Here’s a breakdown of the key parts:

  1. The templater = dbt line specifies that we’re using the dbt-sqlfluff-templater.
  2. The dialect = snowflake line sets the SQL dialect to Snowflake. You can change this to match your database type.
  3. The [sqlfluff:templater:dbt] section configures the dbt-specific settings, including project directory, profiles directory, profile name, and target.
  4. The [sqlfluff:templater:jinja] section enables dbt built-ins for Jinja templating.

You can adjust the rules and other settings as needed for your specific project requirements.

Configurability and Customization for dbt™ Projects

SQLFluff offers granular control over SQL linting behavior, particularly useful for dbt™ users:

  • Support for .sqlfluff and pyproject.toml for project-specific configurations
  • Rule-specific parameters for fine-tuning SQL linting behavior in dbt™ models
  • Inline comment directives for rule suppression or configuration overrides in SQL files

This level of configurability allows dbt™ teams to enforce consistent SQL styling across projects.

Workflow Integration in dbt™ Environments

SQLFluff integrates seamlessly into modern data workflows, including dbt™ centric processes:

  1. Version control integration: Pre-commit hooks for git-based workflows in dbt™ projects• Support for incremental linting on changed SQL files
  2. CI/CD pipeline implementation: Command-line interface suitable for automated testing of dbt™ models• Exit code functionality for easy integration with CI tools in dbt™ pipelines
  3. IDE support: Plugins available for popular IDEs used in dbt™ development• Language Server Protocol (LSP) implementation for real-time SQL linting

Auto-fix Capabilities for SQL and dbt™ Models

SQLFluff’s auto-fix functionality leverages the AST to make intelligent code corrections in SQL files and dbt™ models:

  • Syntax-aware fixes that preserve query semantics
  • Configurable fix behavior to control aggressiveness of changes in SQL code
  • Diff generation for manual review before applying fixes to dbt™ SQL models

Advanced Use Cases in dbt™ and SQL Environments

  1. Custom rule development: Engineers can extend SQLFluff by writing custom SQL linting rules, tailored to specific dbt™ project needs.
  2. Integration with data catalogs: SQLFluff can incorporate data catalog information, enabling validation of table and column names used in dbt™ models.
  3. Metadata-driven Linting: Teams can implement context-aware SQL linting, applying different rules based on the purpose of dbt™ models or SQL queries.

Challenges and Considerations for SQL Linting in dbt™ Projects

While SQLFluff offers powerful SQL linting capabilities, implementation at scale in dbt™ environments can present challenges:

  • Initial setup and configuration for large, established dbt™ codebases
  • Performance considerations for extensive SQL repositories
  • Balancing strict linting rules with the flexibility needed in complex dbt™ transformations

Running SQLFluff in Paradime

With Paradime you can execute SQLFluff with one click using the Prettify button in the terminal toolbar.

Check out full tutorial HERE.

So why choose Paradime? Our product-based pricing model provides a stable alternative to the variable costs of consumption-based systems like dbt Cloud™. Here’s what sets us apart:

  • Sensible pricing: Benefit from fixed rates that make budgeting easier and eliminate unexpected expenses as your usage grows.
  • Enhanced productivity: Paradime’s AI-driven code IDE increases efficiency, offering a significant advantage over older solutions like dbt Cloud™, which has encountered price increases and complexity issues over the last years.

Schedule a call with our team to discover how AI-powered analytics engineering can maximize your impact on the business.

Wrap Up

SQLFluff is a big improvement in how people use SQL. It adds advanced cleaning features to a key part of the modern data stack. For analytics engineers working with dbt™ and SQL, it offers not just a tool for making code look good, but a place to share SQL best practices and improve code quality in a systematic way.

By using SQLFluff’s advanced features and putting it into dbt™ workflows, teams can greatly reduce SQL-related errors, make code easier to keep up with, and speed up development. As data architectures and tools like dbt™keep changing, SQL linting tools like SQLFluff will become more important to keep code quality and consistency high across complex data environments.

--

--