Google Pipe Syntax: Modernizing SQL Without Sacrificing its Strengths
How Pipe Syntax solves SQL’s limitations to become easier to learn, use and extend without any breaking change in the current SQL ecosystem.
SQL has reigned as the data manipulation king for over 50 years. Its power lies from its declarative nature, making it easy to express complex data transformations. But, even with its robust core, SQL syntax hasn’t aged so well. The rigid clause order, endless nesting of subqueries, and the confusing “inside-out” data flow make it challenging to read, write, and maintain complex queries. Plus, it is not very extensible.
As a Google Cloud Champion, I had the opportunity to beta-test the Pipe Syntax with my team at Carrefour. The experience was incredible!
Pipe syntax simplifies the standard SQL syntax by breaking it down into a sequence of independent linear operations separated by a
|>
, each taking a table as input and generating a new table as output.
This makes queries more readable, easier to maintain, and more flexible without changing the core functionalities of SQL.
The article objective is to really understand why the Pipe syntax matters and what it offers. This first part of this article explores how Pipe syntax adresses standard SQL’s limitations. Then in a second time, we will discuss how it still preserve its core strengths.
To see how to use it, to get your hand dirty, and be ready to transcribe 100% of your existing queries with a near-zero learning curve, a following guide will arrive so feel free to subscribe.
Cheat sheet: the list of the main pipe operators is at the end of the article.
Summary
Introduction
Modernizing SQL…
1. Inside-Out Data Flow
2. Side-Effects at a Distance
3. Rigid Clause Order
4. Redundant Clauses
5. Subquery Overload
6. Limited Extensibility… Without Sacrificing its Strengths
1. Powerful Foundation
2. Interoperability
3. Composability
4. Widespread Adoption
5. Mature EcosystemMy Thoughts
Modernizing SQL…
SQL dominates the data manipulation landscape for good reasons. But its syntax has its limitations. This is a constant source of frustration for developers, limiting productivity and making complex data projects unnecessarily laborious.
Recognizing these limitations, Google, inspired by other syntaxes like KQL or Apache Beam, has developed Pipe syntax to address these limitations without sacrificing SQL’s core strengths.
We will first focus on SQL’s limitations and then see how Pipe syntax preserves SQL’s core strengths.
Inside-Out Data Flow
Tracing the data flow in SQL, starting from the deepest nested table reference, can be challenging as queries grow in complexity. A more linear, intuitive flow would make complex queries easier to manage.
This standard SQL query demonstrates the challenges of tracing data flow. The query’s logic starts inside the subquery, requiring a mental jump to understand the flow.
SELECT *
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= '2024-01-01'
)
Pipe syntax simplifies this by providing a linear flow, with a straight path from the starting point to the final result. It retrieves the data from orders filtering for customers in 2024 and enrich the data.
FROM orders
|> WHERE order_date >= '2024-01-01'
|> SELECT DISTINCT customer_id
|> INNER JOIN customers USING(customer_id)
|> SELECT *
Every operation in Pipe syntax produces a table, so the final
SELECT *
is optional. It is implicit if you omit it. So as always be aware of costs.
Side-Effects at a Distance
Even within a single query, it’s difficult to isolate and understand the impact of individual clauses. Operations like aggregation (GROUP BY
) can have unintended consequences on other parts of the query (like in the SELECT
). This makes debugging and understanding complex queries challenging.
You need to jump between the GROUP BY
clause and the SELECT
statement to understand which column represents a dimension and which represents an aggregation. This can lead to confusion in more complex queries.
SELECT customer_id, SUM(order_amount) AS total_spent
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
Pipe syntax addresses this by clearly separating dimensions and aggregates, making the query easier to read and understand. Plus, each step is isolated from the others.
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> SELECT customer_id, total_spent
Rigid Clause Order
The strict order of clauses feels arbitrary and unintuitive. This rigid structure masks the actual order of execution, making it difficult to understand the query flow.
The traditional order of execution is FROM
→ WHERE
→ GROUP BY
→ ORDER BY
→ SELECT
.
SELECT customer_id, SUM(order_amount) AS total_spent
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
ORDER BY total_spent DESC
Pipe syntax’s linear flow aligns the order of writing with the order of execution, making the logic more apparent and easier to understand and maintain.
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> ORDER BY total_spent DESC
|> SELECT customer_id, total_spent
Redundant Clauses
SQL uses multiple clauses (WHERE
, HAVING
, QUALIFY
) to express similar filtering operations, adding unnecessary complexity and slowing down the learning curve.
In standard SQL, the HAVING
clause is used to filter results after aggregation, while the WHERE
clause filters before aggregation. This difference can be confusing.
SELECT customer_id, SUM(order_amount) AS total_spent
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
HAVING SUM(order_amount) > 1000
Pipe syntax eliminates the need for a separate HAVING
clause, simplifying the query and making it more intuitive to understand.
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> SELECT customer_id, total_spent
Subquery Overload
SQL relies on subqueries (or CTEs) even for simple operations, such as filtering data based on conditions outside the three standard locations or aggregating data more than once. This can make queries difficult to read and maintain.
This query involves multiple levels of nesting to perform a simple calculation: identifying and counting high-value customers.
WITH high_value_customers AS (
SELECT customer_id, SUM(order_amount) AS total_spent
FROM orders
WHERE order_date >= '2024–01–01'
GROUP BY customer_id
HAVING SUM(order_amount) > 1000
)
SELECT COUNT(*) AS total_high_value_customer
FROM high_value_customers
CTEs offer some improvement compared to subqueries, but they don’t fundamentally solve SQL’s reliance on subquery-like structures. Pipe syntax provides a more direct and intuitive solution with it step-by-step approach. It significantly improves readability and maintainability of the previous query.
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> AGGREGATE COUNT(*) AS total_high_value_customer
Limited Extensibility
Extending SQL with new functionality is challenging, often forcing developers to rely on workarounds, awkward extensions, or user-defined functions (UDFs).
Even for Google developers themselves, adding extensions can be difficult without a clunky syntax. Just look at the current syntax of the ML.PREDICT
table-valued function (TVFs)
SELECT feature_1, feature_2, prediction
FROM ML.PREDICT(
MODEL `project.dataset.my_model`,
(
SELECT 'a' AS feature_1, 'b' AS feature_2
)
)
The ML.PREDICT
TVFs call is significantly cleaner with the Pipe syntax CALL
operator.
SELECT 'a' AS feature_1, 'b' AS feature_2
|> CALL ML.PREDICT(MODEL `project.dataset.my_model`)
The CALL
operator already exists and we can imagine in the future the integration of new operators and function within the pipeline like ASSERT
, LOG
, or DESCRIBE
, which are proposed by Google. For example, a **potential** ASSERT
operator could be:
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> ASSERT order_amount >= 0, 'Order amount can not be negative'
|> WHERE total_spent > 1000
|> AGGREGATE COUNT(*) AS total_high_value_customer
… Without Sacrificing its Strengths
SQL dominates the data manipulation landscape for good reasons and is here to stay. Many attempts have been made to replace SQL, but none have succeeded. Its core strengths, including its powerful concepts and wide adoption, remain critical.
We can’t dethrone the king,
but we can help him modernize his kingdom.
Pipe syntax is not a replacement for SQL, it’s a modernization. It aims to support at least the same features as traditional SQL built upon its strengths.
Powerful Foundation
SQL’s core concepts offer a robust framework for manipulating structured data. Its declarative nature, focusing on what needs to be done rather than how to compute it, is a key advantage.
No matter if you use standard SQL or Pipe syntax, the following queries clearly express the desired result without having to specify how to compute it:
-- Standard SQL
SELECT *
FROM customers
WHERE country = 'USA' AND age > 21
-- Pipe syntax
FROM customers
|> WHERE country = 'USA' AND age > 21
Using the same building blocks and only changing the syntax has minimized Google’s implementation effort and allowed them to reuse code optimized over decades. Another benefit is that, even if this article shows the syntax through the BigQuery example, GoogleSQL is a shared implementation across all SQL systems including Spanner, Procella, F1, or the open source ZetaSQL.
This decision to extend SQL rather than create a completely new language also ensures feature parity: at least the same features are supported.
Interoperability
Reusing existing SQL building blocks offers the key advantage of being fully interoperable with the current ecosystem. Unlike new languages struggling with adoption due to integration challenges, pipe syntax leverages the same query optimizer behind the scene than the standard SQL.
Pipe syntax only appears to process tables sequentially (table-in, table-out), but it’s actually converted into an equivalent form for optimization — just like standard SQL. For example the following syntaxes will be converted into the same form.
-- Standard SQL
SELECT *
FROM customers
WHERE country = 'USA' AND age > 21
-- Pipe syntax (option 1)
FROM customers
|> WHERE country = 'USA' AND age > 21
-- Pipe syntax (option 2)
FROM customers
|> WHERE country = 'USA'
|> WHERE age > 21
You can even integrate Pipe syntax operators into your existing queries to extend them without having to rewrite the entire query. For example, adding a technical processing date to your query result:
SELECT state, COUNT(*) AS customer_count
FROM customer
GROUP BY state
|> EXTEND CURRENT_DATE() AS processed_date
This makes incremental adoption easy, at your own pace, starting with small changes and gradually integrating it into your workflows. No need to migrate your entire ecosystem!
Composability
As queries with pipe operators can be used anywhere you could use a standard SQL query, it keeps the core strength of SQL to be composable with subqueries, WITH
clauses, tables, views, table functions, etc.
SQL’s conceptual data model and top-level syntax remain unchanged. This means you still can use DDL, DML, and factor your queries with subqueries or CTEs.
CREATE OR REPLACE VIEW high_value_customers
AS
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> SELECT customer_id, total_spent
We can even imagine the future integration of terminal operators in Pipe syntax with DML or DDL statements. Imagine replacing the preceding query by:
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> SELECT customer_id, total_spent
|> CREATE OR REPLACE VIEW high_value_customers
Widespread Adoption
SQL’s dominance is undeniable. Adoption is the real challenge for Pipe syntax. It has the potential for success thanks to its strategic design.
Pipe syntax is not a new language, just a new syntax.
It combines familiar SQL and pipe syntaxes for easing adoption. Current SQL developers will have an almost nonexistent learning curve, and the linear flow of the pipe syntax will be more intuitive for beginners.
The
|>
characters have been chosen over the|
character to avoid ambiguity with the existing bitwise OR operator.
It also removes the need for migration, being fully integrated into the existing ecosystem. So it can benefit from incremental adoption where developers integrate it gradually into existing workflows.
The potential is clear, but widespread adoption requires education, documentation, tooling, and community building. Even if adopted, it will probably take a decade.
Mature Ecosystem
SQL benefits from a very, very mature ecosystem. And Pipe syntax goes beyond simply offering a new syntax. It can make editing, debugging, and IDE integration easier, helping SQL development to be more efficient in the future. For example:
- Iterative query edition: Pipe syntax linear flow makes it easy for developers to write and modify queries incrementally, adding operators sequentially and observing the results in real time. No more struggle with complex nested subqueries or extensive refactoring.
- Powerful Debugging: The “prefix property” can make debugging so smooth. Any section of a query up to a pipe character is a valid query, letting you test intermediate results and gain a deeper understanding of data transformations at every stage.
- IDE and UI Integration: Its sequential flow can make it a perfect fit for modern IDE or UI features. We can expect more accurate suggestions and even the possibility of interactive SQL debuggers to allow developers to step through queries and analyze intermediate results.
- Simplified Code Generation: Pipe syntax straightforward structure can simplifie the process of code generation. Tools like Looker and ORMs (Object-Relational Mappers) can produce cleaner, more readable SQL. It much easier for developers to understand and evaluate the generated code rather than with actual nested subqueries.
- GenAI Potential: The clear, sequential nature of Pipe syntax can be easier for large language models (LLMs) to understand and generate queries.
My Thoughts
From my experience, I see no reason not to adopt this new syntax. It offers a way more intuitive and accessible approach, particularly for those new to the field.
Adoption will be the biggest challenge! I think it lacks a “killer feature” that would make the switch to this new syntax obvious for developers. Perhaps the key lies in the developer experience itself.
Standard SQL has a mature ecosystem compared to the Pipe syntax. A robust BigQuery UI or IDE extensions could be crucial to make our query editing, debugging, and iteration smoother. (And it really has the potential to)
If you are not yet convinced like I am me, just challenge you to only write with this new syntax during one week. I promise, you won’t go back to the standard SQL!
If you found this article helpful, consider to share it, give it some claps, leave a comment, subscribe, or follow me on LinkedIn. It really motivates me.
Thank you! 🙏