A Brief History of SQL and the Rise of Graph Queries

What Is GQL and Why Is It Important?

Fanghua (Joshua) Yu
Neo4j Developer Blog
5 min read · Apr 23, 2024

Transitioning from SQL to GQL — Image generated by DALL-E by author

This article has been lightly revised since it was first published, thanks to suggestions from Philip Rathle on SQL:2023 and on GQL as a new language standard.

The Birth of a New Query Language Standard

Decades after Structured Query Language (SQL) was first standardized, ISO/IEC has published a new standard database language, Graph Query Language (GQL), marking a significant milestone for the database industry.

The road to GQL began as early as 2010 with Cypher, the graph query language developed by Neo4j. Cypher evolved into openCypher in 2015 and then fed into the draft ISO standard GQL in 2019. Just over a week ago, ISO published GQL as ISO/IEC 39075:2024.

A Brief History of SQL

SQL, originally developed by IBM in the early 1970s, has undergone significant evolution over the decades. It has been standardized by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). Each revision of the SQL standard has aimed to address the evolving needs of database systems in handling increasingly complex data and queries, reflecting shifts in technology and user demands.

A timeline of the major events and versions of the SQL standards with key developments at each stage is shown below.

History of SQL Standards Part 1 — Image by author

1. SQL-86 / SQL-87

- Year: 1986/1987

- Standardization: ANSI X3.135-1986 and ISO 9075:1987

- Details: This was the first formal SQL standard. It established the foundation for SQL and introduced basic syntax for Data Definition Language (DDL), Data Manipulation Language (DML), and querying capabilities.

2. SQL-89

- Year: 1989

- Standardization: Minor revision by ANSI and ISO

- Details: This minor revision introduced integrity constraints, most notably foreign keys.

3. SQL-92 (SQL2)

- Year: 1992

- Standardization: ANSI X3.135-1992 and ISO 9075:1992

- Details: This major revision greatly expanded the framework of SQL-89. It introduced more advanced features such as joins, subqueries, set operations, and significant enhancements in DDL and DML. SQL-92 also standardized various syntax elements, making SQL more portable across different database systems.

4. SQL:1999 (SQL3)

- Year: 1999

- Standardization: ISO/IEC 9075:1999

- Details: This revision introduced object-oriented features, allowing SQL to handle complex data types and objects. It added triggers, recursive queries, and stored procedures. This version was a substantial step up from SQL-92, bringing more flexibility and capabilities to the SQL language.

History of SQL Standards Part 2 — Image by author

5. SQL:2003

- Year: 2003

- Standardization: ISO/IEC 9075:2003

- Details: This update added XML support to SQL, allowing SQL to define, query, and manipulate XML data directly. It also revised SQL Persistent Stored Modules (SQL/PSM), the part of the standard that covers stored procedures and control-of-flow statements.

6. SQL:2006

- Year: 2006

- Standardization: ISO/IEC 9075:2006

- Details: This revision was more of an incremental update, mostly focusing on integrating the XML data type and refining the specifications related to XML querying and handling.

7. SQL:2008

- Year: 2008

- Standardization: ISO/IEC 9075:2008

- Details: SQL:2008 introduced the TRUNCATE TABLE statement, the FETCH FIRST clause for limiting query results, and enhancements to the MERGE statement. This version aimed to refine and optimize functionality introduced in earlier standards.

8. SQL:2011

- Year: 2011

- Standardization: ISO/IEC 9075:2011

- Details: This version introduced substantial improvements in temporal data handling, allowing more sophisticated querying of data over time (system-versioned and application-time period tables). It also improved support for the OVER clause and window functions.

9. SQL:2016

- Year: 2016

- Standardization: ISO/IEC 9075:2016

- Details: SQL:2016 added features like JSON support, allowing SQL to handle JSON data similarly to XML. This version also enhanced features related to period temporal tables and introduced polymorphic table functions.

10. SQL:2019

- Year: 2019

- Standardization: ISO/IEC 9075-15:2019

- Details: Not a full revision of the standard. In 2019 a new Part 15, SQL/MDA, was added to the SQL:2016 family, bringing multi-dimensional array types and operators to SQL.

11. SQL:2023

- Year: 2023

- Standardization: ISO/IEC 9075:2023

- Details: This revision added property graph queries (SQL/PGQ) over data stored in tables, along with expanded support for JSON, including a JSON data type.

What Is GQL?

GQL is a query language for property graphs, a data model well suited to high-volume, highly connected data. With GQL, users can traverse millions of relationships and use graph pattern matching to reveal hidden patterns and insights.

A property graph consists of nodes (discrete objects) that can be connected by relationships. Both nodes and relationships can have properties that store values.

The property graph database model consists of the following:

  • Nodes describe entities (discrete objects) of a domain and are usually illustrated using a circle.
  • Nodes can have zero or more labels to define (classify) what kind of nodes they are.
  • Relationships describe a connection between a source node and a target node, illustrated using a straight line with an arrow.
  • Relationships always have exactly one direction.
  • Each relationship must have exactly one type, which defines (classifies) it.
  • Nodes and relationships can have properties (key-value pairs) that describe them in more detail.

A simple graph of three nodes and two relationships is shown below, representing the movie Forrest Gump, one of its actors, and its director.

A sample property graph — Source: Neo4j
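To make the model concrete, here is a minimal Python sketch of the three-node graph described above. It is neither GQL nor Neo4j's API. The `Node` and `Relationship` classes, the `match` helper, and the label, type, and property names (`Person`, `ACTED_IN`, `name`, and so on) are assumptions chosen for illustration, as are the person names, since the text itself names only the movie.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with zero or more labels and key-value properties."""
    labels: frozenset = frozenset()
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed connection with exactly one type, from source to target."""
    source: Node
    rel_type: str
    target: Node
    properties: dict = field(default_factory=dict)


# The three-node sample graph: two people connected to one movie.
# Labels, relationship types, and property names are illustrative assumptions.
tom = Node(frozenset({"Person", "Actor"}), {"name": "Tom Hanks"})
robert = Node(frozenset({"Person", "Director"}), {"name": "Robert Zemeckis"})
movie = Node(frozenset({"Movie"}), {"title": "Forrest Gump"})

relationships = [
    Relationship(tom, "ACTED_IN", movie, {"roles": ["Forrest"]}),
    Relationship(robert, "DIRECTED", movie),
]


def match(rels, source_label, rel_type, target_label):
    """Return (source, target) pairs for the pattern
    (source_label)-[rel_type]->(target_label), respecting direction."""
    return [
        (r.source, r.target)
        for r in rels
        if r.rel_type == rel_type
        and source_label in r.source.labels
        and target_label in r.target.labels
    ]


# Who acted in which movie?
for person, film in match(relationships, "Person", "ACTED_IN", "Movie"):
    print(person.properties["name"], "acted in", film.properties["title"])
```

In Cypher-style notation, the pattern matched here would read roughly as `(p:Person)-[:ACTED_IN]->(m:Movie)`. Because relationships are directed, asking for the reverse pattern (from `Movie` to `Person`) returns nothing in this sketch. A real graph database evaluates such patterns over indexed storage rather than by scanning a list.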

Here is a more complicated graph showing connections among thousands of diseases, drugs, and pathogens published in Neo4j for Diseases.

The topology of the data in this project. The green dots represent the “diseases”, the red ones represent the drugs, and the blue ones represent the pathogens. Image courtesy of Sixing Huang

Graph query languages are already widely used in use cases such as fraud detection, cybersecurity, recommendations, and social networks, and more recently in knowledge graphs that support Retrieval-Augmented Generation (RAG) for GenAI, improving accuracy and reducing hallucinations.

Summary

GQL addresses the need for sophisticated queries that reflect the increasingly complex and connected nature of data in the real world. The benefits of graph databases lie not only in providing more intuitive and powerful ways to visualize and interrogate data relationships, but also in storing and processing connected data more efficiently. GQL, as the new database query language standard, is a pivotal development for advancing data analytics and management in the era of big data, GenAI, and beyond.

Want to Learn More?

To experience graph pattern matching, I recommend the free Cypher Fundamentals online course.

I believe our lives become more meaningful when we are connected, and the same is true of data. Happy to connect and share: https://www.linkedin.com/in/joshuayu/