MQL: Generate SQL from Natural Language with 85% Accuracy

Shuru Team
Shuru
Published in
6 min readApr 3, 2024

Originally published on Shuru Blog

Welcome to the exciting world of MQL, short for My Query Language. It’s an innovative tool designed to make database queries easy for everyone, not just experts. By turning natural language into SQL commands, MQL opens up the world of databases to more people. Let’s dive into what makes MQL special and why we’re sharing it with the world as an open-source project.

GitHub Repository: Find here

What is MQL?

MQL is a game-changer in database querying. This tool is designed to simplify the process of generating database queries directly from natural language inputs. What’s cool is that anyone can use it, even without knowing how to code. Just add your database by connecting or uploading schema to MQL, ask a query in your language and get the SQL.

How MQL Works?

Understanding RAG

RAG, or Retrieval-Augmented Generation, is a technology that enhances the performance of MQL by combining the power of information retrieval with the generative capabilities of advanced natural language processing models. Here’s how RAG works and its role within MQL:

Initial Retrieval with pg_vector: The first step in MQL’s process involves pg_vector, a powerful tool for vector search within PostgreSQL databases. pg_vector is utilized to navigate the database schema, identifying and retrieving details relevant to the user’s query. This approach ensures that MQL has access to structured information about the database’s tables, columns, and their relationships, providing a solid foundation for accurate query generation.

Enhanced Schema Detail Retrieval: Once the initial retrieval identifies the relevant parts of the schema, the next step dives deeper into extracting detailed information about these components. This is where RAG’s capabilities are combined with pg_vector’s efficiency. Detailed schema information (including table names, column names, types, and relationships) is retrieved to understand the database’s structure more comprehensively.

SQL Generation: Leveraging the OpenAI API, RAG then generates a precise SQL query based on the user’s input and the contextual insights gained from the retrieved information. This process ensures that the generated SQL commands are not only syntactically correct but also semantically aligned with the user’s intent.

At its core, MQL leverages RAG with OpenAI API to translate natural language queries into precise SQL commands. This process involves a user-friendly interface where users can input queries in everyday language and receive a corresponding SQL query that can be run against a database. The system is built with a focus on PostgreSQL, a widely used open-source relational database system.

Accuracy Test Score

We have taken an e-learning platform-based database set for testing purposes. We have generated an elearning_schema file from this database as per the provided steps at the platform. Our testing process involved running 50 natural language queries through the MQL platform. The MQL achieved around 85% success rate, accurately translating 43 out of 50 queries. However, when considering the precision of the translations, about 74% (37 out of 50) of the queries were executed perfectly, with 6 queries correctly interpreted but missing a semicolon at the end. This left us with 7 queries that resulted in errors. The resources like an elearning_schema file and a CSV of query mappings are located in the test_data folder of the project.

Key Features

Natural Language to SQL Translation

MQL’s primary feature is its ability to accurately translate natural language queries into SQL. This functionality opens up database querying to a broader audience, removing the barrier of learning SQL syntax for non-technical users.

High Accuracy and PostgreSQL Support

The tool boasts an impressive accuracy rate in translating queries, with ongoing improvements and updates aimed at further enhancing its performance. Achieved about 85% success rate in translating natural language to SQL.

Security

Ensures that user databases are accessed securely, requiring login credentials to provide a protective layer.

Open Source

MQL is for sharing. We want everyone to help make it better.

Getting Started

Getting started with MQL is straightforward, thanks to detailed documentation and support for Docker. Users can clone the MQL repository from GitHub and follow the provided installation instructions to set up the tool locally.

The GitHub documentation includes steps for both Docker-based and non-Docker setups, ensuring flexibility based on user preference and system requirements.

docker-compose.yaml

version: "3"
services:
frontend:
build:
context: ./client
ports:
- "3000:3000"
networks:
- mql-net

postgres:
build:
context: ./storage
ports:
- 5432:5432
environment:
PGUSER: postgres
POSTGRES_PASSWORD: password
healthcheck:
test: ["CMD-SHELL", "pg_isready", "-U", "postgres", "-d", "mql"]
interval: 10s
timeout: 60s
retries: 5
start_period: 80s
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- mql-net

backend:
build:
context: ./server
ports:
- "8000:8000"
networks:
- mql-net
depends_on:
postgres:
condition: service_healthy

networks:
mql-net:
name: mql-database-network

volumes:
postgres-data:Code language: JavaScript (javascript)

For those opting to run MQL without Docker, the setup involves installing prerequisites such as Python, Node, and PostgreSQL, along with the pgvector extension for PostgreSQL. The documentation guides users through each step, from database setup to updating environment variables and running the setup script.

Supported Databases

As of the current version, MQL is designed to work exclusively with PostgreSQL.

Future Roadmap

The future of MQL looks promising, with a roadmap full of enhancements that aim to enrich user experience and expand functionality.

  • Implementation of Query Execution: The primary focus will be on implementing a robust and efficient system for executing queries within platform.
  • Improving Direct Database Connections: Improvement like checking persistent connections in directly connected databases from the front end.
  • Support for MySQL database: Enable seamless interaction with one of the world’s most popular open-source relational databases.
  • Improvements in Query Accuracy: Enhancing the precision of generated SQL queries to better match user intents.
  • Data Visualization: Introducing features for visualizing data directly within the platform.
  • Integration with Communication Tools: Adding connectivity with tools like Slack for streamlined communication and collaboration.
  • Comprehensive Platform Development: Transforming the tool into a broader platform for database interaction and analysis beyond just generating SQL queries.

Contribution and Community

MQL’s development is driven by community feedback and contributions. Whether you’re a developer interested in improving the tool, a user with suggestions for new features, or someone keen on bug reporting, your input is invaluable. The project’s contribution guidelines outline the process for reporting bugs, suggesting enhancements, and submitting pull requests, ensuring a collaborative and respectful community environment.

Conclusion

MQL represents a significant step forward in making database querying more accessible, efficient, and user-friendly. By bridging the gap between natural language and SQL, it opens up new possibilities for data analysis and interaction, particularly for those without a technical background. As the project continues to grow and evolve, it promises to become an indispensable tool for a wide range of users, from developers and data analysts to business professionals and educators. We invite you to explore the tool, contribute to its development, and join us in shaping the future of database querying.

Start with MQL today and help us shape the future of databases.

Check other open source tool: iOSKickstart

--

--

Shuru Team
Shuru
Editor for

We help businesses navigate product & technology complexities in validating & scaling digital solutions.