How to Develop Your Distributed SQL Statement in Apache ShardingSphere
In the previous articles “An Introduction to DistSQL” and “Integrating SCTL Into DistSQL’s RAL— Making Apache ShardingSphere Perfect for Database Management”, the Apache ShardingSphere Committers shared the motivations behind the development of DistSQL, explained its syntax system, and impressively showcased how you can use just one SQL to create a sharding table.
Today, to help you gain a better understanding of DistSQL and develop your own DistSQL syntax, our community author analyzes the design & development process of DistSQL and showcases how you can implement a brand new DistSQL grammar in four stages of the development life cycle (i.e. demand analysis, design, development & testing).
What is DistSQL?
Like standard SQL, DistSQL or Distributed SQL is a built-in SQL language unique to ShardingSphere that provides incremental functional capabilities beyond standard SQL. Its design purpose is to empower resource and rule management with SQL operation capabilities. For more information about DistSQL, please read “Build a data sharding service with DistSQL”.
Why do you need DistSQL?
Redefining the boundary between middleware & database, and allowing developers to leverage Apache ShardingSphere as if they were using a database natively is DistSQL’s design goal.
Therefore, DistSQL provides a syntax structure and syntax validation system similar to standard SQL, to avoid a steep learning curve. Another advantage of DistSQL is its ability to manage resources and rules at the SQL level without configuration files.
How to develop DistSQL?
- ANTRL4 is a tool that translates your grammar to a parser/lexer in Java (or another target language). It’s used as a parser. You need to configure it before starting to develop your DistSQL. Want to get started with ANTLR 4? Please refer to this ANTRL4 Concise Tutorial.
- IntelliJ IDEA is an integrated development environment written in Java for developing computer software. You also need the plug-in ANTLR v4 to test the grammar rules defined by ANTRL4.
- First, choose the right Test Rule:
- Input the statement to be verified in ANTLR Preview:
You need to know the DistSQL execution process as well as the basics of synatics and plug-ins. The complete DistSQL execution process is truly complicated, but the awesome architecture of ShardingSphere allows developers to develop DistSQL features without having to pay attention to the whole process.
However, you still need to take care of the following core procedures:
Note: Here, we take data sharding as an example. Be aware that different features have different vistors.
After understanding the execution process of DistSQL, you now can appreciate the practical case and learn how to develop your first DistSQL statement.
In the previous article “An Introduction to DistSQL”, the author showcased how you can leverage DistSQL to create a sharding with the staement
show sharding table rules.
Now, we have a new request: How can you use a DistSQL statement to quickcly query shard quantity of each tabale shard. The designed syntatic statement is as follows:
show sharding tables count [from schema] ;
- MySQL containing databases and tables for sharding.
- Zookeeper is used as Registry Center
- ShardingSphere-Proxy 5.0.0
- Define the statement
Add the following statement definition into the file
src/main/antlr4/imports/sharding/RQLStatement.g4. When it’s done, you can use ANTLR v4 to test it.
Please ensure that all keywords in that statement are defined. For example,
COUNT is a undefined statement here, so you need to define it in
After you define the statement, you also need to add it into the file
ShardingDistSQLStatement.g4. It's for parsing rounter.
Now, it’s time to use
shardingsphere-sharding-distsql-parser to compile and generate the relevant objects.
2. Parse the definition
Then you also need to add a DistSQLStatement object of the definition in
shardingsphere-distsql-statement to save the variable attributes of the statement. For example, the
schemaName of the statement definition needs to be saved to the object
Since ShardingSphere uses ANTLR’s Visitor mode, in terms of definition handling, it’ required to rewrite
ShardingDistSQLStatementVisitor. The purpose of this method is to create a
ShowShardingTablesCountStatement object and save the related variable attributes to the object
shardingsphere-distsql-statement actually depends on
shardingsphere-sharding-distsql-parser, so it's necessary to compile
3. Handle data and return results
Data handling is managed by the
execute method of
getRowData returns the results. Different types of statement definitions focus on different things. For instance, when DistSQLResultSet is used as the result storage object, result data is assembled in the method
Show the execution method and the
DistSQLResultSet as shown in the below image:
init gets and assembles data, and
getRowData returns row data.
getType is also obviously in the class. The method belongs to the
TypedSPI interface, so
ShardingTablesCountResultSet also needs to add
org.apache.shardingsphere.infra.distsql.query.DistSQLResultSet into the directory
src/main/resources/META-INF/services of the current module to complete the SPI injection. The path and content are as follows:
Now, you have successfully developed the feature of this statement definition.
4. Finish your unit test and parse test
When you complete the basic feature development, to ensure its continuous usability, you need to add test cases to the new class or method, and to complete parse tests for the new syntax. The following code block is the unit test of
In addition to the unit test, you are also required to complete a parsing test for the grammar definition in
shardingsphere-parser-test. The purpose is to parse the input DistSQL into a
DistSQLStatement and then compare the parsed statement with your expected
TestCase object. The steps are as follows:
a. Add the SQL you want to test in
b. Add a test case in
c. Add a
TestCase object whose function is to save the result defined in the case
d. Use the
SQLParserTestCases class to load
e. Add the right
Assert object to the
f. Execute the test method in
DistSQLParserParameterizedTest. Now, the test comes to an end.
Finally, you can execute the developed DistSQL verification function by using command line tools.
DistSQL is one of the newest features released in version 5.0.0, and we will continue to improve it.
If you’re interested, you are welcome to develop the grammar system or provide awesome features to truly break the boundary between middleware and database.
SphereEx Middleware Development Engineer & Apache ShardingSphere Contributor. He focuses on designing and developing DistSQL.
by the author.