Creating Your Own Query Language with ANTLR for Elasticsearch Queries
In today’s world of data management and search, Elasticsearch is a powerful and flexible tool that handles large amounts of data and delivers fast search results. However, writing Elasticsearch queries, especially complex ones, can be difficult for users who don’t know its syntax. This is where creating your own query language helps.
By creating a custom query language that fits your needs, you can make it easier to write and run Elasticsearch queries. This custom language can be more intuitive and user-friendly, especially for non-technical users or those unfamiliar with Elasticsearch’s Query DSL. It simplifies the process, letting users focus on what they want to search for rather than how to write the query.
What is ANTLR?
From its official website:
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
Disclaimer
This blog post is not intended to teach ANTLR grammar or to show how to generate the most optimized Elasticsearch queries. It has been prepared solely for proof of concept (PoC) purposes. Therefore, the examples and the custom query structure shown are kept very simple.
Writing the Grammar: Defining Our Custom Query Language
In this section, we will introduce our custom query language syntax designed to simplify the process of constructing Elasticsearch queries. We’ll then walk you through the process of writing the ANTLR grammar for this language. By the end of this section, you’ll have a clear understanding of how to define your own query language using ANTLR.
Search Query
SEARCH FROM <index-name>
WHERE <conditions>
ORDER BY <field> [ASC|DESC]
LIMIT <number>
OFFSET <number>
Count Query
COUNT FROM <index-name>
WHERE <conditions>
Supported Conditions:
- Equality (=): Field must equal the specified value.
- Inequality (!=): Field must not equal the specified value.
- Range comparisons (>, >=, <, <=): Field must be greater or less than the specified value.
- LIKE: Field must match the specified pattern.
- IN: Field must be one of the specified values.
- IS NULL: Field must be null.
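To make the syntax above concrete, here are a few sample queries, kept as Go string values the way one might hold them in a table-driven test. The index names and field names are invented for illustration and do not come from a real dataset.

```go
package main

import "fmt"

// sampleQueries are illustrative inputs only; every index and field name is made up.
var sampleQueries = []string{
	// Two comparisons joined by AND, with sorting and paging.
	`SEARCH FROM users WHERE age >= 18 AND status = "active" ORDER BY created_at DESC LIMIT 10 OFFSET 20`,
	// A LIKE pattern combined with an equality check through OR.
	`SEARCH FROM logs WHERE message LIKE "%timeout%" OR level = "error"`,
	// Parentheses group the OR so it can be combined with AND.
	`SEARCH FROM users WHERE (role = "admin" OR role = "owner") AND active = 1`,
	// A count query with a dotted field name and an IN list.
	`COUNT FROM orders WHERE customer.country IN ("DE", "FR")`,
}

func main() {
	for _, q := range sampleQueries {
		fmt.Println(q)
	}
}
```

Keywords are written in upper case because the lexer rules shown later match them case-sensitively.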
Writing the ANTLR Grammar
Let’s define the ANTLR grammar for our custom query language. The grammar specifies the syntax rules and structure that our parser will recognize.
grammar Query;
query: searchClause whereClause? orderByClause? limitClause? offsetClause? EOF # SearchQuery
| countClause whereClause? EOF # CountQuery
;
searchClause: SEARCH FROM indexName ;
countClause: COUNT FROM indexName;
whereClause: WHERE condition;
condition: (andCondition | orCondition);
andCondition: conditionPart (AND conditionPart)*;
orCondition: conditionPart (OR conditionPart)*;
orderByClause: ORDER BY orderCondition ( COMMA orderCondition )* ;
limitClause: LIMIT NUMBER ;
offsetClause: OFFSET NUMBER ;
conditionPart: fieldName comparator value # ComparisonCondition
| LPAR condition RPAR # GroupCondition
| fieldName NOT? LIKE STRING # LikeCondition
| fieldName NOT? ISNULL # IsNullCondition
| fieldName NOT? IN LPAR value ( COMMA value )* RPAR # InCondition
;
orderCondition: fieldName (ASC | DESC)? ;
indexName: IDENTIFIER ;
fieldName: IDENTIFIER (DOT IDENTIFIER)* ;
value: NUMBER | STRING ;
// Lexer rules
SEARCH: 'SEARCH';
COUNT: 'COUNT';
FROM: 'FROM';
WHERE: 'WHERE';
ORDER: 'ORDER';
BY: 'BY';
LIMIT: 'LIMIT';
OFFSET: 'OFFSET';
AND: 'AND';
OR: 'OR';
NOT: 'NOT';
LIKE: 'LIKE';
IN: 'IN';
ISNULL: 'IS' [ \t\r\n]+ 'NULL';
ASC: 'ASC';
DESC: 'DESC';
IDENTIFIER: [a-zA-Z_][a-zA-Z_0-9]* ;
NUMBER: [0-9]+ ;
STRING: '"' (~["\r\n] | '""')* '"' ;
WS: [ \t\r\n]+ -> skip ;
LPAR: '(';
RPAR: ')';
DOT: '.';
COMMA: ',';
// Comparison operators
comparator: EQUAL
| NOT_EQUAL
| GREATER
| GREATER_EQ
| LESSER
| LESSER_EQ
;
EQUAL: '=';
NOT_EQUAL: '!=';
GREATER: '>';
LESSER: '<';
GREATER_EQ: '>=';
LESSER_EQ: '<=';
Explanation of the Grammar
Entry Point (query rule):
The query rule is the starting point of our grammar, distinguishing between SearchQuery and CountQuery.
Search and Count Clauses:
searchClause and countClause define the syntax for specifying the index to search or count from.
Where Clause:
The whereClause specifies the conditions for filtering, using the condition rule to handle logical expressions.
Condition Handling:
andCondition and orCondition handle logical AND and OR conditions. Because a condition is either an andCondition or an orCondition, AND and OR cannot be mixed at the same level; use parentheses to group them. conditionPart supports various condition types, including comparisons, LIKE, IS NULL, and IN conditions.
Order, Limit, and Offset Clauses:
orderByClause, limitClause, and offsetClause provide sorting, limiting, and offsetting for search results.
Tokens and Lexical Rules:
Keywords such as SEARCH, COUNT, FROM, WHERE, etc., are defined to recognize specific parts of the query. The IDENTIFIER, NUMBER, and STRING rules handle basic data types. Comparison operators (=, !=, >, <, >=, <=) are defined to support various comparison conditions.
This grammar forms the backbone of our custom query language, enabling users to write more intuitive and readable queries for Elasticsearch. In the next section, we will demonstrate how to use this grammar to generate and integrate a parser, and then convert these custom queries into Elasticsearch Query DSL.
Implementing the Grammar in Go: Project Bootstrap and Parsing Logic
In this section, we’ll guide you through setting up a Go project to parse our custom query language using ANTLR, and we’ll detail how to fill our predefined Go structs with the parsed query data. We’ll also discuss any potential optimizations.
Project Setup
First, ensure you have Go installed on your system. Then, follow these steps to bootstrap your project and install the necessary dependencies.
Create a new Go project:
$ mkdir custom-query-parser
$ cd custom-query-parser
$ go mod init custom-query-parser
Install ANTLR and Go dependencies:
- Install ANTLR: Follow the instructions on the official ANTLR page to download and set up ANTLR.
- Install ANTLR Go target:
$ go get github.com/antlr4-go/antlr/v4
Generate Go parser from ANTLR grammar:
- Save the ANTLR grammar provided earlier as Query.g4.
- Run the following command to generate Go files:
$ antlr -Dlanguage=Go -o parser Query.g4
Create Go structs:
- Define the structs in a file, models.go:
package main
type ParsedQuery struct {
Type string `json:"type"`
Index string `json:"index"`
Condition Condition `json:"condition,omitempty"`
OrderBy []Order `json:"order_by,omitempty"`
Limit int `json:"limit,omitempty"`
Offset int `json:"offset,omitempty"`
}
type Condition struct {
Operator string
ConditionParts []ConditionPart
}
type ConditionPart struct {
Type string `json:"type"`
Field string `json:"field,omitempty"`
Operator string `json:"operator,omitempty"`
Value any `json:"value,omitempty"`
Condition *Condition `json:"condition,omitempty"`
Negate bool `json:"negate,omitempty"`
}
type Order struct {
Field string `json:"field"`
Direction string `json:"direction"`
}
These structs act as an intermediate model between our custom query and the Elasticsearch query. We populate them by visiting the parse tree that ANTLR builds for an input written in our custom query language.
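As an illustration of that intermediate model, the sketch below builds by hand the ParsedQuery value one would expect the visitor to produce for a sample query, then prints it as JSON. The structs are copied from models.go so the example compiles on its own; the query, index name, and field names are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Copies of the structs from models.go, repeated so this file is self-contained.
type ParsedQuery struct {
	Type      string    `json:"type"`
	Index     string    `json:"index"`
	Condition Condition `json:"condition,omitempty"`
	OrderBy   []Order   `json:"order_by,omitempty"`
	Limit     int       `json:"limit,omitempty"`
	Offset    int       `json:"offset,omitempty"`
}

type Condition struct {
	Operator       string
	ConditionParts []ConditionPart
}

type ConditionPart struct {
	Type      string     `json:"type"`
	Field     string     `json:"field,omitempty"`
	Operator  string     `json:"operator,omitempty"`
	Value     any        `json:"value,omitempty"`
	Condition *Condition `json:"condition,omitempty"`
	Negate    bool       `json:"negate,omitempty"`
}

type Order struct {
	Field     string `json:"field"`
	Direction string `json:"direction"`
}

// sampleQuery is the hand-built equivalent of (illustrative query):
//   SEARCH FROM users WHERE status = "active" AND age >= 18 ORDER BY created_at DESC LIMIT 10
func sampleQuery() ParsedQuery {
	return ParsedQuery{
		Type:  "search",
		Index: "users",
		Condition: Condition{
			Operator: "and",
			ConditionParts: []ConditionPart{
				{Type: "ComparisonCondition", Field: "status", Operator: "=", Value: "active"},
				{Type: "ComparisonCondition", Field: "age", Operator: ">=", Value: 18},
			},
		},
		OrderBy: []Order{{Field: "created_at", Direction: "DESC"}},
		Limit:   10,
	}
}

func main() {
	out, err := json.MarshalIndent(sampleQuery(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Printing the intermediate value like this can be a convenient way to check the visitor before any Elasticsearch JSON is generated.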
Parsing Logic
Here’s the parsing code, which we will place in a file called ast_visitor.go. This code traverses the parse tree generated by ANTLR and populates our structs with the parsed query data.
package main
import (
"strconv"
"strings"

"github.com/antlr4-go/antlr/v4"
// Generated parser package; the path follows the module name chosen in "go mod init".
"custom-query-parser/parser"
)
func Visit(tree antlr.ParseTree) any {
if tree == nil {
return nil
}
switch node := tree.(type) {
case *parser.SearchQueryContext:
return visitSearchQuery(node)
case *parser.CountQueryContext:
return visitCountQuery(node)
case *parser.WhereClauseContext:
return visitWhereClause(node)
case *parser.ConditionContext:
return visitCondition(node)
case *parser.AndConditionContext:
return visitAndCondition(node)
case *parser.OrConditionContext:
return visitOrCondition(node)
case *parser.ComparisonConditionContext:
return visitComparisonCondition(node)
case *parser.GroupConditionContext:
return visitGroupCondition(node)
case *parser.LikeConditionContext:
return visitLikeCondition(node)
case *parser.IsNullConditionContext:
return visitIsNullCondition(node)
case *parser.InConditionContext:
return visitInCondition(node)
case *parser.OrderByClauseContext:
return visitOrderByClause(node)
case *parser.OrderConditionContext:
return visitOrderCondition(node)
case *parser.LimitClauseContext:
return visitLimitClause(node)
case *parser.OffsetClauseContext:
return visitOffsetClause(node)
case *parser.ValueContext:
return visitValue(node)
}
return nil
}
func visitSearchQuery(node *parser.SearchQueryContext) ParsedQuery {
parsedQuery := ParsedQuery{
Type: "search",
Index: node.SearchClause().IndexName().GetText(),
}
if node.WhereClause() != nil {
parsedQuery.Condition = Visit(node.WhereClause()).(Condition)
}
if node.OrderByClause() != nil {
parsedQuery.OrderBy = Visit(node.OrderByClause()).([]Order)
}
if node.LimitClause() != nil {
parsedQuery.Limit = Visit(node.LimitClause()).(int)
}
if node.OffsetClause() != nil {
parsedQuery.Offset = Visit(node.OffsetClause()).(int)
}
return parsedQuery
}
func visitCountQuery(node *parser.CountQueryContext) ParsedQuery {
parsedQuery := ParsedQuery{
Type: "count",
Index: node.CountClause().IndexName().GetText(),
}
if node.WhereClause() != nil {
parsedQuery.Condition = Visit(node.WhereClause()).(Condition)
}
return parsedQuery
}
func visitWhereClause(node *parser.WhereClauseContext) Condition {
return Visit(node.Condition()).(Condition)
}
func visitCondition(node *parser.ConditionContext) any {
if node.OrCondition() != nil {
return Visit(node.OrCondition())
}
return Visit(node.AndCondition())
}
func visitAndCondition(node *parser.AndConditionContext) Condition {
condition := Condition{Operator: "and"}
for _, cp := range node.AllConditionPart() {
condition.ConditionParts = append(condition.ConditionParts, Visit(cp).(ConditionPart))
}
return condition
}
func visitOrCondition(node *parser.OrConditionContext) Condition {
condition := Condition{Operator: "or"}
for _, cp := range node.AllConditionPart() {
condition.ConditionParts = append(condition.ConditionParts, Visit(cp).(ConditionPart))
}
return condition
}
func visitComparisonCondition(node *parser.ComparisonConditionContext) ConditionPart {
return ConditionPart{
Type: "ComparisonCondition",
Field: node.FieldName().GetText(),
Operator: node.Comparator().GetText(),
Value: Visit(node.Value()),
Negate: node.Comparator().GetText() == "!=",
}
}
func visitGroupCondition(node *parser.GroupConditionContext) ConditionPart {
condition := Visit(node.Condition()).(Condition)
return ConditionPart{
Type: "GroupCondition",
Condition: &condition,
}
}
func visitLikeCondition(node *parser.LikeConditionContext) ConditionPart {
return ConditionPart{
Type: "LikeCondition",
Field: node.FieldName().GetText(),
Value: strings.Trim(node.STRING().GetText(), `"`),
Negate: node.NOT() != nil,
}
}
func visitIsNullCondition(node *parser.IsNullConditionContext) ConditionPart {
return ConditionPart{
Type: "IsNullCondition",
Field: node.FieldName().GetText(),
Negate: node.NOT() != nil,
}
}
func visitInCondition(node *parser.InConditionContext) ConditionPart {
var values []any
for _, value := range node.AllValue() {
values = append(values, Visit(value))
}
return ConditionPart{
Type: "InCondition",
Field: node.FieldName().GetText(),
Value: values,
Negate: node.NOT() != nil,
}
}
func visitOrderByClause(node *parser.OrderByClauseContext) []Order {
var orders []Order
for _, orderCondition := range node.AllOrderCondition() {
orders = append(orders, Visit(orderCondition).(Order))
}
return orders
}
func visitOrderCondition(node *parser.OrderConditionContext) Order {
order := Order{
Field: node.FieldName().GetText(),
Direction: "ASC",
}
if node.DESC() != nil {
order.Direction = "DESC"
}
return order
}
func visitLimitClause(node *parser.LimitClauseContext) int {
return getIntValue(node.NUMBER())
}
func visitOffsetClause(node *parser.OffsetClauseContext) int {
return getIntValue(node.NUMBER())
}
func visitValue(node *parser.ValueContext) any {
if node.STRING() != nil {
return strings.Trim(node.STRING().GetText(), `"`)
}
return getIntValue(node.NUMBER())
}
func getIntValue(node antlr.TerminalNode) int {
value, err := strconv.Atoi(node.GetText())
if err != nil {
return 0
}
return value
}
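The STRING rule in the grammar accepts a doubled quote ("") inside a string, but strings.Trim strips every leading and trailing quote and leaves the doubled ones in the middle untouched. A minimal sketch of a stricter helper follows; the name unquote is hypothetical and not part of the original visitor.

```go
package main

import (
	"fmt"
	"strings"
)

// unquote is a hypothetical helper for STRING tokens.
// It removes exactly one surrounding double quote from each end and turns the
// doubled-quote escape ("") back into a single quote character.
func unquote(token string) string {
	if len(token) >= 2 && strings.HasPrefix(token, `"`) && strings.HasSuffix(token, `"`) {
		token = token[1 : len(token)-1]
	}
	return strings.ReplaceAll(token, `""`, `"`)
}

func main() {
	fmt.Println(unquote(`"hello"`))      // hello
	fmt.Println(unquote(`"say ""hi"""`)) // say "hi"
}
```

If adopted, it could stand in for the strings.Trim calls in visitValue and visitLikeCondition.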
Converting Parsed Queries to Elasticsearch Queries
In this section, we’ll detail how to convert the parsed queries from our custom query language into Elasticsearch queries. This process involves translating the structured data in our ParsedQuery struct into the corresponding Elasticsearch Query DSL.
Struct Definition for Elasticsearch Query
We define an EsQuery struct to represent the Elasticsearch query:
package main
import (
"encoding/json"
"fmt"
"strings"
)
// Constants for query keys
const (
QueryMust = "must"
QueryMustNot = "must_not"
QueryShould = "should"
QueryFilter = "filter"
QueryBool = "bool"
QueryTerm = "term"
QueryRange = "range"
QueryExists = "exists"
QueryWildcard = "wildcard"
QueryTerms = "terms"
)
type EsQuery struct {
Index string
Command string
Query map[string]any
}
func (eq EsQuery) String() string {
var sb strings.Builder
path := "_search"
if eq.Command == "count" {
path = "_count"
}
sb.WriteString(fmt.Sprintf("GET /%s/%s\n", eq.Index, path))
q, _ := json.MarshalIndent(eq.Query, "", " ")
sb.WriteString(string(q))
return sb.String()
}
- Index: The index to query.
- Command: The type of operation (either search or count).
- Query: The actual Elasticsearch query represented as a map.
The String method generates a string representation of the Elasticsearch query, useful for debugging or displaying the query.
Conversion Function
The convertToElasticsearch function converts a ParsedQuery into an EsQuery:
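func convertToElasticsearch(parsedQuery ParsedQuery) (EsQuery, error) {
	esQuery := EsQuery{
		Index:   parsedQuery.Index,
		Command: parsedQuery.Type,
		Query: map[string]any{
			"query": map[string]any{
				QueryBool: buildBoolQuery(parsedQuery.Condition),
			},
		},
	}
	if parsedQuery.Type == "search" {
		// Limit and Offset are 0 when the clause is absent. Sending "size": 0
		// would return no hits, so only set the keys that carry a value and let
		// Elasticsearch apply its defaults otherwise. An explicit LIMIT 0 is
		// treated the same way.
		if parsedQuery.Limit > 0 {
			esQuery.Query["size"] = parsedQuery.Limit
		}
		if parsedQuery.Offset > 0 {
			esQuery.Query["from"] = parsedQuery.Offset
		}
		if len(parsedQuery.OrderBy) > 0 {
			esQuery.Query["sort"] = buildSortQuery(parsedQuery.OrderBy)
		}
	}
	return esQuery, nil
}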
- The function initializes an EsQuery with the index and command from the ParsedQuery.
- Depending on the query type (search or count), it constructs the appropriate Elasticsearch query structure.
Building Boolean Queries
The buildBoolQuery function constructs the boolean part of the Elasticsearch query:
func buildBoolQuery(condition Condition) map[string][]any {
boolQuery := map[string][]any{
QueryMust: {},
QueryMustNot: {},
QueryShould: {},
QueryFilter: {},
}
for _, conditionPart := range condition.ConditionParts {
conditionPartQuery := buildConditionPartQuery(conditionPart)
addConditionToBoolQuery(boolQuery, condition, conditionPart, conditionPartQuery)
}
// Remove empty keys
for k, v := range boolQuery {
if len(v) == 0 {
delete(boolQuery, k)
}
}
return boolQuery
}
func addConditionToBoolQuery(boolQuery map[string][]any, condition Condition, conditionPart ConditionPart, conditionPartQuery map[string]any) {
if condition.Operator == "and" {
if conditionPart.Negate {
boolQuery[QueryMustNot] = append(boolQuery[QueryMustNot], conditionPartQuery)
} else {
boolQuery[QueryMust] = append(boolQuery[QueryMust], conditionPartQuery)
}
} else {
if conditionPart.Negate {
encapsulated := map[string]any{
QueryBool: map[string]any{
QueryMustNot: []any{conditionPartQuery},
},
}
boolQuery[QueryShould] = append(boolQuery[QueryShould], encapsulated)
} else {
boolQuery[QueryShould] = append(boolQuery[QueryShould], conditionPartQuery)
}
}
}
- boolQuery: The resulting boolean query structure with must, must_not, should, and filter clauses. The filter key is declared but never populated in this PoC, so it is always removed as an empty key.
- Depending on the condition’s operator ("and" or "or"), the function assigns conditions to must, must_not, or should.
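The placement rules can be seen in isolation with a simplified sketch. The clause type and the combine and term functions below are hypothetical stand-ins for ConditionPart and the builder functions, reduced to what is needed to show where each part ends up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clause is a simplified, hypothetical stand-in for ConditionPart: an already
// built leaf query plus a flag saying whether it was negated.
type clause struct {
	query  map[string]any
	negate bool
}

// combine mirrors the placement rules described above:
//   and + plain   -> must
//   and + negated -> must_not
//   or  + plain   -> should
//   or  + negated -> should, wrapped in its own bool/must_not
func combine(operator string, clauses []clause) map[string][]any {
	out := map[string][]any{}
	for _, c := range clauses {
		switch {
		case operator == "and" && c.negate:
			out["must_not"] = append(out["must_not"], c.query)
		case operator == "and":
			out["must"] = append(out["must"], c.query)
		case c.negate:
			wrapped := map[string]any{"bool": map[string]any{"must_not": []any{c.query}}}
			out["should"] = append(out["should"], wrapped)
		default:
			out["should"] = append(out["should"], c.query)
		}
	}
	return out
}

// term builds a leaf term query for one field.
func term(field string, value any) map[string]any {
	return map[string]any{"term": map[string]any{field: value}}
}

func main() {
	parts := []clause{
		{query: term("status", "active")},
		{query: term("role", "guest"), negate: true},
	}
	// First: status = "active" AND role != "guest"
	// Second: status = "active" OR role != "guest"
	for _, op := range []string{"and", "or"} {
		b, err := json.MarshalIndent(map[string]any{"bool": combine(op, parts)}, "", "  ")
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```

The extra wrapping in the OR case is needed because a bare must_not at the same level would apply to the whole bool query rather than to that one alternative.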
Building Condition Part Queries
The buildConditionPartQuery function translates individual condition parts into Elasticsearch query components:
func buildConditionPartQuery(conditionPart ConditionPart) map[string]any {
switch conditionPart.Type {
case "ComparisonCondition":
return buildComparisonQuery(conditionPart)
case "GroupCondition":
return map[string]any{QueryBool: buildBoolQuery(*conditionPart.Condition)}
case "LikeCondition":
return buildLikeQuery(conditionPart)
case "IsNullCondition":
return buildIsNullQuery(conditionPart)
case "InCondition":
return buildInQuery(conditionPart)
default:
return nil
}
}
- Depending on the type of condition part, this function delegates to specific helper functions to build the appropriate query structure.
Building Specific Queries
Comparison Query:
func buildComparisonQuery(conditionPart ConditionPart) map[string]any {
if conditionPart.Operator == "=" || conditionPart.Operator == "!=" {
return map[string]any{
QueryTerm: map[string]any{
conditionPart.Field: conditionPart.Value,
},
}
}
return map[string]any{
QueryRange: map[string]any{
conditionPart.Field: map[string]any{
rangeOperator(conditionPart.Operator): conditionPart.Value,
},
},
}
}
func rangeOperator(operator string) string {
switch operator {
case ">":
return "gt"
case ">=":
return "gte"
case "<":
return "lt"
case "<=":
return "lte"
default:
return ""
}
}
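- Constructs either a term query for equality/inequality or a range query for comparison operators. For !=, the same term query is built; the Negate flag set by the visitor later places it under must_not.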
Like Query:
func buildLikeQuery(conditionPart ConditionPart) map[string]any {
value := strings.ReplaceAll(conditionPart.Value.(string), "%", "*")
return map[string]any{
QueryWildcard: map[string]any{
conditionPart.Field: value,
},
}
}
- Replaces SQL-like % wildcards with Elasticsearch's * wildcards and constructs a wildcard query.
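The conversion above only handles %. A slightly fuller sketch is shown below, under stated assumptions: the SQL convention that _ matches exactly one character is adopted (the original grammar does not say so), and any *, ? or backslash already present in the pattern is meant literally and is escaped with a backslash. The function name likeToWildcard is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// likeToWildcard is a hypothetical variant of the conversion in buildLikeQuery.
// Assumptions: "%" means any run of characters, "_" means exactly one character,
// and "*", "?" and "\" in the input are literal characters.
func likeToWildcard(pattern string) string {
	var sb strings.Builder
	for _, r := range pattern {
		switch r {
		case '%':
			sb.WriteByte('*')
		case '_':
			sb.WriteByte('?')
		case '*', '?', '\\':
			// Escape characters the wildcard query would otherwise treat as special.
			sb.WriteByte('\\')
			sb.WriteRune(r)
		default:
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(likeToWildcard("%time_ut%")) // *time?ut*
	fmt.Println(likeToWildcard("a*b"))       // a\*b
}
```

Whether _ should act as a wildcard is a design choice: it matches SQL habits, but it also means a literal underscore in a pattern can no longer be matched without adding an escape syntax of its own.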
Is Null Query:
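func buildIsNullQuery(conditionPart ConditionPart) map[string]any {
	// "exists" matches documents where the field has a value, so IS NULL is
	// its negation. NOT IS NULL is handled by the Negate flag, which places
	// this query under must_not and turns it back into a plain exists check.
	return map[string]any{
		QueryBool: map[string]any{
			QueryMustNot: []any{
				map[string]any{
					QueryExists: map[string]any{
						"field": conditionPart.Field,
					},
				},
			},
		},
	}
}
- Wraps an exists query in must_not: exists matches documents where the field has a value, so IS NULL is its negation.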
In Query:
func buildInQuery(conditionPart ConditionPart) map[string]any {
return map[string]any{
QueryTerms: map[string]any{
conditionPart.Field: conditionPart.Value,
},
}
}
- Constructs a terms query to check if a field's value is within a list of values.
Building Sort Queries
The buildSortQuery function constructs the sort part of the query:
func buildSortQuery(orderBy []Order) []any {
var sortQuery []any
for _, order := range orderBy {
sortQuery = append(sortQuery, map[string]any{
order.Field: map[string]any{
"order": strings.ToLower(order.Direction),
},
})
}
return sortQuery
}
- Iterates through the OrderBy slice and constructs the sort criteria.
Explanation of Complicated Sections
- Boolean Query Construction: The buildBoolQuery function handles the complexity of combining different conditions with logical operators (AND, OR) and their negations. Ensuring that the resulting query structure is valid and correctly reflects the logical operations can be challenging. Pay special attention to how conditions are added to the must, must_not, and should arrays.
- Condition Part Handling: The buildConditionPartQuery function uses a switch statement to delegate the construction of different types of conditions to specific helper functions. This separation of concerns simplifies the main query construction logic but requires careful handling to ensure each condition type is correctly processed.
Enhancing Your Custom Query Language
Now that you have a basic custom query language for Elasticsearch, there are several ways you can expand and improve it to handle more complex queries.
Implement Group By (Aggregation) Syntax
You can add support for aggregation queries by implementing a GROUP BY syntax in your grammar. This will allow users to perform group-by operations similar to SQL.
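One possible target for such a GROUP BY clause is the Elasticsearch terms aggregation. The sketch below assumes the parser has already produced a list of field names; the function buildGroupByAggs and the by_ naming scheme are hypothetical, and the field names are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildGroupByAggs is a hypothetical helper: it nests one terms aggregation per
// GROUP BY field, with the first field as the outermost bucket.
// It returns nil when no fields are given.
func buildGroupByAggs(fields []string) map[string]any {
	var aggs map[string]any
	// Build from the innermost field outwards.
	for i := len(fields) - 1; i >= 0; i-- {
		agg := map[string]any{
			"terms": map[string]any{"field": fields[i]},
		}
		if aggs != nil {
			agg["aggs"] = aggs
		}
		aggs = map[string]any{"by_" + fields[i]: agg}
	}
	return aggs
}

func main() {
	body := map[string]any{
		"size": 0, // only the buckets are needed, not the matching documents
		"aggs": buildGroupByAggs([]string{"country", "city"}),
	}
	out, err := json.MarshalIndent(body, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Nesting mirrors how SQL groups by several columns at once; metric functions such as COUNT or AVG per group would need further grammar rules and sub-aggregations.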
Add Join Syntax for Multi-Index Queries
To enable queries that involve multiple indexes, you can implement a JOIN syntax. This would allow users to specify joins between different indexes, facilitating more complex search scenarios.
Handle More Complex Conditions
Extend your grammar to support more advanced condition types and logical operators, such as nested conditions or additional comparison operators, to provide users with greater flexibility in query construction.
For the complete code and further examples, please visit the GitHub repository.