Creating Your Own Query Language with ANTLR for Elasticsearch Queries

Berkay Akyazı
Picus Security Engineering
10 min read · May 28, 2024

In today’s world of data management and search, Elasticsearch is a powerful and flexible tool that handles large amounts of data and delivers fast search results. However, writing Elasticsearch queries, especially complex ones, can be difficult for users who don’t know its syntax. This is where creating your own query language helps.

By creating a custom query language that fits your needs, you can make it easier to write and run Elasticsearch queries. This custom language can be more intuitive and user-friendly, especially for non-technical users or those unfamiliar with Elasticsearch’s Query DSL. It simplifies the process, letting users focus on what they want to search for rather than how to write the query.

What is ANTLR?

From its official website

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

Disclaimer

This blog post is not intended to teach ANTLR grammar or to show how to generate the most optimized Elasticsearch queries. It has been prepared solely for proof of concept (PoC) purposes. Therefore, the examples and the custom query structure shown are kept very simple.

Writing the Grammar: Defining Our Custom Query Language

In this section, we will introduce our custom query language syntax designed to simplify the process of constructing Elasticsearch queries. We’ll then walk you through the process of writing the ANTLR grammar for this language. By the end of this section, you’ll have a clear understanding of how to define your own query language using ANTLR.

Search Query

SEARCH FROM <index-name> 
WHERE <conditions>
ORDER BY <field> [ASC|DESC]
LIMIT <number>
OFFSET <number>

Count Query

COUNT FROM <index-name> 
WHERE <conditions>

Supported Conditions:

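  • Equality (=): Field must equal the specified value.
  • Inequality (!=): Field must not equal the specified value.
  • Range comparison (>, >=, <, <=): Field must be greater or less than the specified value.
  • LIKE: Field must match the specified pattern, with % as the wildcard.
  • IN: Field must be one of the specified values.
  • IS NULL: Field must be null.

Conditions are combined with AND or OR and can be grouped with parentheses. LIKE, IN, and IS NULL also accept a preceding NOT.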
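
To make the syntax concrete, here are a few queries written in the custom language, collected in a Go slice so they can later double as parser test inputs. The index and field names (logs, users, level, and so on) are made up for illustration, and queryKind is a hypothetical helper that only looks at the leading keyword.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleQueries holds illustrative queries in the custom language.
// Index and field names are hypothetical.
var sampleQueries = []string{
	`SEARCH FROM logs WHERE level = "error" AND status != 200 ORDER BY timestamp DESC LIMIT 10 OFFSET 20`,
	`SEARCH FROM users WHERE name LIKE "jo%" OR city IN ("Ankara", "Izmir")`,
	`SEARCH FROM users WHERE (age > 30 OR vip = 1) AND country = "TR"`,
	`COUNT FROM users WHERE age >= 18`,
}

// queryKind reports whether a query is a search or a count,
// based only on its first keyword (no real parsing happens here).
func queryKind(q string) string {
	switch {
	case strings.HasPrefix(q, "SEARCH "):
		return "search"
	case strings.HasPrefix(q, "COUNT "):
		return "count"
	default:
		return "unknown"
	}
}

func main() {
	for _, q := range sampleQueries {
		fmt.Printf("%-6s %s\n", queryKind(q), q)
	}
}
```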

Writing the ANTLR Grammar

Let’s define the ANTLR grammar for our custom query language. The grammar specifies the syntax rules and structure that our parser will recognize.

grammar Query;

query: searchClause whereClause? orderByClause? limitClause? offsetClause? # SearchQuery
| countClause whereClause? # CountQuery
;

searchClause: SEARCH FROM indexName ;
countClause: COUNT FROM indexName;
whereClause: WHERE condition;

condition: (andCondition | orCondition);
andCondition: conditionPart (AND conditionPart)*?;
orCondition: conditionPart (OR conditionPart)*?;

orderByClause: ORDER BY orderCondition ( COMMA orderCondition )* ;
limitClause: LIMIT NUMBER ;
offsetClause: OFFSET NUMBER ;

conditionPart: fieldName comparator value # ComparisonCondition
| LPAR condition RPAR # GroupCondition
| fieldName NOT? LIKE STRING # LikeCondition
| fieldName NOT? ISNULL # IsNullCondition
| fieldName NOT? IN LPAR value ( COMMA value )* RPAR # InCondition
;

orderCondition: fieldName (ASC | DESC)? ;

indexName: IDENTIFIER ;
fieldName: IDENTIFIER (DOT IDENTIFIER)* ;
value: NUMBER | STRING ;

// Lexer rules
SEARCH: 'SEARCH';
COUNT: 'COUNT';
FROM: 'FROM';
WHERE: 'WHERE';
ORDER: 'ORDER';
BY: 'BY';
LIMIT: 'LIMIT';
OFFSET: 'OFFSET';
AND: 'AND';
OR: 'OR';
NOT: 'NOT';
LIKE: 'LIKE';
IN: 'IN';
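ISNULL: 'IS' [ \t\r\n]+ 'NULL'; // lexer rules see raw characters, so the whitespace between the two words must be matched explicitly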
ASC: 'ASC';
DESC: 'DESC';

IDENTIFIER: [a-zA-Z_][a-zA-Z_0-9]* ;
NUMBER: [0-9]+ ;
STRING: '"' (~["\r\n] | '""')* '"' ;
WS: [ \t\r\n]+ -> skip ;
LPAR: '(';
RPAR: ')';
DOT: '.';
COMMA: ',';

// Comparison operators
comparator: EQUAL
| NOT_EQUAL
| GREATER
| GREATER_EQ
| LESSER
| LESSER_EQ
;

EQUAL: '=';
NOT_EQUAL: '!=';
GREATER: '>';
LESSER: '<';
GREATER_EQ: '>=';
LESSER_EQ: '<=';

Explanation of the Grammar

Entry Point (query Rule):

The query rule is the starting point of our grammar, distinguishing between SearchQuery and CountQuery.

Search and Count Clauses:

searchClause and countClause define the syntax for specifying the index to search or count from.

Where Clause:

The whereClause specifies the conditions for filtering, using the condition rule to handle logical expressions.

Condition Handling:

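andCondition and orCondition handle logical AND and OR conditions. A single condition chains its parts with either AND or OR, so mixing the two requires parentheses, for example (a = 1 OR b = 2) AND c = 3.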

conditionPart supports various condition types, including comparisons, LIKE, IS NULL, and IN conditions.

Order, Limit, and Offset Clauses:

orderByClause, limitClause, and offsetClause provide sorting, limiting, and offsetting functionalities for search results.

Tokens and Lexical Rules:

Keywords such as SEARCH, COUNT, FROM, WHERE, etc., are defined to recognize specific parts of the query.

IDENTIFIER, NUMBER, and STRING rules handle basic data types.

Comparison operators (=, !=, >, <, >=, <=) are defined to support various comparison conditions.

This grammar forms the backbone of our custom query language, enabling users to write more intuitive and readable queries for Elasticsearch. In the next section, we will demonstrate how to use this grammar to generate and integrate a parser, and then convert these custom queries into Elasticsearch Query DSL.

Implementing the Grammar in Go: Project Bootstrap and Parsing Logic

In this section, we’ll guide you through setting up a Go project to parse our custom query language using ANTLR, and we’ll detail how to fill our predefined Go structs with the parsed query data. We’ll also discuss any potential optimizations.

Project Setup

First, ensure you have Go installed on your system. Then, follow these steps to bootstrap your project and install the necessary dependencies.

Create a new Go project:

$ mkdir custom-query-parser
$ cd custom-query-parser
$ go mod init custom-query-parser

Install ANTLR and Go dependencies:

  • Install ANTLR: Follow the instructions on the official ANTLR page to download and set up ANTLR.
  • Install ANTLR Go target:
$ go get github.com/antlr4-go/antlr/v4

Generate Go parser from ANTLR grammar:

  • Save the ANTLR grammar provided earlier as Query.g4.
  • Run the following command to generate Go files:
$ antlr -Dlanguage=Go -o parser Query.g4

Create Go structs:

  • Define the structs in a file, models.go:
package main

type ParsedQuery struct {
Type string `json:"type"`
Index string `json:"index"`
Condition Condition `json:"condition,omitempty"`
OrderBy []Order `json:"order_by,omitempty"`
Limit int `json:"limit,omitempty"`
Offset int `json:"offset,omitempty"`
}

type Condition struct {
Operator string
ConditionParts []ConditionPart
}

type ConditionPart struct {
Type string `json:"type"`
Field string `json:"field,omitempty"`
Operator string `json:"operator,omitempty"`
Value any `json:"value,omitempty"`
Condition *Condition `json:"condition,omitempty"`
Negate bool `json:"negate,omitempty"`
}

type Order struct {
Field string `json:"field"`
Direction string `json:"direction"`
}

These models serve as an intermediate representation between our custom query and the Elasticsearch query. We will populate them by visiting the parse tree that ANTLR builds from the input written in our custom query language.
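
As a sketch of what the visitor is expected to produce, the hand-built value below mirrors the query SEARCH FROM users WHERE age >= 18 AND name LIKE "jo%" LIMIT 5. The index and field names are hypothetical, exampleParsedQuery is an illustrative helper, and the structs are repeated in trimmed form only so that the example compiles on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed copies of the models, limited to the fields this example uses.
type ParsedQuery struct {
	Type      string    `json:"type"`
	Index     string    `json:"index"`
	Condition Condition `json:"condition"`
	Limit     int       `json:"limit,omitempty"`
}

type Condition struct {
	Operator       string
	ConditionParts []ConditionPart
}

type ConditionPart struct {
	Type     string `json:"type"`
	Field    string `json:"field,omitempty"`
	Operator string `json:"operator,omitempty"`
	Value    any    `json:"value,omitempty"`
	Negate   bool   `json:"negate,omitempty"`
}

// exampleParsedQuery builds by hand the model for:
//
//	SEARCH FROM users WHERE age >= 18 AND name LIKE "jo%" LIMIT 5
func exampleParsedQuery() ParsedQuery {
	return ParsedQuery{
		Type:  "search",
		Index: "users",
		Condition: Condition{
			Operator: "and",
			ConditionParts: []ConditionPart{
				{Type: "ComparisonCondition", Field: "age", Operator: ">=", Value: 18},
				{Type: "LikeCondition", Field: "name", Value: "jo%"},
			},
		},
		Limit: 5,
	}
}

func main() {
	out, err := json.MarshalIndent(exampleParsedQuery(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```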

Parsing Logic

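The parsing code below goes in a file called ast_visitor.go. It traverses the parse tree generated by ANTLR and populates our structs with the parsed query data. The parser import path must match your own module: the github.com/bakyazi/esgrammar/parser path shown here belongs to the author's module, whereas the module created above would import custom-query-parser/parser.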

package main

import (
"github.com/antlr4-go/antlr/v4"
"github.com/bakyazi/esgrammar/parser"
"strconv"
"strings"
)

func Visit(tree antlr.ParseTree) any {
if tree == nil {
return nil
}

switch node := tree.(type) {
case *parser.SearchQueryContext:
return visitSearchQuery(node)
case *parser.CountQueryContext:
return visitCountQuery(node)
case *parser.WhereClauseContext:
return visitWhereClause(node)
case *parser.ConditionContext:
return visitCondition(node)
case *parser.AndConditionContext:
return visitAndCondition(node)
case *parser.OrConditionContext:
return visitOrCondition(node)
case *parser.ComparisonConditionContext:
return visitComparisonCondition(node)
case *parser.GroupConditionContext:
return visitGroupCondition(node)
case *parser.LikeConditionContext:
return visitLikeCondition(node)
case *parser.IsNullConditionContext:
return visitIsNullCondition(node)
case *parser.InConditionContext:
return visitInCondition(node)
case *parser.OrderByClauseContext:
return visitOrderByClause(node)
case *parser.OrderConditionContext:
return visitOrderCondition(node)
case *parser.LimitClauseContext:
return visitLimitClause(node)
case *parser.OffsetClauseContext:
return visitOffsetClause(node)
case *parser.ValueContext:
return visitValue(node)
}
return nil
}

func visitSearchQuery(node *parser.SearchQueryContext) ParsedQuery {
parsedQuery := ParsedQuery{
Type: "search",
Index: node.SearchClause().IndexName().GetText(),
}
if node.WhereClause() != nil {
parsedQuery.Condition = Visit(node.WhereClause()).(Condition)
}
if node.OrderByClause() != nil {
parsedQuery.OrderBy = Visit(node.OrderByClause()).([]Order)
}
if node.LimitClause() != nil {
parsedQuery.Limit = Visit(node.LimitClause()).(int)
}
if node.OffsetClause() != nil {
parsedQuery.Offset = Visit(node.OffsetClause()).(int)
}
return parsedQuery
}

func visitCountQuery(node *parser.CountQueryContext) ParsedQuery {
parsedQuery := ParsedQuery{
Type: "count",
Index: node.CountClause().IndexName().GetText(),
}
if node.WhereClause() != nil {
parsedQuery.Condition = Visit(node.WhereClause()).(Condition)
}
return parsedQuery
}

func visitWhereClause(node *parser.WhereClauseContext) Condition {
return Visit(node.Condition()).(Condition)
}

func visitCondition(node *parser.ConditionContext) any {
if node.OrCondition() != nil {
return Visit(node.OrCondition())
}
return Visit(node.AndCondition())
}

func visitAndCondition(node *parser.AndConditionContext) Condition {
condition := Condition{Operator: "and"}
for _, cp := range node.AllConditionPart() {
condition.ConditionParts = append(condition.ConditionParts, Visit(cp).(ConditionPart))
}
return condition
}

func visitOrCondition(node *parser.OrConditionContext) Condition {
condition := Condition{Operator: "or"}
for _, cp := range node.AllConditionPart() {
condition.ConditionParts = append(condition.ConditionParts, Visit(cp).(ConditionPart))
}
return condition
}

func visitComparisonCondition(node *parser.ComparisonConditionContext) ConditionPart {
return ConditionPart{
Type: "ComparisonCondition",
Field: node.FieldName().GetText(),
Operator: node.Comparator().GetText(),
Value: Visit(node.Value()),
Negate: node.Comparator().GetText() == "!=",
}
}

func visitGroupCondition(node *parser.GroupConditionContext) ConditionPart {
condition := Visit(node.Condition()).(Condition)
return ConditionPart{
Type: "GroupCondition",
Condition: &condition,
}
}

func visitLikeCondition(node *parser.LikeConditionContext) ConditionPart {
return ConditionPart{
Type: "LikeCondition",
Field: node.FieldName().GetText(),
Value: strings.Trim(node.STRING().GetText(), `"`),
Negate: node.NOT() != nil,
}
}

func visitIsNullCondition(node *parser.IsNullConditionContext) ConditionPart {
return ConditionPart{
Type: "IsNullCondition",
Field: node.FieldName().GetText(),
Negate: node.NOT() != nil,
}
}

func visitInCondition(node *parser.InConditionContext) ConditionPart {
var values []any
for _, value := range node.AllValue() {
values = append(values, Visit(value))
}
return ConditionPart{
Type: "InCondition",
Field: node.FieldName().GetText(),
Value: values,
Negate: node.NOT() != nil,
}
}

func visitOrderByClause(node *parser.OrderByClauseContext) []Order {
var orders []Order
for _, orderCondition := range node.AllOrderCondition() {
orders = append(orders, Visit(orderCondition).(Order))
}
return orders
}

func visitOrderCondition(node *parser.OrderConditionContext) Order {
order := Order{
Field: node.FieldName().GetText(),
Direction: "ASC",
}
if node.DESC() != nil {
order.Direction = "DESC"
}
return order
}

func visitLimitClause(node *parser.LimitClauseContext) int {
return getIntValue(node.NUMBER())
}

func visitOffsetClause(node *parser.OffsetClauseContext) int {
return getIntValue(node.NUMBER())
}

func visitValue(node *parser.ValueContext) any {
if node.STRING() != nil {
return strings.Trim(node.STRING().GetText(), `"`)
}
return getIntValue(node.NUMBER())
}

func getIntValue(node antlr.TerminalNode) int {
value, err := strconv.Atoi(node.GetText())
if err != nil {
return 0
}
return value
}
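
The helpers above return 0 when a number cannot be parsed, and they strip quotes with strings.Trim, which leaves the grammar's doubled-quote escape ("") untouched and also removes quotes that belong to the value. One possible hardening is sketched below. parseNumber and unquote are hypothetical helpers, not part of the original code, and they work on the token text rather than on ANTLR nodes so the example stays self-contained.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNumber converts the text of a NUMBER token and reports failures
// (for example, overflow) instead of silently returning 0.
func parseNumber(text string) (int, error) {
	n, err := strconv.Atoi(text)
	if err != nil {
		return 0, fmt.Errorf("invalid number %q: %w", text, err)
	}
	return n, nil
}

// unquote removes the surrounding double quotes of a STRING token and
// turns the doubled-quote escape ("") back into a single quote character.
func unquote(text string) string {
	if len(text) >= 2 && strings.HasPrefix(text, `"`) && strings.HasSuffix(text, `"`) {
		text = text[1 : len(text)-1]
	}
	return strings.ReplaceAll(text, `""`, `"`)
}

func main() {
	n, err := parseNumber("42")
	fmt.Println(n, err)

	_, err = parseNumber("99999999999999999999999")
	fmt.Println(err != nil)

	fmt.Println(unquote(`"say ""hi"""`))
}
```

With helpers like these, visitLimitClause and visitValue would need to return an error as well, which is a larger change than this sketch covers.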

Converting Parsed Queries to Elasticsearch Queries

In this section, we’ll detail how to convert the parsed queries from our custom query language into Elasticsearch queries. This process involves translating the structured data in our ParsedQuery struct into the corresponding Elasticsearch Query DSL.

Struct Definition for Elasticsearch Query

We define an EsQuery struct to represent the Elasticsearch query:

package main

import (
"encoding/json"
"fmt"
"strings"
)

// Constants for query keys
const (
QueryMust = "must"
QueryMustNot = "must_not"
QueryShould = "should"
QueryFilter = "filter"
QueryBool = "bool"
QueryTerm = "term"
QueryRange = "range"
QueryExists = "exists"
QueryWildcard = "wildcard"
QueryTerms = "terms"
)

type EsQuery struct {
Index string
Command string
Query map[string]any
}

func (eq EsQuery) String() string {
var sb strings.Builder
path := "_search"
if eq.Command == "count" {
path = "_count"
}
sb.WriteString(fmt.Sprintf("GET /%s/%s\n", eq.Index, path))
q, _ := json.MarshalIndent(eq.Query, "", " ")
sb.WriteString(string(q))
return sb.String()
}
  • Index: The index to query.
  • Command: The type of operation (either search or count).
  • Query: The actual Elasticsearch query represented as a map.

The String method generates a string representation of the Elasticsearch query, useful for debugging or displaying the query.

Conversion Function

The convertToElasticsearch function converts a ParsedQuery into an EsQuery:

func convertToElasticsearch(parsedQuery ParsedQuery) (EsQuery, error) {
esQuery := EsQuery{
Index: parsedQuery.Index,
Command: parsedQuery.Type,
Query: map[string]any{},
}

esQuery.Query = map[string]any{
"query": map[string]any{
QueryBool: buildBoolQuery(parsedQuery.Condition),
},
}

if parsedQuery.Type == "search" {
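// Only set size and from when LIMIT/OFFSET were given; an unconditional
// "size": 0 would make Elasticsearch return no hits at all.
if parsedQuery.Limit > 0 {
esQuery.Query["size"] = parsedQuery.Limit
}
if parsedQuery.Offset > 0 {
esQuery.Query["from"] = parsedQuery.Offset
}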
if len(parsedQuery.OrderBy) > 0 {
esQuery.Query["sort"] = buildSortQuery(parsedQuery.OrderBy)
}
}
return esQuery, nil
}
  • The function initializes an EsQuery with the index and command from the ParsedQuery.
  • Depending on the query type (search or count), it constructs the appropriate Elasticsearch query structure.

Building Boolean Queries

The buildBoolQuery function constructs the boolean part of the Elasticsearch query:

func buildBoolQuery(condition Condition) map[string][]any {
boolQuery := map[string][]any{
QueryMust: {},
QueryMustNot: {},
QueryShould: {},
QueryFilter: {},
}

for _, conditionPart := range condition.ConditionParts {
conditionPartQuery := buildConditionPartQuery(conditionPart)
addConditionToBoolQuery(boolQuery, condition, conditionPart, conditionPartQuery)
}

// Remove empty keys
for k, v := range boolQuery {
if len(v) == 0 {
delete(boolQuery, k)
}
}
return boolQuery
}

func addConditionToBoolQuery(boolQuery map[string][]any, condition Condition, conditionPart ConditionPart, conditionPartQuery map[string]any) {
if condition.Operator == "and" {
if conditionPart.Negate {
boolQuery[QueryMustNot] = append(boolQuery[QueryMustNot], conditionPartQuery)
} else {
boolQuery[QueryMust] = append(boolQuery[QueryMust], conditionPartQuery)
}
} else {
if conditionPart.Negate {
encapsulated := map[string]any{
QueryBool: map[string]any{
QueryMustNot: []any{conditionPartQuery},
},
}
boolQuery[QueryShould] = append(boolQuery[QueryShould], encapsulated)
} else {
boolQuery[QueryShould] = append(boolQuery[QueryShould], conditionPartQuery)
}
}
}
  • boolQuery: The resulting boolean query structure with must, must_not, should, and filter clauses.
  • Depending on the condition’s operator (and or or), the function appropriately assigns conditions to must, must_not, or should.
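
To see how the operator and the Negate flag interact, the condensed sketch below reimplements only the placement logic. The clause type and the placeClauses, exampleClauses, and toJSON functions are hypothetical stand-ins for illustration, and the term queries are placeholder data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clause is a hypothetical, reduced stand-in for ConditionPart:
// an already-built leaf query plus its negation flag.
type clause struct {
	Query  map[string]any
	Negate bool
}

// placeClauses mirrors the placement rules described above:
// AND -> must / must_not, OR -> should (negated parts wrapped in bool.must_not).
func placeClauses(operator string, clauses []clause) map[string][]any {
	out := map[string][]any{}
	for _, c := range clauses {
		switch {
		case operator == "and" && c.Negate:
			out["must_not"] = append(out["must_not"], c.Query)
		case operator == "and":
			out["must"] = append(out["must"], c.Query)
		case c.Negate:
			wrapped := map[string]any{
				"bool": map[string]any{"must_not": []any{c.Query}},
			}
			out["should"] = append(out["should"], wrapped)
		default:
			out["should"] = append(out["should"], c.Query)
		}
	}
	return out
}

// exampleClauses models: level = "error" <op> status != 200.
func exampleClauses() []clause {
	return []clause{
		{Query: map[string]any{"term": map[string]any{"level": "error"}}},
		{Query: map[string]any{"term": map[string]any{"status": 200}}, Negate: true},
	}
}

// toJSON renders a bool query body as compact JSON for inspection.
func toJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(toJSON(placeClauses("and", exampleClauses())))
	fmt.Println(toJSON(placeClauses("or", exampleClauses())))
}
```

The extra bool wrapper in the OR case matters: a top-level must_not would exclude the document outright, while the wrapped form lets the negated part count as one of the alternatives. In a bool query that has only should clauses, Elasticsearch requires at least one of them to match by default.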

Building Condition Part Queries

The buildConditionPartQuery function translates individual condition parts into Elasticsearch query components:

func buildConditionPartQuery(conditionPart ConditionPart) map[string]any {
switch conditionPart.Type {
case "ComparisonCondition":
return buildComparisonQuery(conditionPart)
case "GroupCondition":
return map[string]any{QueryBool: buildBoolQuery(*conditionPart.Condition)}
case "LikeCondition":
return buildLikeQuery(conditionPart)
case "IsNullCondition":
return buildIsNullQuery(conditionPart)
case "InCondition":
return buildInQuery(conditionPart)
default:
return nil
}
}
  • Depending on the type of condition part, this function delegates to specific helper functions to build the appropriate query structure.

Building Specific Queries

Comparison Query:

func buildComparisonQuery(conditionPart ConditionPart) map[string]any {
if conditionPart.Operator == "=" || conditionPart.Operator == "!=" {
return map[string]any{
QueryTerm: map[string]any{
conditionPart.Field: conditionPart.Value,
},
}
}
return map[string]any{
QueryRange: map[string]any{
conditionPart.Field: map[string]any{
rangeOperator(conditionPart.Operator): conditionPart.Value,
},
},
}
}

func rangeOperator(operator string) string {
switch operator {
case ">":
return "gt"
case ">=":
return "gte"
case "<":
return "lt"
case "<=":
return "lte"
default:
return ""
}
}
  • Constructs either a term query for equality/inequality or a range query for comparison operators.
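
As a quick illustration of the two shapes, the sketch below builds the query for a few comparisons and prints them as JSON. comparisonQuery is a hypothetical variant of buildComparisonQuery that takes plain arguments and rejects unknown operators instead of emitting an empty range key; the field names are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rangeKeys maps the language's comparison operators to range query keys.
var rangeKeys = map[string]string{">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}

// comparisonQuery is a hypothetical variant of buildComparisonQuery that
// returns an error for operators it does not know.
func comparisonQuery(field, operator string, value any) (map[string]any, error) {
	if operator == "=" || operator == "!=" {
		// "!=" uses the same term query; the caller negates it via must_not.
		return map[string]any{"term": map[string]any{field: value}}, nil
	}
	key, ok := rangeKeys[operator]
	if !ok {
		return nil, fmt.Errorf("unsupported operator %q", operator)
	}
	return map[string]any{
		"range": map[string]any{field: map[string]any{key: value}},
	}, nil
}

// render returns the compact JSON for a comparison, or the error text.
func render(field, operator string, value any) string {
	q, err := comparisonQuery(field, operator, value)
	if err != nil {
		return "error: " + err.Error()
	}
	b, err := json.Marshal(q)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(render("age", ">=", 18))     // {"range":{"age":{"gte":18}}}
	fmt.Println(render("status", "=", 200)) // {"term":{"status":200}}
	fmt.Println(render("age", "~", 1))      // error: unsupported operator "~"
}
```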

Like Query:

func buildLikeQuery(conditionPart ConditionPart) map[string]any {
value := strings.ReplaceAll(conditionPart.Value.(string), "%", "*")
return map[string]any{
QueryWildcard: map[string]any{
conditionPart.Field: value,
},
}
}
  • Replaces SQL-like % wildcards with Elasticsearch's * wildcards and constructs a wildcard query.
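
A fuller translation might also cover the SQL single-character wildcard and protect characters that already mean something in a wildcard pattern. The sketch below is one way to do that; likeToWildcard is a hypothetical helper, and both the handling of _ and the backslash escaping (assumed to follow Lucene's wildcard syntax) go beyond what the original code does.

```go
package main

import (
	"fmt"
	"strings"
)

// likeToWildcard converts a SQL-style LIKE pattern into a wildcard pattern:
// % -> * (any sequence) and _ -> ? (single character). Literal *, ? and \
// in the input are backslash-escaped so they are not treated as wildcards.
// Handling of _ and the escaping step are assumptions beyond the original code.
func likeToWildcard(pattern string) string {
	var sb strings.Builder
	for _, r := range pattern {
		switch r {
		case '%':
			sb.WriteByte('*')
		case '_':
			sb.WriteByte('?')
		case '*', '?', '\\':
			sb.WriteByte('\\')
			sb.WriteRune(r)
		default:
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(likeToWildcard("jo_n%")) // jo?n*
	fmt.Println(likeToWildcard("50*%"))  // 50\**
}
```

Treating _ as a wildcard changes the meaning of patterns such as "user_%", so whether to adopt it depends on how closely the language should follow SQL.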

Is Null Query:

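func buildIsNullQuery(conditionPart ConditionPart) map[string]any {
// exists matches documents where the field has a value, so IS NULL is its
// negation. When Negate is set, the caller wraps this in must_not again,
// which turns it back into a plain "field has a value" check.
return map[string]any{
QueryBool: map[string]any{
QueryMustNot: []any{
map[string]any{QueryExists: map[string]any{"field": conditionPart.Field}},
},
},
}
}
  • Constructs a negated exists query, because exists matches documents where the field has a value and IS NULL needs the opposite.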

In Query:

func buildInQuery(conditionPart ConditionPart) map[string]any {
return map[string]any{
QueryTerms: map[string]any{
conditionPart.Field: conditionPart.Value,
},
}
}
  • Constructs a terms query to check if a field's value is within a list of values.

Building Sort Queries

The buildSortQuery function constructs the sort part of the query:

func buildSortQuery(orderBy []Order) []any {
var sortQuery []any
for _, order := range orderBy {
sortQuery = append(sortQuery, map[string]any{
order.Field: map[string]any{
"order": strings.ToLower(order.Direction),
},
})
}
return sortQuery
}
  • Iterates through the OrderBy slice and constructs the sort criteria.

Explanation of Complicated Sections

  • Boolean Query Construction: The buildBoolQuery function handles the complexity of combining different conditions with logical operators (AND, OR) and their negations. Ensuring that the resulting query structure is valid and correctly reflects the logical operations can be challenging. Pay special attention to how conditions are added to the must, must_not, and should arrays.
  • Condition Part Handling: The buildConditionPartQuery function uses a switch case to delegate the construction of different types of conditions to specific helper functions. This separation of concerns simplifies the main query construction logic but requires careful handling to ensure each condition type is correctly processed.

Enhancing Your Custom Query Language

Now that you have a basic custom query language for Elasticsearch, there are several ways you can expand and improve it to handle more complex queries.

Implement Group By (Aggregation) Syntax

You can add support for aggregation queries by implementing a GROUP BY syntax in your grammar. This will allow users to perform group-by operations similar to SQL.
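
On the Elasticsearch side, a single-field GROUP BY would most naturally map to a terms aggregation. The sketch below shows the request body such a clause might produce; groupByBody, bodyJSON, the group_by_ naming scheme, and the maxBuckets parameter are all hypothetical choices, not part of the original project.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// groupByBody is a hypothetical helper that builds a request body for a
// single-field GROUP BY using a terms aggregation. "size": 0 at the top
// level suppresses the document hits so only the buckets are returned.
func groupByBody(field string, maxBuckets int) map[string]any {
	return map[string]any{
		"size": 0,
		"aggs": map[string]any{
			"group_by_" + field: map[string]any{
				"terms": map[string]any{
					"field": field,
					"size":  maxBuckets,
				},
			},
		},
	}
}

// bodyJSON renders the body as compact JSON for inspection.
func bodyJSON(field string, maxBuckets int) string {
	b, err := json.Marshal(groupByBody(field, maxBuckets))
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(bodyJSON("status", 10))
}
```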

Add Join Syntax for Multi-Index Queries

To enable queries that involve multiple indexes, you can implement a JOIN syntax. This would allow users to specify joins between different indexes, facilitating more complex search scenarios.

Handle More Complex Conditions

Extend your grammar to support more advanced condition types and logical operators, such as nested conditions or additional comparison operators, to provide users with greater flexibility in query construction.

For the complete code and further examples, please visit the GitHub repository.

Demo
