ART — Basic introduction
An introduction to the ART parser generator
For my summer UROP I created an Eclipse plugin that worked with the ART parser generator. This post is a basic introduction to ART.
Unless you are using ART this will probably not be very helpful, but some of the basics should apply across similar systems.
Syntax
ART grammars have quite a complex syntax (which is itself defined in ART), but at the most basic level it is similar to BNF.
Grammars look something like this:
a ::= b c
b ::= 'hello'
c ::= 'world'
This grammar parses the string
hello world
Terminals
There are 3 types of character terminals in ART.
- Single quoted strings
These were used in the previous example. They are case sensitive, so the terminal 'hello' matches the string hello, but not the string Hello.
- Double quoted strings
These use double quotes around them, and they are not case sensitive.
- Backtick characters
These are used for defining terminals that would be difficult to define otherwise. A backtick applies to the next character only. It also has the effect of disabling whitespace management, so while
a ::= 'b' 'c'
will accept b c,
a ::= `b `c
will not.
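The behaviour described above can be mimicked in a few lines of Python. This is only a sketch of the matching rules as stated in this post, not how ART itself is implemented; the function names and the "single"/"double" labels are made up for illustration.

```python
def match_terminal(kind: str, pattern: str, text: str) -> bool:
    """Compare text against one terminal, following the rules listed above.

    kind is a hypothetical label: "single" for 'x' terminals,
    "double" for "x" terminals.
    """
    if kind == "single":
        return text == pattern                  # case sensitive
    if kind == "double":
        return text.lower() == pattern.lower()  # not case sensitive
    raise ValueError(f"unknown terminal kind: {kind}")


def match_sequence(terminals: list[str], text: str, skip_whitespace: bool) -> bool:
    """Match terminals one after another against text.

    skip_whitespace=True stands in for quoted terminals, where whitespace
    between terminals is handled for you; False stands in for backtick
    characters, where it is not.
    """
    pos = 0
    for terminal in terminals:
        if skip_whitespace:
            while pos < len(text) and text[pos].isspace():
                pos += 1
        if not text.startswith(terminal, pos):
            return False
        pos += len(terminal)
    if skip_whitespace:
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return pos == len(text)
```

Here match_sequence(["b", "c"], "b c", True) plays the role of a ::= 'b' 'c', and the same call with False plays the role of the backtick version, which rejects the space.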
Built-ins
Some elements of grammars are common; for example, integer representations normally match the regex [0-9]+.
Repeating these definitions in every grammar would be tedious and less expressive, so ART has some built-ins, all of which begin with an ampersand. The most commonly used ones are:
- &ID
Any valid Java identifier
- &INTEGER
Any integer
- &STRING_DQ
Any string enclosed in double quotes
- &STRING_SQ
Any string enclosed in single quotes
- &STRING_BRACE_NEST
Any string enclosed in braces
If you wanted a grammar that accepted Python style assignments of integers, you could define it like this:
statement ::= &ID '=' &INTEGER
which would accept
a = 8
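For readers more used to regular expressions, the rule above is roughly comparable to the Python sketch below. The patterns are assumptions: &INTEGER is approximated by the [0-9]+ regex mentioned earlier, and &ID by a simplified ASCII-only identifier pattern (real Java identifiers allow more characters). The \s* parts stand in for ART's whitespace handling.

```python
import re

# Assumed approximations of the built-ins, for illustration only.
ID = r"[A-Za-z_][A-Za-z0-9_]*"
INTEGER = r"[0-9]+"

# Rough equivalent of: statement ::= &ID '=' &INTEGER
STATEMENT = re.compile(rf"\s*{ID}\s*=\s*{INTEGER}\s*")


def accepts(text: str) -> bool:
    """Return True if the whole text looks like an integer assignment."""
    return STATEMENT.fullmatch(text) is not None
```

With these assumptions, accepts("a = 8") succeeds, while an input such as "a = b" is rejected because the right-hand side is not an integer.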
Alternation and Repetition
Sometimes in a grammar you may want to have multiple options. An example of this is a simple array of booleans in JSON, where each element can be true or false. In ART this can be implemented as:
boolean ::= 'true' | 'false'
Repetition can be achieved with a recursive rule, which says that a list is either a single item or a shorter list followed by an item:
booleans ::= boolean | booleans boolean
boolean ::= 'true' | 'false'
This would accept
true false true false
Delimiters
In arrays you may want a delimiter between items. This can be added like so:
booleans ::= boolean | booleans ',' boolean
boolean ::= 'true' | 'false'
Optional items
To make an item optional you can alternate it with epsilon, written as #:
optional_value ::= value | #
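One way to picture epsilon is as an alternative that always succeeds without consuming any input. The Python sketch below assumes, purely for illustration, that value is a single known token; parse_optional and its parameters are hypothetical names.

```python
def parse_optional(tokens: list[str], pos: int, value: str) -> int:
    """Return the position in tokens after matching optional_value at pos."""
    if pos < len(tokens) and tokens[pos] == value:
        return pos + 1  # optional_value ::= value (one token consumed)
    return pos          # optional_value ::= #     (nothing consumed)
```

Either way the rule succeeds; the only difference is whether the position moves forward.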
Comments
There are 2 forms of comments in ART:
- Double slash
// Single line comments
- Bracket star
(* Continues until closed *)