A TYON implementation in Dart written by ChatGPT

(It cannot deal with typed TYON values)

Stefan Matthias Aust
ICNH
Aug 10, 2023 · 8 min read


Here is a ChatGPT experiment that might even result in something useful.

I did this experiment on the 15th of April 2023, so the current version of ChatGPT 4 may behave differently, or this article may already be out of date.

I recently saw a mention on HackerNews of a JSON-like data format called TYON and thought that a parser written in Dart might be quite useful. It shouldn’t be too difficult to write by hand, but why not let ChatGPT 4 do the job?

Basically, on top of what JSON offers, TYON allows Lisp-style comments, doesn't require " around values, and starts with an implicit object, so you need no top-level {}. It also supports type definitions that can be used to build objects from plain data.
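Here's a small TYON snippet of my own to illustrate, roughly following those rules:

; a comment, running to the end of the line
first = John
last = Doe
"favorite numbers" = [1 2 3]
address = (city = Springfield zip = 12345)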

Here’s my prompt:

Please write a parser in Dart for a file format according to the specification below. Parsing a string of TYON should return a Map<String, dynamic> where dynamic is really either a String, another such map, or a List<dynamic> where each element is again either a string, a map or a list.

Then I appended the whole specification to my prompt and pressed RETURN and waited while our future overlord slowly emitted its wisdom.

ChatGPT starts with a textbook solution using a lexer and a parser:

Map<String, dynamic> parseTYON(String input) {
  final lexer = _TYONLexer(input);
  final parser = _TYONParser(lexer);
  return parser.parse();
}

The lexer is also very classical: it stores the _input and maintains an _index to the current character, which can be peeked at (_peek) and consumed (_consume):

class _TYONLexer {
  static const _specialChars = '()[]=/;';

  String _input;
  int _index;

  _TYONLexer(String input)
      : _input = input,
        _index = 0;

  String _consume() {
    if (_index < _input.length) {
      return _input[_index++];
    }
    return null;
  }

  String _peek() {
    if (_index < _input.length) {
      return _input[_index];
    }
    return null;
  }

...

Note that ChatGPT doesn't know about Dart's null safety, so the _peek and _consume methods should be of type String?, but that's easy to fix. And _input should probably be final.
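With Dart's null safety in mind, those parts would read roughly like this (my fix, condensed a bit; not ChatGPT's output):

  final String _input;
  int _index;

  _TYONLexer(String input)
      : _input = input,
        _index = 0;

  String? _consume() =>
      _index < _input.length ? _input[_index++] : null;

  String? _peek() =>
      _index < _input.length ? _input[_index] : null;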

A nextToken method returns the next token from the input (again, it should have the return type String?). As you can see, it skips whitespace and returns null if the end of the input has been reached. Otherwise it skips comments and returns either a syntax token, a quoted string, or a literal:

  String nextToken() {
    String char = _peek();
    while (char != null && _isWhitespace(char)) {
      _consume();
      char = _peek();
    }

    if (char == null) {
      return null;
    }

    if (char == ';') {
      while (char != null && char != '\n') {
        _consume();
        char = _peek();
      }
      return nextToken();
    }

    if (_specialChars.contains(char)) {
      _consume();
      return char;
    }

    if (char == '"') {
      _consume();
      return _readQuotedString();
    }

    return _readLiteral();
  }

The _isWhitespace method does the obvious thing. Hence, I omitted the code.
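For completeness, it presumably boils down to something like this (my guess, not ChatGPT's actual output):

  bool _isWhitespace(String char) =>
      char == ' ' || char == '\t' || char == '\r' || char == '\n';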

The _readQuotedString method, however, doesn't follow the specification, which says that a " inside a quoted string must be escaped by doubling it ("") rather than with a backslash. Instead, it implements the usual \ escaping. That's something I will fix later.

A literal, according to the specification, cannot start with " or /, but can contain them. This is something ChatGPT also overlooked in _readLiteral. I didn't bother fixing it.

The parser declares two methods to interact with the lexer, but only ever uses _nextToken (again, we need to add ? for null safety and a final to make the linter happy):

class _TYONParser {
  _TYONLexer _lexer;

  _TYONParser(this._lexer);

  String _peekToken() => _lexer._peek();

  String _nextToken() => _lexer.nextToken();

...

Let’s have a look at parse which drives the whole process:

  Map<String, dynamic> parse() {
    Map<String, dynamic> result = HashMap<String, dynamic>();
    String? token = _nextToken();

    while (token != null) {
      if (token == '(' || token == '[') {
        throw FormatException('Unexpected opening bracket');
      }

      String key = token;
      token = _nextToken();

      if (token == '=') {
        token = _nextToken();

        if (token == '(' || token == '[') {
          result[key] = _parseValue(token);
        } else {
          result[key] = token;
        }
      } else {
        throw FormatException('Expected "=" after key');
      }

      token = _nextToken();
    }

    return result;
  }

It will read as many tokens as it finds. It correctly knows that a ( or [ would be illegal while expecting a property name; actually any syntax character is invalid at this point, not just those two. It simply assumes that the token is the key, which is incorrect if it's a = or ) or whatever. Also, because the lexer already strips the " from quoted strings, a perfectly valid expression like "[" = "]" would be rejected. It then expects a = and either parses values in [] or (), or assumes that the token must be a literal value, again without checking. The code is okayish but could be improved.
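One way to fix that ambiguity (my sketch, not part of ChatGPT's code) would be to let the lexer return a tiny token object instead of a bare String, so the parser can tell a quoted "[" apart from the syntax character [:

// Hypothetical token type that keeps the distinction the lexer currently throws away.
class _Token {
  final String text;
  final bool isQuoted; // true if the token came from a "..." string
  _Token(this.text, this.isQuoted);
}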

Here is _parseValue:

  dynamic _parseValue(String token) {
    if (token == '(') {
      return _parseMap();
    } else if (token == '[') {
      return _parseList();
    } else {
      return token;
    }
  }

Depending on the token, it parses a map or a list, which is correct. In the fallback branch, however, anything but a literal or quoted string (a distinction the lexer doesn't make) would actually be invalid, yet it is accepted as-is. There is also no need to test for ( and [ twice: either drop the check in parse or inline the code from _parseValue there.
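The loop body in parse could then shrink to something like this (my refactoring, delegating everything to _parseValue):

      if (token == '=') {
        token = _nextToken();
        if (token == null) {
          throw FormatException('Expected value after "="');
        }
        result[key] = _parseValue(token);
      } else {
        throw FormatException('Expected "=" after key');
      }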

Parsing a list is rather simple. It assumes tokens that are either new maps, lists, or literals:

  List<dynamic> _parseList() {
    List<dynamic> list = [];
    String? token = _nextToken();

    while (token != null && token != ']') {
      if (token != '(' && token != '[') {
        list.add(token);
      } else {
        list.add(_parseValue(token));
      }
      token = _nextToken();
    }
    return list;
  }

Parsing a map would be more difficult if the code tried to deal with types. But it completely ignores them and just assumes that a map consists of key-value pairs separated by =.

  Map<String, dynamic> _parseMap() {
    Map<String, dynamic> map = HashMap<String, dynamic>();
    String key;
    dynamic value;
    String? token = _nextToken();

    while (token != null && token != ')') {
      key = token;
      token = _nextToken();

      if (token == '=') {
        token = _nextToken();

        if (token == '(' || token == '[') {
          value = _parseValue(token);
        } else {
          value = token!;
        }

        map[key] = value;
        token = _nextToken();
      } else {
        throw FormatException('Expected "=" after key');
      }
    }
    return map;
  }

So it might be able to parse the map example from the specification, but not the more complex typed examples:

person = (
  first = John
  last = Doe
  age = 42
  "favorite numbers" = [1 2 3]
)

I get a “type ‘List’ is not a subtype of type ‘String’” error because the value variable in parse is of type String and not dynamic. I agree with ChatGPT that it would be so much nicer if Dart had a real lightweight sum type instead of having to use dynamic. But we have to work with what's available.

After fixing that, it works.
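For the curious, calling the parser on that example then looks like this (key order may differ, since a HashMap is used):

void main() {
  const source = '''
person = (
  first = John
  last = Doe
  age = 42
  "favorite numbers" = [1 2 3]
)
''';
  print(parseTYON(source));
  // prints (in some order): {person: {first: John, last: Doe, age: 42, favorite numbers: [1, 2, 3]}}
}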

Parsing the other examples, with the exception of typed expressions like the one below, also seems to work.

/person = (first-name middle-initial last-name age)

So, are you impressed by ChatGPT’s features?

Reprinting a standard parser doesn’t impress me much. But this does:

Me: I think _readQuotedString is wrong. Can you double-check?

ChatGPT: You’re correct; I made a mistake in _readQuotedString. The method should handle escaped double quotes "" as a single double quote " in the resulting string. I've fixed the issue in the method below:

String _readQuotedString() {
  StringBuffer buffer = StringBuffer();
  String char = _consume();

  while (char != null && char != '"') {
    if (char == '"') {
      if (_peek() == '"') {
        buffer.write('"');
        _consume(); // Consume the second quote
      } else {
        break;
      }
    } else {
      buffer.write(char);
    }
    char = _consume();
  }
  return buffer.toString();
}

Me: You’re correct, but that code isn’t.

ChatGPT: Apologies for the confusion. I misunderstood your original concern. Here’s the corrected _readQuotedString method:

String _readQuotedString() {
  StringBuffer buffer = StringBuffer();
  String char = _consume();

  while (char != null) {
    if (char == '"') {
      if (_peek() == '"') {
        _consume(); // Consume the second quote
        buffer.write('"');
      } else {
        break;
      }
    } else {
      buffer.write(char);
    }
    char = _consume();
  }
  return buffer.toString();
}

Now the escaped quotes within strings are correctly parsed. (Let's ignore the fact that strings in general are not correctly parsed because the lexer doesn't distinguish them from syntax.)

Me: I think you forgot to implement typed lists

Just watching the code slowly appear in the chat window makes me think that this is too difficult. This is not correct:

String _parseTypeName(String token) {
  if (token.startsWith('/')) {
    return token.substring(1);
  }
  return null;
}

And this is completely wrong:

List<dynamic> _parseTypedList(String type) {
  String token = _nextToken();
  if (token == '[') {
    List<dynamic> list = _parseList();
    list.insert(0, type); // Insert type as the first element
    return list;
  } else {
    throw FormatException('Expected "[" after type for list');
  }
}

At least it adds: Note that this implementation doesn’t support type definitions, as the example provided in the specification does not clearly demonstrate how these should be handled.

I, on the other hand, think the specification is clear. Let's try again:

Me: I think you forgot to implement typed lists. I'll explain the syntax:

/name = (a b c) will declare a type "name" with three properties "a", "b", and "c". Such a type definition can only be used on the top-level implicit map.

A typed value like /(a b c)(1 2 3) is equivalent to (a=1 b=2 c=3). Instead of using /(a b c) to define a type "inline", you can also refer to a type name like /name(1 2 3) which, using the type definition from above, again creates the map value (a=1 b=2 c=3).

A typed list like /(a b c)[(1 2 3) (3 4 5)] applies the type to each element, unless that element is explicitly typed. It is therefore equivalent to [/(a b c)(1 2 3) /(a b c)(3 4 5)].

It now (correctly) adds Map<String, List<String>> _types; to the parser to store type definitions and adds a method to parse them. However, it assumes (probably because I reached the token limit) that the lexer returns /(a b c) as a single token.
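For what it's worth, applying such a type definition to positional values would not be much code. A sketch, with a hypothetical helper name:

// Hypothetical: turn a type definition (a list of property names) plus
// positional values into a map, e.g. /(a b c)(1 2 3) -> {a: 1, b: 2, c: 3}.
Map<String, dynamic> _applyType(List<String> fields, List<dynamic> values) {
  if (fields.length != values.length) {
    throw FormatException('Expected ${fields.length} values, got ${values.length}');
  }
  return Map<String, dynamic>.fromIterables(fields, values);
}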

I tried to help with

Me: remember that your _TYONLexer class will correctly tokenize /(a b c) as six tokens "/", "(", "a", "b", "c", ")".

… but it only gets more wrong. I give up.

I need to do it myself or start again in a new session. But not today. When creating parsers, I like to work from an EBNF grammar, so this is my last question:

Me: please write an EBNF grammar for TYON

TYON                      ::= { TypeDecl | KeyValuePair }
TypeDecl                  ::= '/' Identifier '=' '(' { Identifier } ')'
KeyValuePair              ::= Key '=' Value
Key                       ::= Literal | String
Value                     ::= Literal | String | List | Map | TypedList | TypedMap
Literal                   ::= ValidLiteralCharacters
String                    ::= '"' { AnyCharacterExceptDoubleQuote | '""' } '"'
List                      ::= '[' { Value } ']'
Map                       ::= '(' { KeyValuePair } ')'
TypedList                 ::= InlineType '[' { Value } ']'
TypedMap                  ::= TypeReferenceOrInlineType '(' { Value } ')'
InlineType                ::= '/' '(' { Identifier } ')'
TypeReferenceOrInlineType ::= InlineType | TypeReference
TypeReference             ::= '/' Identifier
Identifier                ::= ValidIdentifierCharacters

And this is rather great.

PS, here’s the original code, in case somebody is interested.

PPS: I asked for a more compact lexer using a regular expression to split the input into tokens and got this (I only added a final and made it null-safe). This is more to my liking, although the regexp is incorrect. I leave it as an exercise to the reader to fix it ;)

class _TYONLexer {
  _TYONLexer(String input) : _iterator = _regex.allMatches(input).iterator;
  final Iterator<RegExpMatch> _iterator;

  static final RegExp _regex = RegExp(r'[/()\[\]=]|"(?:[^"]|"")*"|;[^\n]*|\S+');

  String? nextToken() {
    if (!_iterator.moveNext()) return null;

    String token = _iterator.current[0]!;
    if (token.startsWith(';')) {
      return nextToken();
    } else if (token.startsWith('"') && token.endsWith('"')) {
      return token.substring(1, token.length - 1).replaceAll('""', '"');
    } else {
      return token;
    }
  }
}

Originally published at http://github.com.
