Working with Unicode and Grapheme Clusters in Dart
As announced with Dart 2.7, we finally have support for properly handling grapheme clusters. That is a big win for those of us who do a lot of string manipulation.
What are Grapheme Clusters?
Grapheme is a wonder material made of carbon lattice… Oh, wait, no. Sorry, that’s graphene.
Graphemes are a subset of highly shared internet images… Oh, wait, no. Sorry, that’s graph meme.
Graphemes are written characters.
The word “character” can have a lot of meanings, though. What a computer thinks of as a character and what a human thinks of as a character can be two different things.
Dart strings are Unicode, stored as a sequence of UTF-16 code units. A code unit is a 16-bit value, and for many characters a single code unit is the whole character. That translates nicely most of the time:
a \u0061
b \u0062
θ \u03B8
家 \u5BB6
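If you want to check that table yourself, here is a minimal sketch that prints the UTF-16 code units of each character using the built-in String.codeUnits getter. The helper name codeUnitEscapes is made up for this example and is not part of any library.

```dart
// Hypothetical helper: formats every UTF-16 code unit of [text]
// as a \uXXXX escape, separated by spaces.
String codeUnitEscapes(String text) {
  return text.codeUnits.map((unit) {
    // Four uppercase hex digits, zero-padded on the left.
    final hex = unit.toRadixString(16).toUpperCase().padLeft(4, '0');
    return '\\u$hex';
  }).join(' ');
}

void main() {
  for (final ch in ['a', 'b', 'θ', '家']) {
    print('$ch ${codeUnitEscapes(ch)}');
  }
  // Prints:
  // a \u0061
  // b \u0062
  // θ \u03B8
  // 家 \u5BB6
}
```

Each of these strings has a length of 1, because each one fits in a single code unit.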