
Text manipulation is a common programming problem. However, I generally try to avoid regular expressions (regex) because they are completely unreadable:
RegExp _email = RegExp(r"^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))$");
Every now and then, though, it’s easier to use them than to come up with a custom parser. One such case is extracting groups of text from a longer string. The rules for regular expressions are generally the same in any language, but I’ll show the specifics of how to do this in Dart.
Defining the problem
Say you are trying to extract the time and lyrics from an LRC text file:
[00:12.30]Twinkle, twinkle, little star
[00:17.60]How I wonder what you are
[00:21.50]Up above the world so high
[00:25.30]Like a diamond in the sky
You can get each line pretty easily like this:
List<String> rawLines = text.split('\n');
print(rawLines[0]); // [00:12.30]Twinkle, twinkle, little star
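One thing to watch for: if the file uses Windows line endings, splitting on '\n' leaves a stray '\r' at the end of every line, which can trip up pattern matching later. Here is a minimal sketch of an alternative using LineSplitter from dart:convert. The splitLines helper name is only for illustration, and the sample text is assumed to be already loaded into a string.

```dart
import 'dart:convert';

// Hypothetical helper: splits text into lines, handling \n, \r\n, and \r.
List<String> splitLines(String text) => const LineSplitter().convert(text);

void main() {
  const text = '[00:12.30]Twinkle, twinkle, little star\r\n'
      '[00:17.60]How I wonder what you are';
  final rawLines = splitLines(text);
  print(rawLines.length); // 2
  print(rawLines[0]); // [00:12.30]Twinkle, twinkle, little star
}
```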
But what you’d really like to do is get the time and words for each line, something like this:
final time = Duration(minutes: 0, seconds: 12, milliseconds: 300);
final words = 'Twinkle, twinkle, little star';
That means there are four separate groups to extract: minutes, seconds, fractional seconds, and words.

There is a regex to do that, but just giving it to you directly would be another one of those unreadable regexes, so let’s solve the problem one step at a time.
The basic regex
To create a regex matcher, you use the RegExp class in Dart.
final regex = RegExp(r'');
You put the regular expression inside the quote marks. It’s useful to use a raw string (that is, one starting with r) so that you don’t have to escape so many things later.
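To make the difference concrete, here is a small sketch comparing a normal string with a raw string. The '\d+' pattern and the sample input are just illustrations, not part of the lyrics problem.

```dart
void main() {
  // In a normal string, the backslash itself has to be escaped.
  final escaped = RegExp('\\d+');
  // In a raw string, backslashes are passed through unchanged.
  // A raw string also keeps Dart from treating $ as string interpolation.
  final raw = RegExp(r'\d+');

  print(escaped.pattern == raw.pattern); // true
  print(raw.stringMatch('track 42')); // 42
}
```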
Match the start and end of the line
This isn’t strictly necessary, but if you’re working with a whole line anyway, matching the start and end of the line might prevent some surprises that you could get by matching something else.
- ^ matches the start of a line
- $ matches the end of a line
That means so far your regex should look like this:
final regex = RegExp(r'^$');
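As a quick illustration of what the anchors buy you, the sketch below compares an anchored and an unanchored pattern. The word 'star' is only a placeholder pattern for the demonstration.

```dart
void main() {
  final unanchored = RegExp(r'star');
  final anchored = RegExp(r'^star$');

  // Without anchors, the pattern can match anywhere in the string.
  print(unanchored.hasMatch('little star')); // true
  // With anchors, the whole string has to match.
  print(anchored.hasMatch('little star')); // false
  print(anchored.hasMatch('star')); // true
}
```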

Match the constant parts
The characters that won’t vary in the string are [, :, ., and ].

But [, ., and ] all have special meanings in regex, so you have to escape them by prefixing them with \:
\[
\.
\]
That makes the regex look like this:
final regex = RegExp(r'^\[:\.\]$');

That doesn’t actually match our line at all right now because we still need to add the variable text.
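To see why the escaping matters, here is a small sketch using the dot. The '12x30' input is a made-up string chosen to show the difference.

```dart
void main() {
  // Unescaped, the dot matches any single character.
  print(RegExp(r'^12.30$').hasMatch('12x30')); // true
  // Escaped, it only matches a literal dot.
  print(RegExp(r'^12\.30$').hasMatch('12x30')); // false
  print(RegExp(r'^12\.30$').hasMatch('12.30')); // true

  // The regex so far only matches the constant characters on their own.
  final regex = RegExp(r'^\[:\.\]$');
  print(regex.hasMatch('[:.]')); // true
  print(regex.hasMatch('[00:12.30]Twinkle, twinkle, little star')); // false
}
```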
Match the variable parts
Again, the four groups that you want to capture are minutes, seconds, fractional seconds, and words.

You can use the following patterns to match:
- [0-9]+ matches one or more digits. The brackets match any one character in the given range, and + is a quantifier meaning one or more occurrences. (Alternatively you could use \d instead of [0-9], but I find [0-9] easier to remember.)
- .* matches zero or more characters. The . matches any single character (except a line break) and * is a quantifier that matches zero or more occurrences of whatever precedes it. We’ll use this for the song lyrics in order to allow some songs to have blank lines while still containing a time stamp.
That makes the regex look like this:
final regex = RegExp(r'^\[[0-9]+:[0-9]+\.[0-9]+\].*$');

The match is actually complete, but you don’t have a way to extract the variable parts. You use groups for that.
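You can check this with hasMatch before worrying about groups. The sketch below tries the regex against a few inputs. The '[00:30.00]' line is an invented example of a time stamp with no lyrics.

```dart
void main() {
  final regex = RegExp(r'^\[[0-9]+:[0-9]+\.[0-9]+\].*$');

  print(regex.hasMatch('[00:12.30]Twinkle, twinkle, little star')); // true
  // The .* part allows the lyrics to be empty.
  print(regex.hasMatch('[00:30.00]')); // true
  // No time stamp, so no match.
  print(regex.hasMatch('Twinkle, twinkle, little star')); // false
}
```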
Capture the groups
You can capture groups by surrounding them with parentheses:
final regex = RegExp(r'^\[([0-9]+):([0-9]+)\.([0-9]+)\](.*)$');

Now you’re ready to extract the parts inside the parentheses.
Pulling it all together
Here is how you extract the text you want:
final line = '[00:12.30]Twinkle, twinkle, little star';
final regex = RegExp(r'^\[([0-9]+):([0-9]+)\.([0-9]+)\](.*)$');
// firstMatch returns null if the line doesn't match, so check for that in real code.
final match = regex.firstMatch(line)!;
final everything = match.group(0)!; // [00:12.30]Twinkle, twinkle, little star
final minutes = match.group(1)!; // 00
final seconds = match.group(2)!; // 12
final fraction = match.group(3)!; // 30
final words = match.group(4)!; // Twinkle, twinkle, little star
Notes:
- The way you actually perform the matching is to call firstMatch on the regex. You can use firstMatch because you’re already matching the full line. If you hadn’t split the entire text into lines first, you could call regex.allMatches, which gives you an iterable collection of matches to loop over. In that case you would also need to create the RegExp with multiLine: true so that ^ and $ match at every line break rather than only at the start and end of the whole text.
- As you can see, group(0) matches everything, while group(1) to group(4) match the parts you surrounded with parentheses.
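Here is a minimal sketch of the allMatches approach on unsplit text. It assumes the text uses '\n' line endings, and the summarizeLines helper name and its output format are made up for the example.

```dart
// Hypothetical helper: returns "mm:ss -> words" for every matching line.
List<String> summarizeLines(String text) {
  final regex = RegExp(
    r'^\[([0-9]+):([0-9]+)\.([0-9]+)\](.*)$',
    multiLine: true, // let ^ and $ match at each line break
  );
  return [
    for (final match in regex.allMatches(text))
      '${match.group(1)}:${match.group(2)} -> ${match.group(4)}',
  ];
}

void main() {
  const text = '[00:12.30]Twinkle, twinkle, little star\n'
      '[00:17.60]How I wonder what you are';
  for (final summary in summarizeLines(text)) {
    print(summary);
  }
  // 00:12 -> Twinkle, twinkle, little star
  // 00:17 -> How I wonder what you are
}
```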
The extracted time groups are still strings, so if you want a Duration you’ll need to convert them:
final time = Duration(
  minutes: int.parse(minutes),
  seconds: int.parse(seconds),
  // Pad on the right so that "30" (0.30 s) becomes 300 ms, not 30 ms.
  milliseconds: int.parse(fraction.padRight(3, '0')),
);
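Putting the matching and the conversion together, one way to package this is a small function that returns null for lines that don’t fit the pattern. This is only a sketch: the LyricLine class and parseLine function are names I made up, and it assumes the fraction should be read as a decimal part of a second (so it is padded on the right and cut to three digits).

```dart
// Hypothetical container for one parsed LRC line.
class LyricLine {
  LyricLine(this.time, this.words);
  final Duration time;
  final String words;
}

final _lrcLine = RegExp(r'^\[([0-9]+):([0-9]+)\.([0-9]+)\](.*)$');

// Returns null when the line is not in the expected time stamp format.
LyricLine? parseLine(String line) {
  final match = _lrcLine.firstMatch(line);
  if (match == null) return null;
  // Assumption: ".3" and ".30" both mean 300 ms; extra digits are dropped.
  final millis = match.group(3)!.padRight(3, '0').substring(0, 3);
  return LyricLine(
    Duration(
      minutes: int.parse(match.group(1)!),
      seconds: int.parse(match.group(2)!),
      milliseconds: int.parse(millis),
    ),
    match.group(4)!,
  );
}

void main() {
  final parsed = parseLine('[00:12.30]Twinkle, twinkle, little star');
  print(parsed?.time); // 0:00:12.300000
  print(parsed?.words); // Twinkle, twinkle, little star
  print(parseLine('not a lyric line')); // null
}
```

Returning null instead of throwing lets the caller decide whether to skip malformed lines or report them.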
That’s it for regex groups in Dart. If you can get past the poor readability of the regex matching patterns, they can be a convenient way to extract what you need from text strings.
I posted the original version of this article on Stack Overflow, but they aren’t exactly happy to see more regex questions over there. I’ve expanded my answer here for Medium.