Safe, concise text parsing with regex destructuring in Kotlin

I recently had the unfortunate task of parsing large amounts of text from a set of badly formatted .csv files. Java’s built-in support for regular expressions (found in the java.util.regex package) is quite tricky. In my experience, protecting against null values and bad input is nearly impossible to do concisely. Luckily, I was using Kotlin and found a neat trick for exactly this: destructured regular expressions.

If you are unfamiliar with the concept of destructuring, here is a pretty concise definition:

Destructuring is a convenient way of extracting multiple values from data stored in (possibly nested) objects and Arrays. It can be used in locations that receive data (such as the left-hand side of an assignment).

It is supported in both Kotlin and JavaScript. The most common uses I’ve seen are unpacking a Pair and working with Map entries. For example, here is how you might use destructuring to find all entries in a map where the textual representations of the key and value are equal:

hashMap.filter { (key, value) ->
    "$key" == "$value"
}

Without destructuring, you have to access the key and value properties of each entry manually:

hashMap.filter { entry ->
    "${entry.key}" == "${entry.value}"
}
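To make the comparison concrete, here is a small self-contained sketch of the same filter applied to a sample map, along with Pair destructuring. The function name equalEntries and the sample data are made up for illustration.

```kotlin
// Hypothetical helper: keeps entries whose key and value print identically.
fun equalEntries(map: Map<Int, String>): Map<Int, String> =
    map.filter { (key, value) -> "$key" == "$value" }

fun main() {
    // Destructuring a Pair into two local variables.
    val (name, age) = Pair("Ada", 36)
    println("$name is $age") // Ada is 36

    // Destructuring map entries inside a lambda.
    val sample = mapOf(1 to "1", 2 to "two")
    println(equalEntries(sample)) // {1=1}
}
```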

As you can see, the first example is more readable. The case for destructuring becomes much more compelling when a lambda contains several expressions or statements. Enter Kotlin’s MatchResult.Destructured class, which according to the docs:

Provides components for destructuring assignment of group values.
component1 corresponds to the value of the first group, component2 — of the second, and so on.
If the group in the regular expression is optional and there were no match captured by that group, corresponding component value is an empty string.
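The empty-string behavior for optional groups is easy to miss, so here is a minimal sketch of it. The pattern and the splitKeyValue function are hypothetical examples, not something from my .csv files.

```kotlin
// Hypothetical pattern: a key, optionally followed by "=value".
val keyValue = Regex("""(\w+)(?:=(\w+))?""")

fun splitKeyValue(input: String): Pair<String, String>? {
    // matchEntire returns null when the whole input does not match.
    val match = keyValue.matchEntire(input) ?: return null
    // component1 is the key; component2 is "" when the optional group is absent.
    val (key, value) = match.destructured
    return key to value
}

fun main() {
    println(splitKeyValue("color=red")) // (color, red)
    println(splitKeyValue("color"))     // (color, )
    println(splitKeyValue("!!"))        // null
}
```

Note that a missing optional group gives you an empty string rather than null, so you never have to null-check the individual components.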

I discovered destructured regular expressions when parsing a column of data which contained phone numbers. Here is a run-down:

For the sake of simplicity, assume all inputs represent a phone number which has the format “888-888-8888”. Here is the expression:

([0-9]{3})-([0-9]{3})-([0-9]{4})

The expression uses groups to divide a phone number into the area code, number prefix, and the phone line number, each separated by a hyphen.
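To see how the groups split up an input, you can inspect groupValues on a match. This is only a sketch; the phoneGroups helper and the sample number are made up for illustration.

```kotlin
val phonePattern = Regex("([0-9]{3})-([0-9]{3})-([0-9]{4})")

// Hypothetical helper: returns the whole match followed by each group, or null.
fun phoneGroups(input: String): List<String>? =
    phonePattern.matchEntire(input)?.groupValues

fun main() {
    // Index 0 is the entire match; indices 1..3 are the three groups.
    println(phoneGroups("555-123-4567")) // [555-123-4567, 555, 123, 4567]
    println(phoneGroups("5551234567"))   // null
}
```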

The goal is to parse a String and return an instance of a PhoneNumber class:

data class PhoneNumber(
    val areaCode: Int,
    val prefix: Int,
    val lineNumber: Int
)

And here is the code:
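What follows is a minimal sketch of such a parser rather than a definitive implementation. The function name parsePhoneNumber, the use of matchEntire, and returning null for bad input are assumptions made for illustration.

```kotlin
data class PhoneNumber(
    val areaCode: Int,
    val prefix: Int,
    val lineNumber: Int
)

val phoneRegex = Regex("([0-9]{3})-([0-9]{3})-([0-9]{4})")

// Hypothetical parser: returns null instead of throwing on malformed input.
fun parsePhoneNumber(input: String): PhoneNumber? {
    // No match means bad input, so bail out early.
    val match = phoneRegex.matchEntire(input) ?: return null
    // Each component is a non-null String holding only digits,
    // so toInt() cannot fail here.
    val (areaCode, prefix, lineNumber) = match.destructured
    return PhoneNumber(areaCode.toInt(), prefix.toInt(), lineNumber.toInt())
}

fun main() {
    println(parsePhoneNumber("888-888-8888")) // PhoneNumber(areaCode=888, prefix=888, lineNumber=8888)
    println(parsePhoneNumber("not a number")) // null
}
```

The only null in sight is the one returned when the input does not match, and the three groups arrive as named local values instead of indexed lookups. One caveat of storing the parts as Int is that leading zeros are dropped, so "012" becomes 12.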

I started using Kotlin almost a year ago, and I am still surprised every now and then by neat features like this in the standard library.