A Case for Pattern Matching

Benedikt Jerat
Digital Frontiers — Das Blog
9 min readMay 13, 2019

Every programmer knows this situation. At some point, you come across a piece of code that is so full of nested if, else and case statements that you sit in front of the screen shaking your head for a while. The reasons for code like this are numerous. The application may have grown historically over the years. The knowledge gap between the individual developers may be relatively large. Maybe, the project plan did not allow time to remediate technical debts.

Reducing branch complexity is quite important for maintainability reasons. It simplifies reasoning about the code. Let me show you an example that demonstrates why if/else makes your code less maintainable.

Branching Madness

In our first example, a string should be parsed according to the ISBN-13 standard. An ISBN-13 number contains thirteen digits and is divided into five segments: prefix, group code (also called region or language code), publication code, title code, and checksum.

For example, the ISBN-13 number 978-3-89864-445-7 separates into:

  • Prefix: 978
  • Group Code: 3 ( → Language Code for Germany)
  • Publication Code: 89864 ( → dpunkt publication)
  • Title Code: 445 (Groovy: Grundlagen und fortgeschrittene Techniken: Grundlagen und Anwendungen)
  • Checksum: 7

Now there are two prefixes, dozens of group codes, one or more publication codes per group and one or more title codes per publication. The checksum is derived from all digits of the ISBN (including the checksum itself).

Let’s assume someone would like to differentiate between languages first and publications second, which could lead to an implementation like the following:

Not much to say, except that this is verbose and awful and it’s getting worse with each new branch. These kinds of if/else ladders make it hard to keep track of the actual outcome. You’ll always have to remember to not forget any break statement or default branch.

If you would like to assign the result of the branch evaluation to a variable, the assignment would need to be done in each branch. The Java if is no expression and cannot return anything.

Pattern Matching To The Rescue

For that reason, more and more programming languages integrate a concept called pattern matching. It can drastically reduce the code complexity and change the way of reasoning about parts of the code base. As the language Scala made pattern matching one of its core feature, I’ll use it for my remaining code samples.

The same example can be implemented less verbose by using pattern matching:

Pattern matching allows us to write these branches very concise and flat. Our attention is not distracted by nested branch handling. Everything is on the top-level. Additionally, the result of the pattern matching expression can be directly bound to a variable.

The ISBN object parses the string for us and returns the arguments to be pattern matched in its unapply method:

An unapply method takes an object and tries to return the arguments. In our example, it takes the ISBN string and splits it into the respective components. When we call case ISBN(_, Group.FRENCH, _, title, checksum), we’re actually calling the unapply method. As we want to match on several sub-values, we need to group them in an optional tuple.

Some Background

Actually, pattern matching is quite an old concept. One of the first programming languages that introduced patterns as a first-class data type was SNOBOL4 in the year 1967. The general-purpose language ML lifted pattern matching to a new level with static typing and type inference and greatly influenced the design of Erlang, OCaml, and Haskell.

Outside of the area of functional programming the term ‘pattern matching’ is mostly associated with matching strings by using regular expressions. This comes from the fact that advanced pattern matching is based on the concept of algebraic data types, which are somewhat unpopular in the object-oriented world. Let me give you an example:

sealed abstract class Option[+A] extends Product with Serializablefinal case class Some[+A](value: A) extends Option[A]case object None extends Option[Nothing]

In Scala, the Option class is defined as an algebraic data type with two distinctive representations. There is either some value or no value at all.

The official Scala documentation states the following on the topic:

“Pattern matching is a mechanism for checking a value against a pattern. A successful match can also deconstruct a value into its constituent parts. It is a more powerful version of the switch statement in Java and it can likewise be used in place of a series of if/else statements.

Pattern matching is a powerful technique for decomposing data structures and objects and is not limited to constants. These patterns can be complex and deeply nested. Individual patterns may be guarded by additional matching rules that further analyze the pattern structure. Additionally, exhaustiveness checks can be applied by the compiler automatically, assuring that no case is missing.

Pattern matching expressions are structured as follows:

The selector is the expression to be matched on by 1 to n cases. Each case consists of a pattern, an optional pattern guard and a block expression.

The current Scala version at the time of writing knows 15 different types of pattern (for the curious, see Pattern Types).

Match All The Things

In this section, we will examine a few common pattern types.

Literal and Variable Pattern

def lastToken(separator: String, value: String): String = {
value.lastIndexOf(separator) match {
case -1 => value
case n => value.substring(n + 1)
}
}
lastToken(";", "twitter;facebook;google")
res0: String = google

In most cases, more than one type of pattern is used in the same pattern matching expression. For example, here a literal pattern and a variable pattern is used. If the result of the indexOf operation is the literal -1, this will be matched, otherwise, the value will be bound to the variable n. That variable can then be used in the expression block.

Typed Pattern

def compress(nodeSeq: NodeSeq): NodeSeq = {
def compressNode(node: Node): Node = node match {
case t: Text => Text(t.text.trim)
case e: Elem => e.copy(child = e.child.map(compressNode))
case other => other
}
nodeSeq.map(compressNode)
}
val xml =
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.12.8</version>
</dependency>
compress(xml)
res0: scala.xml.NodeSeq = NodeSeq(<dependency><groupId>org.scala-lang</groupId><artifactId>scala-library</artifactId><version>2.12.8</version></dependency>)

The method compresses XML node sequences to simplify equality comparisons. It is recursively iterating the XML tree and trims every text node.

There are three relevant cases to match:

  • Text node → Trim the whitespaces
  • Elem node → Iterate over all child nodes and call the recursive method
  • Any other node (e.g. attribute node) → Just return the value

Type patterns like case t: T match, when the given value is of type T. Actually, this is just a sophisticated way of performing an instanceOf type check.

The compress method uses two typed patterns (for handling Text and Elem differently) and a variable pattern for every other type of node.

Constructor and Infix Operator Pattern, Pattern Guards

def versionToLong(components: List[Int]): Long = {
components match {
case major :: minor :: Nil if major >= 9222 =>
encodeVersionToMajorWithoutYear(major, minor, 0)
case year :: major :: Nil =>
encodeVersionToLong(year, major)
case major :: minor :: hotfix :: Nil if major >= 9222 =>
encodeVersionToMajorWithoutYear(major, minor, hotfix)
case year :: major :: minor :: Nil =>
encodeVersionToLong(year, major, minor)
case year :: major :: minor :: hotfix :: Nil =>
encodeVersionToLong(year, major, minor, hotfix)
case year :: major :: minor :: hotfix :: rest =>
encodeVersionToLong(year, major, minor, hotfix)
case other =>
throw new IllegalArgumentException(s"Could not encode ${version}")
}
}

Here, the actual pattern types are harder to differentiate. Actually, a combination of constructor patterns, infix operation patterns, and pattern guards is used to encode a version string to a long number.

The components list is destructured by invoking the unapply method of the case class ::. This class is actually one of the two distinctive representations of the immutable list in Scala:

  • Cons (or ::) → non-empty list with successor element
  • Nil → empty list

An unapply method is a necessary requirement for a constructor pattern. This pattern was already demonstrated in the ISBN example. For case classes, the method is automatically generated by the compiler.

The infix operator pattern is an alternative representation for a constructor pattern. That means, these two patterns are identical:

case year :: major :: Nil =>
case ::(year, ::(major, Nil)) =>

The first one is just easier to read, as it illustrates the recursive structure of the linked list, consisting of consecutive head and tail.

Pattern guards are predicate functions that further restrict the case match. A case with a pattern match only matches, when the pattern itself matches and the pattern guards yields true.

Comprehensive Example

Pattern matching is omnipresent in Scala, because it is such a powerful feature. In one of my projects, there was a task to transmit a bulk of events. The transmission should be aborted as soon as a maximum amount of transmission errors was reached for the operation — it simply doesn’t make any sense to load a server with messages when the server doesn’t seem to respond in time. In addition, there were event types that could not be resent.

The idea is simple: Process each pending event until either no events are pending or the maximum number of send attempts has been reached. Resendable, failed events need to be collected along the way. They will be treated as pending events for the next iteration.

The entry point is the sendEvents method which receives a list of events to process and recursively calls the sendEventsUpToErrorCount method. While the sendEventsUpToErrorCount method handles the current iteration of pending events (the horizontal line in the figure), the sendEvents triggers the next iteration (the vertical line).

sendEventsUpToErrorCount receives a list of events to process and recursively calls itself, handling the three possible cases:

  • No pending event is left → return the pending trials and failed events
  • Pending events are left, maximum error count is reached → return the failed events and pending events that could not be processed
  • Pending events are left, maximum error count is not reached → try to send the current pending event, proceed with the next pending event by recursively calling the process function

Pattern matching allows you to easily distinguish these three cases from each other.

Finally, send receives the event to be sent, transmits it to the server and evaluates the response:

  • A http status response in the 2xx range is expected to be a successful transmission.
  • Http status 500 (Internal Server Error) leads to resends, if the event is resendable.
  • Responses in the 4xx range are client errors in the actual message and should never be resent.
  • Everything else we regard as not resendable.

A sendResult is returned depending on whether the transmission was successful or not.

Of course you could have used if/else branches here as well. Instead of recursion, the processing of the pending events can also be implemented by using a while loop. The list of pending and failed events could be aggregated in a mutable list. The pending trials could be implemented as counter variable, which would be decremented for each failed event. There are always several ways to implement the same thing.

The advantage of pattern matching is that it is very flexible and powerful at the same time. Within the pattern, the data structure can be dynamically disassembled. These parts can then be assigned directly to variables that only apply within this expression. In addition, the compiler informs you if a branch is missing.

In 2019, there are still not that many programming languages with a comprehensive integration of pattern matching. The most prominent ones are Haskell, Scala and Erlang. Kotlin’s when expression supports pattern matching at least to some extent. I would really like it to be natively supported in more programming languages. For C++ and Java there are already proposals to support pattern matching in the language:

If these languages support pattern matching, this would definitely be a step in the right direction. If you are already in the blessed position of being able to work with a programming language that has concepts like pattern matching and algebraic data types, you should use them. You gain maintainability and type safety.

Thanks for reading! Feel free to comment or message me, when you have questions or suggestions.

--

--