Regular Expressions in Swift

Dennis Walsh
8 min read · Feb 2, 2019

Regular expressions are something we learned about in school in the more general sense, but they come up fairly often in programming tasks, particularly when dealing with text. There was a question on Stack Overflow about identifying a subset of text in a paragraph. The premise was a preview of a news article that ends with the count of characters remaining in the full article, e.g. [+6025 characters]. The objective was to replace the character-count text with a hyperlink displaying “Read more?”.

My first thought was that we need a regular expression to find the text between the brackets. Then we replace that text with the desired text, and finally we add text attributes, in particular a hyperlink. Whew, that seems like a lot to do, so let’s go through it one step at a time.

Swift does not have a regular expression class of its own, so you might think you need to use NSRegularExpression from the Objective-C world. You would be wrong. Swift Strings are automatically bridged to NSString, which means String has some nice features that let us easily perform pattern matching (via a regular expression) on a String. The best example is the range(of:options:) method from NSString. This method searches for a string within a string, but the options are what make it really powerful. There are a handful of available options, such as case insensitive, backwards and, of interest to us, regular expression. These options are self-explanatory, maybe with the exception of regular expression, which treats the search string as an “ICU-compatible regular expression”. This is great: we just need to define a regular expression and pass it to range(of:options:), which returns the Range where the text is found, or nil if there is no match. Now we just need to create a regular expression for our input.
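To make the options a little more concrete, here is a small sketch of range(of:options:) with a few of them. The sample sentence and the variable names are made up for illustration; only the method and the option names come from Foundation.

```swift
import Foundation

// Made-up sample text for illustration.
let sentence = "Swift strings bridge to NSString, and swift code can use its search API."

// No options: a literal, case-sensitive search finds the lowercase "swift".
let literal = sentence.range(of: "swift")

// Case insensitive: now the capitalized "Swift" at the very start matches first.
let caseInsensitive = sentence.range(of: "swift", options: .caseInsensitive)

// Backwards: the search starts from the end, so the last occurrence wins.
let backwards = sentence.range(of: "swift", options: [.caseInsensitive, .backwards])

// Regular expression: the search string is treated as a pattern,
// here "NS" followed by one or more letters.
let regex = sentence.range(of: "NS[A-Za-z]+", options: .regularExpression)

if let regex = regex {
    print(sentence[regex]) // prints "NSString"
}
```

Each call returns an optional Range<String.Index>, so the result can be used directly to subscript the original string.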

Defining a regular expression is another matter entirely. If you have ever seen any on Stack Overflow or in the real world, you know they can become very complicated very quickly. One great site for reference is regexr.com, where you can test out expressions and get detailed info about constructing them. Let’s take a minute and describe how we construct a regular expression pattern in the first place. We will take an introductory approach here and just cover some of the basics.

If we wanted to search for the letter a in a string the regular expression would simply be a pattern containing only “a”. Similarly, if we wanted to search for specific characters, in a specific order, like say cat, then the pattern would be “cat”. This is all well and good, but for our purposes, we need to find brackets [ ] and really any text in between those brackets. We need a pattern that is more powerful.

Enter metacharacters, which are characters that have special meaning and powers in regular expression patterns. Some of the most important metacharacters are the dot ., the question mark ?, the asterisk *, the caret ^, parentheses (), and brackets []. Let’s quickly review what each of these characters means in a regular expression.

The . matches essentially any character except line break characters. This is going to be useful for our purposes, since any character is exactly what we want between our brackets. The brackets [] define a character class, which matches one out of the several characters between the brackets. For example, the pattern [if] would find a match on the input “Swift”, matching i (either i or f would be a match), as well as on “wit” or “half”, where it would match i and f respectively. You see this used most commonly to match lowercase letters [a-z], uppercase letters [A-Z], and numeric characters [0-9]. We can further combine these to find more complex patterns in our input, e.g. [a-zA-Z]. We can think of this as any single letter of the basic Latin alphabet, but nothing else.

import Foundation

var pattern = "[a-zA-Z]"
var text = "It was the best of times"
var result = text.range(of: pattern, options: .regularExpression)
// result = match at index zero, with length 1
text = "555-123-4567"
result = text.range(of: pattern, options: .regularExpression)
// result = nil, no match
pattern = "[a-z]at"
text = "cat"
result = text.range(of: pattern, options: .regularExpression)
// result = match at index zero, with length 3

Reviewing the above examples, the first finds a match on the first letter in the text, the I at index zero. In the second we get no match (result will be nil), since the text contains no letters. In the last case, a match occurs on any text containing a lowercase letter followed by exactly at, such as cat, bat, hat, mat; you get the idea. One important thing to note is that if you want to use any of these metacharacters as a literal in a regex (brackets in our case), you need to escape them with a backslash. So how would we match our pattern of [+6025 characters]? We need a pattern that says: an open bracket, then any characters, then a closing bracket. The brackets themselves look something like this: let pattern = "\\[\\]". Notice we escaped each bracket with two backslashes \\ so they will be matched literally. The regular expression only needs one backslash, but the backslash is also the escape character in Swift string literals, so we need to escape the escape character. However, this pattern only matches an empty pair of brackets, so we need a bit more to get our regular expression to match [+6025 characters].
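As a quick check of the escaping, the following sketch shows what the doubled backslashes turn into and what the brackets-only pattern does and does not match. The sample text and variable names are illustrative only.

```swift
import Foundation

// In source code we write "\\[\\]"; the string itself holds the
// four characters \[\] which is what the regex engine receives.
let emptyBrackets = "\\[\\]"
print(emptyBrackets)        // prints \[\]
print(emptyBrackets.count)  // prints 4

// An adjacent pair of literal brackets matches...
let onEmptyPair = "[]".range(of: emptyBrackets, options: .regularExpression)

// ...but our preview text does not, because there is text between the brackets.
let preview = "the last part... [+6025 characters]"
let onPreview = preview.range(of: emptyBrackets, options: .regularExpression)

print(onEmptyPair != nil)   // prints true
print(onPreview != nil)     // prints false
```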

Before we move on to our example, let’s look at some other useful metacharacters. The question mark ? means the preceding character is optional. If we wanted to check whether the word color was present in a string (including the British spelling, colour), we would write an expression like

import Foundation

let pattern = "colou?r"
var text = "color"
var result = text.range(of: pattern, options: .regularExpression)
// result = match at index zero, with length 5
text = "colour"
result = text.range(of: pattern, options: .regularExpression)
// result = match at index zero, with length 6

The asterisk * means match zero or more of the preceding character (+ means one or more). The asterisk is tricky and should be used with caution as it can result in empty matches if it is used improperly.

import Foundation

var pattern = "[0-9]*"
var text = "123 Main Street"
var result = text.range(of: pattern, options: .regularExpression)
// result = match at index zero, with length 3
text = "Main Street"
result = text.range(of: pattern, options: .regularExpression)
// result = match at index zero, with length 0 (UNEXPECTED!!)
pattern = "[0-9]+"
text = "Main Street"
result = text.range(of: pattern, options: .regularExpression)
// result = nil

We can see in the above example that the asterisk works as we might expect in the first case, finding 123. We might not expect it to find a match in the second case, albeit a zero-length one. It does because the pattern asks for zero or more numeric characters, and zero of them can be found at the very start of any string. In the third case the result is nil, since the pattern requires one or more numeric characters and there are none in the input text. Lesson learned: be careful using *, what you probably want is +.

Along this line, the curly braces {} are quantifiers: they match the specified quantity of the previous character (e.g. {3} means exactly 3, and {1,3} means 1 to 3). If we were trying to match a phone number, we could break it down as any three digits [0-9], followed by a dash -, then again any three digits followed by a dash, and finally any four digits [0-9]. So to match a U.S. phone number we might write something like:

let pattern = "[0-9]{3}-[0-9]{3}-[0-9]{4}"

We would need a more complex pattern to handle things like a country code, parentheses around the first three digits, whitespace, etc., but this should give you an idea of how you can use quantifiers and classes together. A full expression to match a phone number is complex; one possible version is ^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$. There are some extra characters in here that we are not going to cover (e.g. \d means digit, \s means whitespace), as they are beyond this introduction.
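To see how that longer expression behaves, here is a rough sketch that wraps it in a small helper. The function name looksLikePhoneNumber and the sample inputs are hypothetical; the pattern is the one quoted above, with every backslash doubled for the Swift string literal.

```swift
import Foundation

// The phone pattern from the text, escaped for a Swift string literal.
let phonePattern = "^(\\+\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]?\\d{3}[\\s.-]?\\d{4}$"

// Hypothetical helper: true when the whole string matches the pattern.
// The ^ and $ anchors mean no extra text is allowed before or after the number.
func looksLikePhoneNumber(_ candidate: String) -> Bool {
    return candidate.range(of: phonePattern, options: .regularExpression) != nil
}

print(looksLikePhoneNumber("555-123-4567"))          // true
print(looksLikePhoneNumber("(555) 123-4567"))        // true
print(looksLikePhoneNumber("+1 555.123.4567"))       // true
print(looksLikePhoneNumber("555-1234"))              // false, too few digits
print(looksLikePhoneNumber("call 555-123-4567 now")) // false, extra text around it
```

A pattern like this only checks the shape of the text. It says nothing about whether the number actually exists.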

Finally, let’s look at the parentheses () and the caret ^. Parentheses are a grouping operator: they group multiple characters together so that they are matched as a unit, and the group can then be used for extracting substrings. The caret ^ can mean several things: match the beginning of a string, match the beginning of a line, or, as the first character inside brackets, negate a character class.

import Foundation

var pattern = "^abc"
var text = "abc street"
var range = text.range(of: pattern, options: .regularExpression)
// range = match at index 0, with length 3
text = "123 abc street"
range = text.range(of: pattern, options: .regularExpression)
// range = nil
pattern = "[^abc]"
text = "abc street"
range = text.range(of: pattern, options: .regularExpression)
// range = match at index 3, with length 1 (the first whitespace)
pattern = "(abc)"
text = "main abc street"
range = text.range(of: pattern, options: .regularExpression)
// range = match at index 5, with length 3

Let’s quickly review the above example. The caret is first used to check whether a string starts with abc; in the first case it succeeds, while in the second it fails. We then used it to negate the class [abc], matching the first character that is not a, b, or c. Finally, the parentheses group the letters abc to find whether that grouping occurs in the string.
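The range(of:options:) method only reports the overall match, so pulling out the contents of a group needs NSRegularExpression. The sketch below is one way this might be done for our article preview. The helper name remainingCharacterCount and its pattern are illustrative assumptions, not part of the original question.

```swift
import Foundation

// Hypothetical helper: extracts the number inside "[+6025 characters]".
// The parentheses around [0-9]+ form capture group 1.
func remainingCharacterCount(in text: String) -> Int? {
    // Regex seen by the engine: \[\+([0-9]+) characters\]
    guard let regex = try? NSRegularExpression(pattern: "\\[\\+([0-9]+) characters\\]") else {
        return nil
    }
    // NSRegularExpression works with NSRange, so convert the full String range.
    let fullRange = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange),
          // range(at: 0) is the whole match, range(at: 1) is the first group.
          let digitsRange = Range(match.range(at: 1), in: text) else {
        return nil
    }
    return Int(text[digitsRange])
}

let count = remainingCharacterCount(in: "replace the last part... [+6025 characters]")
print(count ?? -1) // prints 6025
```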

So now that we understand a little more about how to construct a regular expression, let’s get back to the problem at hand. How do we make a regular expression to find something like [+6025 characters]? The call to range(of:options:) will look like this:

import Foundation

let text = "this is some text where we can replace the last part... [+6025 characters]"
let pattern = "\\[.*\\]"
let range = text.range(of: pattern, options: .regularExpression)
// range = match at index 56, with length 18

Our pattern is simple, match open bracket, any number of characters within the brackets, then a closing bracket. Great, now we have the expression we need to find out if the text contains our replacement criteria. So now how do we go about replacing the text? I’m glad you asked, I have created two examples of how we could replace the text and add the hyperlink, one uses Swift’s Strings and the other uses NSRegularExpression. Let’s look at the Swift String first.
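A minimal sketch of how the String-based version could look is shown here. The function name previewUsingString and the example.com URL are placeholders chosen for illustration, and the .link attribute key is assumed to come from UIKit (or AppKit on macOS), so this is meant to run on an Apple platform.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

// Hypothetical helper: replaces "[+N characters]" with a linked "Read more?".
func previewUsingString(_ text: String, linkingTo url: URL) -> NSAttributedString {
    let pattern = "\\[.*\\]"
    let replacement = "Read more?"

    // Find the bracketed text; if it is missing, return the text unchanged.
    guard let matchRange = text.range(of: pattern, options: .regularExpression) else {
        return NSAttributedString(string: text)
    }
    let replaced = text.replacingCharacters(in: matchRange, with: replacement)

    // Find the replacement in the new string (searching from the end)
    // and convert the Swift range to the NSRange that attributed strings expect.
    let attributed = NSMutableAttributedString(string: replaced)
    if let linkRange = replaced.range(of: replacement, options: .backwards) {
        attributed.addAttribute(.link, value: url, range: NSRange(linkRange, in: replaced))
    }
    return attributed
}

let sampleText = "this is some text where we can replace the last part... [+6025 characters]"
let sampleURL = URL(string: "https://example.com/full-article")! // placeholder URL
let stringPreview = previewUsingString(sampleText, linkingTo: sampleURL)
print(stringPreview.string)
```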

We define our pattern and use the range(of:options:) method to get the location of the text to replace. We then call replacingCharacters(in:with:) to replace what we found with our “Read more” text. Both of those methods are actually NSString methods that we get via bridging. Then we get the range of the replacement text so that we can add a link attribute. Swift does not have attributed strings, so we need to use NSMutableAttributedString to add the link. That’s not too bad, it is around 8 lines of code. Can we do better? What if we used NSRegularExpression instead?
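One possible shape for the NSRegularExpression version is sketched here. As before, the function name previewUsingRegex and the example.com URL are placeholders, and the .link key is assumed to come from UIKit or AppKit.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

// Hypothetical helper: same result as the String version, built on NSRegularExpression.
func previewUsingRegex(_ text: String, linkingTo url: URL) throws -> NSAttributedString {
    let replacement = "Read more?"
    let regex = try NSRegularExpression(pattern: "\\[.*\\]")

    // Replace every match across the whole string with the replacement text.
    let fullRange = NSRange(text.startIndex..., in: text)
    let replaced = regex.stringByReplacingMatches(in: text,
                                                  options: [],
                                                  range: fullRange,
                                                  withTemplate: replacement)

    // mutableString is an NSMutableString, so range(of:options:) already
    // returns an NSRange and no conversion is needed.
    let attributed = NSMutableAttributedString(string: replaced)
    let linkRange = attributed.mutableString.range(of: replacement, options: .backwards)
    if linkRange.location != NSNotFound {
        attributed.addAttribute(.link, value: url, range: linkRange)
    }
    return attributed
}

let regexText = "this is some text where we can replace the last part... [+6025 characters]"
let regexURL = URL(string: "https://example.com/full-article")! // placeholder URL
let regexPreview = try? previewUsingRegex(regexText, linkingTo: regexURL)
print(regexPreview?.string ?? "invalid pattern")
```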

Here we create the NSRegularExpression and use its stringByReplacingMatches(in:options:range:withTemplate:) method to replace the target text with “Read more”. Then we again find the range (which is already an NSRange, so no conversion is needed) and add the link attribute. It’s a little bit shorter and I think it reads a little easier. If we were not adding hyperlinks, then range(of:options:) might be the better solution, depending on what you need to do with the range.

Hopefully you enjoyed learning a little more about regular expressions and how easily we can use them in Swift.

Sources

Advance Regular Expressions
Regexr.com
Regular-Expressions.info
Swift and Regular Expressions: Swift
ICU User Guide

Dennis Walsh

iOS Developer working with Swift and Objective-C. Worked @X-Team, @Verys, LLC, @Netbrains, Inc., @Mirth Inc.