Regular Expressions: Grouping and the Pipe Character

A series of tutorials on Regular Expressions using Python

Zohaib Shahzad
The Startup
3 min read · May 5, 2020


If you’ve stumbled across this article and are new to this series of tutorials on regular expressions, feel free to take a look at the rest of the series (in order):

  1. Regular Expressions: Basics
  2. Regular Expressions: Grouping & the Pipe Character
  3. Regular Expressions: Repetition & Greedy/Non-Greedy Matching
  4. Regular Expressions: Character Classes & findall() Method
  5. Regular Expressions: Dot-Star and the Caret/Dollar Characters
  6. Regular Expressions: sub() Method and Verbose Mode

In this tutorial, we’ll delve into grouping and the pipe character in regular expressions.

In the previous article, we first wrote a function and then a regular expression to search any string for text matching a telephone number pattern.
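For reference, here is a minimal sketch of that kind of whole-number regex. The pattern and the sample string are illustrative assumptions and may differ from the exact snippet used in the previous article.

```python
import re

# Illustrative pattern: three digits, a hyphen, three digits,
# a hyphen, then four digits (e.g. 415-555-4242).
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# search() scans the string and returns a match object for the first match.
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())  # Phone number found: 415-555-4242
```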

We concluded the previous article with a regex that matches an entire phone number. However, what if we wanted just the area code, or some other part of the phone number? That’s where grouping comes into play: we can use parentheses to mark out groups within the pattern.

Groups in Regex

To create groups, we wrap one set of parentheses around the area code and another set around the rest of the digits, giving a pattern such as (\d\d\d)-(\d\d\d-\d\d\d\d).

This pattern contains two groups: the first captures the area code and the second captures the rest of the phone number. If we want just the area code, we can ask for that group specifically by passing its number to the match object’s group() method: group(1) returns the first group, group(2) returns the second, and group() or group(0) returns the entire match. A set of parentheses that captures part of the match like this is known as a capturing group.
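Here is a minimal sketch of the grouped version, assuming the pattern (\d\d\d)-(\d\d\d-\d\d\d\d) and a made-up sample string. It also shows groups(), which returns every group at once as a tuple.

```python
import re

# Two capturing groups: the area code, then the rest of the number.
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')

print(mo.group(1))   # 415  (area code only)
print(mo.group(2))   # 555-4242
print(mo.group())    # 415-555-4242  (entire match, same as group(0))

# groups() returns a tuple of all groups, so it can be unpacked.
areaCode, mainNumber = mo.groups()
print(areaCode, mainNumber)  # 415 555-4242
```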

What if we want to match literal parentheses in the text we’re searching?

Because parentheses have a special meaning in regex, we escape them: we place a backslash in front of the opening parenthesis and another in front of the closing parenthesis, as in \(\d\d\d\). Escaped parentheses no longer create a group; instead, they match literal ( and ) characters in the searched text.

NOTE:

\( and \) match literal parentheses in the text being searched.
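As a sketch, assuming a phone number written with its area code in parentheses (the sample sentence is made up), escaping lets us match those parentheses literally. Note the two roles parentheses play here: the escaped \( and \) match characters in the text, while the unescaped ones still create groups.

```python
import re

# Unescaped outer parentheses form group 1; \( and \) match literal "(" and ")".
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')

print(mo.group(1))  # (415)
print(mo.group(2))  # 555-4242
```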

Backslashes in Regex

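In many programming languages, including Python and Java, the backslash is an escape character inside ordinary string literals, which means you have to type a double backslash to put a single backslash into the string. So to get the regex token \w into a normal string literal, you write "\\w". The regex engine also treats \ as an escape character, so matching one literal backslash requires the regex \\, which in a normal string literal must be typed as "\\\\". In Python, raw strings (prefixed with r, as in r'\w' or r'\\') skip the string-level escaping, which is why regex patterns are usually written as raw strings.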
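The following sketch checks these two layers of escaping in plain Python, using only the standard re module; the Windows-style path is a made-up example.

```python
import re

# In a normal string literal, "\\" produces one backslash character.
assert '\\w' == r'\w'          # both are the two characters: \ and w
assert len(r'\w') == 2

# To match ONE literal backslash, the regex itself must be \\ (two characters).
# As a normal string literal that is '\\\\'; as a raw string it is r'\\'.
assert '\\\\' == r'\\'

path = 'C:\\Users\\example'     # contains two single backslashes
print(re.findall(r'\\', path))  # ['\\', '\\']  (each item is one backslash)
print(re.search(r'\w+', path).group())  # C
```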

The Pipe Character

The pipe character, also referred to as the vertical bar (|), indicates alternation, that is, a choice between alternatives. Many programmers will recognize || as the logical OR operator from writing if statements in languages like C or Java, and the idea carries over to regular expressions: the pattern A|B matches either A or B.
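A minimal sketch of alternation, using made-up sample sentences: when both alternatives occur in the searched string, search() returns whichever one appears first in the text.

```python
import re

# Match either the text "Batman" or the text "Tina Fey".
heroRegex = re.compile(r'Batman|Tina Fey')

mo1 = heroRegex.search('Batman and Tina Fey')
print(mo1.group())  # Batman

mo2 = heroRegex.search('Tina Fey and Batman')
print(mo2.group())  # Tina Fey
```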

Let’s take a look at an example. Suppose the pattern is Bat(man|mobile|copter|bat). It begins with the literal text Bat, followed by a group containing four alternatives: man, mobile, copter, and bat.

In plain words, that pattern says: “find Bat followed immediately by any one of man, mobile, copter, or bat”.

The following strings would therefore match: Batman, Batmobile, Batcopter, and Batbat.
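Here is a minimal sketch of that pattern, compiled into a variable named batRegex; the sample sentence is made up. Because the alternatives sit inside a group, group(1) returns only the part that followed Bat.

```python
import re

# "Bat" is literal text; the group offers four alternatives for what follows.
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')

print(mo.group())   # Batmobile  (the full match)
print(mo.group(1))  # mobile     (just the alternative that matched)
```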

mo = batRegex.search('Batmotorcycle lost a wheel')
mo is None  # evaluates to True

If we search the string above, we won’t get a match: Batmotorcycle does begin with Bat, but the text after it (motorcycle) doesn’t start with man, mobile, copter, or bat.

If the search() method cannot find the specified pattern, it returns None. If we blindly assign whatever search() returns to a match object (mo) and then call the group() method on it, we’ll get an error (an AttributeError in Python), because None does not have a group() method.
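A minimal sketch of guarding against that error: check the result for None before calling group(). The helper name describe_match and the sample strings are hypothetical, chosen only for illustration.

```python
import re

batRegex = re.compile(r'Bat(man|mobile|copter|bat)')

def describe_match(text):
    """Hypothetical helper: return the matched text, or 'No match'."""
    mo = batRegex.search(text)
    if mo is None:
        # Calling mo.group() here would raise
        # AttributeError: 'NoneType' object has no attribute 'group'
        return 'No match'
    return mo.group()

print(describe_match('Batmotorcycle lost a wheel'))  # No match
print(describe_match('Batcopter spotted overhead'))  # Batcopter
```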
