Regular Expressions: Repetition & Greedy / Non-Greedy Matching

A series of tutorials on Regular Expressions using Python

Zohaib Shahzad
4 min read · Jun 30, 2020

If you’ve stumbled across this article and are new to this series of tutorials on regular expressions, feel free to take a look at the rest of the series (in order):

  1. Regular Expressions: Basics
  2. Regular Expressions: Grouping & the Pipe Character
  3. Regular Expressions: Repetition & Greedy/Non-Greedy Matching
  4. Regular Expressions: Character Classes & findall() Method
  5. Regular Expressions: Dot-Star and the Caret/Dollar Characters
  6. Regular Expressions: sub() Method and Verbose Mode

Now that we have a decent understanding of grouping in regex, we can delve into more complicated concepts such as repetition operators.

Repetition operators are special meta characters: ? (question mark), * (asterisk), and + (plus sign). These repetition characters are used to specify how many times a sub-pattern can occur and they act differently in different situations.

The Question Mark (?)

In regex, the meta character “?” makes the preceding character optional. This symbol matches zero or one instance of the preceding character/group. For example, the regular expression [T]?he means: the uppercase letter ‘T’ is optional, which is then followed by the lowercase ‘h’ and ‘e’.

[T]he => The car is parked in the garage.


[T]?he => The car is parked in the garage.
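As a quick check of the two patterns above, here is a minimal sketch using Python’s re module. It uses findall() (covered later in this series) only to list every match; the variable names are illustrative and not from the original article.

```python
import re

sentence = 'The car is parked in the garage.'

# Without "?", the capital T is required, so only "The" matches.
required_t = re.findall(r'[T]he', sentence)

# With "?", the T is optional, so the "he" inside "the" matches too.
optional_t = re.findall(r'[T]?he', sentence)

print(required_t)  # ['The']
print(optional_t)  # ['The', 'he']
```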


Example 1: Question Mark Implementation

Let’s explore the regex: Bat(wo)?man. Here (wo) is grouped and followed by a question mark (?), which means the group (wo) is optional: it can appear zero times or once.

If we analyze the string “The Adventures of Batman”, we can see that the regex pattern matches “Batman”, since the (wo) group is optional.

Example 2: Question Mark Implementation (cont’d)

Analyzing the string for this example, we’ll see that the regex pattern still matches “Batwoman”.

Example 3: Question Mark Implementation (cont’d)

For this string, “The Adventures of Batwowowowowoman”, the regex pattern will not match anything. The group (wo) repeats more than once, and the question mark allows at most one occurrence.
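The three question mark examples can be reproduced with a short sketch like the one below. The titles for Examples 1 and 3 are quoted in the text; the Batwoman title is assumed to follow the same form.

```python
import re

bat_regex = re.compile(r'Bat(wo)?man')

# (wo) appears zero times: match.
m1 = bat_regex.search('The Adventures of Batman')
# (wo) appears once: match. (Assumed title for Example 2.)
m2 = bat_regex.search('The Adventures of Batwoman')
# (wo) appears five times: "?" allows at most one, so there is no match.
m3 = bat_regex.search('The Adventures of Batwowowowowoman')

print(m1.group())  # Batman
print(m2.group())  # Batwoman
print(m3)          # None
```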

Going back to the telephone number example, if we want to make the area code optional this is what our regex pattern would look like:

import re
phoneRegex = re.compile(r'((\d\d\d)-)?\d\d\d-\d\d\d\d')
phoneRegex.search('My phone number is 415-555-1234. Call me tomorrow.')  # WORKS: area code present
phoneRegex.search('My phone number is 555-1234. Call me tomorrow.')  # WORKS: area code omitted

The Asterisk (*)

The asterisk (*) matches zero or more repetitions of the preceding group/character. The regex a* means: zero or more repetitions of the lowercase character ‘a’.

Example 4: Implementing the asterisk

This example reuses one of the strings from the question mark section.

The regex: Bat(wo)*man. An asterisk follows the group (wo), which means the group can appear zero or more times in a string. The regex is therefore a correct fit for the given string.
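A minimal sketch of the asterisk in action, assuming the same Batman-style titles used in the earlier examples. Unlike the question mark, the asterisk accepts any number of (wo) groups, including none.

```python
import re

bat_star = re.compile(r'Bat(wo)*man')

titles = [
    'The Adventures of Batman',           # zero (wo) groups
    'The Adventures of Batwoman',         # one (wo) group
    'The Adventures of Batwowowowowoman'  # five (wo) groups
]

# Every title matches, because "*" allows zero or more repetitions.
star_matches = [bat_star.search(title).group() for title in titles]
print(star_matches)  # ['Batman', 'Batwoman', 'Batwowowowowoman']
```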

The Addition Symbol (+)

The addition symbol (+) matches one or more repetitions of the preceding group/character. For example, the regular expression c.+t means: lowercase letter c, followed by at least one character, followed by the lowercase letter t. Note that this ‘t’ is the last ‘t’ in the sentence, because + matches as many characters as it can.

You’ll notice a period within the regex. The period is another special meta character which basically matches any single character except a line break (\n).

c.+t => The fat cat sat on the mat
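The sketch below runs c.+t against the sentence above to show where the match starts and ends. The second string, with a line break in it, is an illustrative addition showing that the period does not cross a newline.

```python
import re

# The match starts at the first "c" and runs to the last "t" in the sentence.
plus_match = re.search(r'c.+t', 'The fat cat sat on the mat').group()
print(plus_match)  # cat sat on the mat

# "." does not match "\n", so the match cannot extend past the first line.
one_line = re.search(r'c.+t', 'cat\nmat').group()
print(one_line)  # cat
```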


Example 5: Implementing the Plus symbol

The regex Bat(wo)+man would also match “Batwowowowowoman”, because the group (wo) appears at least once.
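To see how the plus symbol differs from the asterisk, here is a short sketch. The “Batman” title is included as an assumed counter-example: with +, at least one (wo) is required.

```python
import re

bat_plus = re.compile(r'Bat(wo)+man')

# One or more (wo) groups: match.
plus_hit = bat_plus.search('The Adventures of Batwowowowowoman')
# Zero (wo) groups: "+" requires at least one, so there is no match.
plus_miss = bat_plus.search('The Adventures of Batman')

print(plus_hit.group())  # Batwowowowowoman
print(plus_miss)         # None
```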

Escaping ?, *, +

regex = re.compile(r'\+\*\?')
regex.search('I learned about +*? regex syntax')

If we want to escape any of the meta-characters such as ?, * or +, we can simply place a backslash in front of them to search for those characters literally.

regex = re.compile(r'(\+\*\?)+')
regex.search('I learned about +*?+*?+*?+*?+*? regex syntax')

In the regex above, we escaped the +, *, and ? meta characters, so we’re looking for the literal characters. We’re matching the group (\+\*\?) one or more times.
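The two escaped patterns can be checked with the sketch below, which prints the text each search actually matched. The variable names are illustrative.

```python
import re

# Each backslash turns the following meta character into a literal.
single = re.search(r'\+\*\?', 'I learned about +*? regex syntax')
print(single.group())  # +*?

# The unescaped "+" after the group is still a repetition operator,
# so the literal sequence "+*?" may repeat one or more times.
repeated = re.search(r'(\+\*\?)+', 'I learned about +*?+*?+*?+*?+*? regex syntax')
print(repeated.group())  # +*?+*?+*?+*?+*?
```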

Repetition in Regex - Quantifiers

In regular expressions, braces are typically used as quantifiers to specify the number of times that a character or a group of characters can be repeated.

Example 6: Quantifiers

Let’s take a look at the regex pattern (Ha){3}.

What we’re saying is, “match a string that repeats the group (Ha) 3 times”.
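A minimal sketch of the {3} quantifier follows. The test strings are assumptions chosen for illustration, since the original example string is not reproduced here.

```python
import re

ha_three = re.compile(r'(Ha){3}')

# Exactly three repetitions of the group are required.
three = ha_three.search('HaHaHa')
# Only two repetitions are present, so there is no match.
two = ha_three.search('HaHa')

print(three.group())  # HaHaHa
print(two)            # None
```

(Ha){3} behaves the same as writing the group out three times, (Ha)(Ha)(Ha), apart from how the groups are numbered.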

Example 7: Quantifiers (cont’d)

In the regex above, we made the area code optional and specified we want to match 3 phone numbers within the string.
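The exact pattern for this example is not reproduced in the text, so the sketch below is only one plausible way to write it. The pattern, the optional comma separator, and the sample phone numbers are all assumptions made for illustration.

```python
import re

# Assumed pattern: an optional area code, a 7-digit number, and an
# optional trailing comma, with the whole group repeated exactly 3 times.
three_phones = re.compile(r'((\d{3}-)?\d{3}-\d{4},?){3}')

# Hypothetical input with three comma-separated numbers.
text = 'My numbers are 415-555-1234,555-4242,212-555-0000'
phones_match = three_phones.search(text)

print(phones_match.group())  # 415-555-1234,555-4242,212-555-0000
```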

Example 8: {x,y} - (at least x, at most y)

We can also match a range of possible repetitions with {x,y} (at least x times and at most y times). In the pattern itself, write the range without a space after the comma; Python’s re module treats {3, 5} as literal text rather than as a quantifier.

In the regex (Ha){3,5}, we’re matching the group (Ha) a minimum of 3 times and a maximum of 5 times.
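Here is a small sketch of a bounded range, using assumed test strings. It also shows what happens in Python when a space is left inside the braces.

```python
import re

ha_range = re.compile(r'(Ha){3,5}')

# Seven repetitions are available, but the match stops at the maximum of 5.
capped = ha_range.search('HaHaHaHaHaHaHa').group()
print(capped)  # HaHaHaHaHa

# Two repetitions are below the minimum of 3, so there is no match.
too_few = ha_range.search('HaHa')
print(too_few)  # None

# With a space, "{3, 5}" is not a quantifier; it is matched as literal text.
with_space = re.search(r'(Ha){3, 5}', 'HaHaHa')
print(with_space)  # None
```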

Example 9: {,5} - A missing first number is assumed to be 0.

haRegex = re.compile(r'(Ha){,5}')

Here we’re simply leaving off the first number. If we don’t specify the first number in the range, the regex engine assumes the range begins at 0, so the range is 0–5.
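One consequence of a lower bound of 0 is that the pattern can match an empty string. The sketch below, with assumed test strings, illustrates both ends of the range.

```python
import re

ha_up_to_five = re.compile(r'(Ha){,5}')

# At most 5 repetitions are consumed, even though 7 are available.
upper = ha_up_to_five.search('HaHaHaHaHaHaHa').group()
print(upper)  # HaHaHaHaHa

# Zero repetitions are allowed, so the search succeeds with an empty match
# at the start of the string instead of returning None.
empty = ha_up_to_five.search('Hello')
print(repr(empty.group()))  # ''
```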

Example 10: {3,} - A missing last number means there is no upper limit.

haRegex = re.compile(r'(Ha){3,}')

Here the range begins at 3 and has no upper limit, meaning the regex pattern looks for 3 or more repetitions of (Ha).
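A quick sketch of the open-ended range, again with assumed test strings:

```python
import re

ha_three_plus = re.compile(r'(Ha){3,}')

# With no upper limit, all seven repetitions are matched.
unbounded = ha_three_plus.search('Ha' * 7).group()
print(unbounded)  # HaHaHaHaHaHaHa

# Fewer than three repetitions: no match.
below_min = ha_three_plus.search('HaHa')
print(below_min)  # None
```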

Example 11: Implementing a Range

digitRegex = re.compile(r'(\d){3,5}')
digitRegex.search('1234567890')

In this regex, we’re looking for a run of 3 to 5 digits within a string. Because matching is greedy by default, the search returns ‘12345’.
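The sketch below inspects the match object from this example. It also shows a detail that is easy to miss: when a capturing group is repeated, group(1) holds only the last repetition.

```python
import re

digit_regex = re.compile(r'(\d){3,5}')
digits = digit_regex.search('1234567890')

# The whole match is the first five digits.
print(digits.group())   # 12345

# The repeated group (\d) keeps only its final repetition.
print(digits.group(1))  # 5
```

If only the full run of digits is needed, the parentheses can be dropped and the pattern written as \d{3,5}.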

A little note about Greedy and Non-Greedy Matching

A regular expression search scans the string from left to right and returns the earliest match it can find. By default, regular expressions do greedy matches: starting from that earliest position, a greedy match is the longest possible string that fits the regex pattern.
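The two rules (earliest match first, then as long as possible) can be seen in a small sketch. The input string is an assumption chosen so that a longer run of the letter appears later in the string.

```python
import re

# The first run of "a" starts at index 1 and is three characters long.
# A longer run appears later, but the earliest match wins.
earliest = re.search(r'a+', 'baaa caaaaa')

print(earliest.group())  # aaa
print(earliest.start())  # 1
```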

Example 12: Non-Greedy Match

digitRegex = re.compile(r'(\d){3,5}?')
digitRegex.search('1234567890')

Let’s explore the regex above in more detail. If we place a question mark after the range, the regex pattern matches the minimum number of repetitions rather than the maximum. Here the search returns ‘123’ instead of ‘12345’.
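A side-by-side sketch of greedy and non-greedy matching on the same input follows. The last two lines apply the same idea to the c.+t pattern from earlier in the article, as an illustrative extension.

```python
import re

number = '1234567890'

# Greedy (default): take as many digits as the range allows.
greedy = re.search(r'(\d){3,5}', number).group()
# Non-greedy: the trailing "?" stops at the minimum of three digits.
lazy = re.search(r'(\d){3,5}?', number).group()

print(greedy)  # 12345
print(lazy)    # 123

# The same "?" works after "+": stop at the first "t" instead of the last.
lazy_plus = re.search(r'c.+?t', 'The fat cat sat on the mat').group()
print(lazy_plus)  # cat
```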
