Java RegEx: Part 11 — Reluctant (Non-greedy) Quantifiers

Sera Ng.
Tech Training Space
Oct 19, 2020

Hello, and welcome back. In this part, I’m going to discuss greedy and reluctant quantifiers.

First of all, what are greedy quantifiers? Greedy quantifiers make the regular expression engine match as much as it can. We have actually used greedy quantifiers quite a lot in previous examples.

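Here are the greedy quantifiers in Java regular expressions:

  • X* matches X zero or more times.
  • X+ matches X one or more times.
  • X? matches X zero times or once.
  • X{n} matches X exactly n times.
  • X{n,} matches X at least n times.
  • X{n,m} matches X at least n but not more than m times.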

So, we have used these patterns before. Why are we talking about them again?

Because we need to look at the details to see more clearly how they work and how they affect matching results.
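To make "as much as it can" concrete, here is a small sketch that applies each greedy quantifier to the input "aaaa" and prints the first match. The class and method names are my own, chosen for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyQuantifiers {
    // Returns the text of the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "aaaa";
        // Each greedy quantifier takes as many 'a's as its upper bound allows.
        String[] regexes = {"a*", "a+", "a?", "a{2}", "a{2,}", "a{1,3}"};
        for (String regex : regexes) {
            System.out.println(regex + " -> \"" + firstMatch(regex, text) + "\"");
        }
    }
}
```

Running it, a* and a+ both take all four letters, a? takes one, a{2} takes two, a{2,} takes all four and a{1,3} takes three: each one stops only at its upper bound or at the end of the input.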

As always, let’s see an example:

I have a simple string containing some letters and digits:

String text = "The order number is 8983";

What I want to achieve is to break the string and extract the number 8983. That sounds like a simple task.

I have defined the following pattern:

String regex = "(.*)(\\d+)";

In the pattern, I have two groups, as you may have noticed:

  • The first group contains the dot (.) followed by a greedy quantifier, the star (*), so it matches zero or more characters of any kind (except line terminators, by default).
  • The second group contains the digit class \d (written \\d in a Java string literal) followed by the plus sign (+), so it matches one or more digits.

Then, in a while loop, I call the group() method with the group indexes 1 and 2 to get the two captured groups:

while (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}
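The loop alone leaves out the code that creates the matcher. Here is a minimal complete program, assuming the usual Pattern.compile(regex).matcher(text) setup. The class name and the captureGroups helper are my own; the helper collects the groups into a list instead of printing them directly, which makes the result easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderNumberGreedy {
    // Collects group(1) and group(2) of every match, in the order the loop prints them.
    static List<String> captureGroups(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        List<String> captured = new ArrayList<>();
        while (matcher.find()) {
            captured.add(matcher.group(1));
            captured.add(matcher.group(2));
        }
        return captured;
    }

    public static void main(String[] args) {
        String text = "The order number is 8983";
        String regex = "(.*)(\\d+)";
        for (String group : captureGroups(regex, text)) {
            System.out.println(group);
        }
    }
}
```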

If we run the program, we get the following output:

The order number is 898

3

  • The first line, “The order number is 898”, is what the first group captured.
  • The second line, “3”, is what the second group captured.

Why is that?

We got those results because we have used a greedy quantifier in the first group, which is the star (*) character.

Here is what happened behind the scenes:

Since greedy quantifiers try to match as much as they can, the first group consumed the whole string, because its pattern uses the dot (.), which matches any character.

When the first group had finished matching, the second group pattern came into play and tried to match as much as possible.

But at this point, there were no characters left in the input string because the whole input string had been consumed by the first group.

So the first group started to backtrack, releasing the characters it had collected one at a time, starting from the end of the string.

Those released characters were provided to the second group to match.

After each release, one of two things could happen:

  • The second group achieved a match. Then the first group stopped releasing characters.
  • The second group failed to match at that position. Then the first group released one more character and the second group tried again. If no split worked at all, this match attempt failed.

Our example falls into the first case. The very first character the first group released was the final digit 3, which matched the second group. So the first group stopped releasing characters, and the whole matching process stopped as well.

That’s why we received the results as we have seen here.
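The backtracking steps above can be mimicked by hand. The sketch below is not how java.util.regex is implemented internally; it is a hypothetical helper that only reproduces the order of attempts for (.*)(\d+) tried at the start of the string, assuming single-line input (so the dot really does match every character). Group 1 starts with the whole text and gives back one character at a time until \d+ can match at the split point.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyBacktrackSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical sketch of (.*)(\d+) at position 0: group 1 starts as the whole
    // text and releases one character per step until \d+ matches at the split.
    // Returns {group1, group2}, or null if no split works.
    static String[] split(String text) {
        for (int end = text.length(); end >= 0; end--) {
            System.out.println("trying group 1 = \"" + text.substring(0, end) + "\"");
            // lookingAt() requires \d+ to match starting exactly at the split point.
            Matcher digits = DIGITS.matcher(text).region(end, text.length());
            if (digits.lookingAt()) {
                return new String[] {text.substring(0, end), digits.group()};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] groups = split("The order number is 8983");
        System.out.println(groups[0] + " | " + groups[1]);
    }
}
```

For this input the trace shows only two attempts: the full string, where \d+ finds nothing left to match, and then the string without its final 3, where \d+ matches the released “3”.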

Let’s test this behavior once more:

I have changed the input string by moving the number 8983 before the word “number”:

String text = "The order 8983 number is";

Now run the program and we have:

  • The first group captured the string “The order 898”.
  • And the second group captured the digit “3”.

Although the first group initially consumed the whole input string, it ends up capturing only the text from the beginning up to the 8 just before the final 3.

That’s because, as we have just explained, the first group first collected the whole string and then had to release characters one at a time, from the right, to give the second group a chance to match. None of the characters in “ number is” is a digit, so the second group kept failing until the first group released the digit 3. That digit matched the second group pattern, so the first group stopped releasing characters. The characters after the 3 are simply left out of the match, because the pattern ends with the second group.
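One way to confirm that the characters after the 3 are not part of the match is to look at where the whole match (group 0) starts and ends. Here is a small sketch using Matcher.start() and Matcher.end(); the class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchExtent {
    // Returns {start, end} of the first match, or null if there is none.
    static int[] firstMatchBounds(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return new int[] {matcher.start(), matcher.end()};
    }

    public static void main(String[] args) {
        String text = "The order 8983 number is";
        int[] bounds = firstMatchBounds("(.*)(\\d+)", text);
        // The whole match stops right after the final 3; the rest is never matched.
        System.out.println("matched:   \"" + text.substring(bounds[0], bounds[1]) + "\"");
        System.out.println("unmatched: \"" + text.substring(bounds[1]) + "\"");
    }
}
```

It prints “The order 8983” as the matched text and “ number is” as the unmatched remainder.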

So, what if we want to get the whole number “8983” from the string? In other words, how do we make the second group capture the whole number?

We can solve this problem by telling the first group to match as little as possible, instead of as much as possible, which is what it does by default.

To do that, we use a reluctant quantifier. Reluctant quantifiers tell a group pattern to match as little as possible, taking more characters only when the rest of the pattern cannot otherwise match.
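Java has a reluctant form of every greedy quantifier: X*?, X+?, X??, X{n}?, X{n,}? and X{n,m}?. This sketch mirrors a greedy comparison on the input "aaaa" and prints the first match of each reluctant form; the class and method names are my own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantQuantifiers {
    // Returns the text of the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "aaaa";
        // Each reluctant quantifier takes only as many 'a's as its lower bound requires.
        String[] regexes = {"a*?", "a+?", "a??", "a{2}?", "a{2,}?", "a{1,3}?"};
        for (String regex : regexes) {
            System.out.println(regex + " -> \"" + firstMatch(regex, text) + "\"");
        }
    }
}
```

Here a*? and a?? match the empty string, a+? and a{1,3}? match a single “a”, and a{2}? and a{2,}? match “aa”: each stops at its minimum, because nothing else in the pattern asks for more.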

Before solving our problem, let’s look at a small example:

String text = "8983";
String regex = "(\\d+)";

In this example, I have defined a group that matches one or more digits, and the input string contains only the number “8983”.

If we run the program, the whole input string is printed, because all four digits match the greedy group pattern in a single match.
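A runnable version of this small example might look like the following, assuming the same find() loop as before but printing only group 1. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitMatches {
    // Returns group(1) of every match found by repeated find() calls.
    static List<String> allMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        // Greedy: the first find() call consumes all four digits, so there is one match.
        for (String match : allMatches("(\\d+)", "8983")) {
            System.out.println(match);
        }
    }
}
```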

However, if I make the quantifier reluctant, the result is different.

We form a reluctant quantifier by appending a question mark to an existing greedy quantifier, like this:

String regex = "(\\d+?)";

Now the group uses a reluctant quantifier, which means it matches as few characters as possible.

Since the pattern must match at least one digit, each match consists of exactly one digit, because that is the least it can match. Each call to find() then picks up the next digit.

If we run the program, we’ll get every single digit printed out:

8

9

8

3
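The one-digit-at-a-time output comes from calling find() in a loop: each call takes the smallest possible match, and the next call resumes where it ended. A reluctant quantifier still grows when the rest of the matching requires it. As an illustration going slightly beyond the article’s program, Matcher.matches() must consume the entire input, so the same reluctant group ends up capturing all four digits. The class and helper names are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDigits {
    // Each find() call resumes where the previous match ended.
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group(1));
        }
        return matches;
    }

    // matches() succeeds only if the whole input is consumed, so +? is forced to grow.
    static String wholeInputGroup(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findAll("(\\d+?)", "8983"));         // [8, 9, 8, 3]
        System.out.println(wholeInputGroup("(\\d+?)", "8983")); // 8983
    }
}
```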

Now let’s get back to our problem that we want the second group to capture the whole number “8983”.

I have changed the input string back to the original:

String text = "The order number is 8983";

And I have added a question mark to the first group to form a reluctant quantifier.

String regex = "(.*?)(\\d+)";

By the nature of reluctant quantifiers, the first group will try to match as little as possible.

So how does the new pattern work this time?

First, the reluctant first group starts by matching nothing at all, that is, the empty string.

Then the second group comes into play and tries to match at the very beginning of the string. The first character is “T”, which is not a digit, so the second group fails.

The first group then takes one more character, and the second group tries again at the next position. This repeats, one character at a time, through “The order number is ”, because none of those characters is a digit.

Once the first group has taken “The order number is ” (including the trailing space), the second group finally starts at the digit “8”. The second group’s quantifier is still greedy, so it matches as many digits as it can: “8983”.

As a result, the first group stops growing as soon as the rest of the pattern can match, and the second group captures the whole number.
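The steps above can be mimicked with a hypothetical helper, in the same spirit as a hand-written trace rather than a description of the JDK’s internals. It only models (.*?)(\d+) tried at the start of the string and assumes single-line input. Group 1 starts empty and takes one more character each time \d+ fails at the split point; main compares the result with the real pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantExpandSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical sketch of (.*?)(\d+) at position 0: group 1 starts empty and
    // grows by one character each time \d+ fails at the split point.
    // Returns {group1, group2}, or null if no split works.
    static String[] split(String text) {
        for (int end = 0; end <= text.length(); end++) {
            Matcher digits = DIGITS.matcher(text).region(end, text.length());
            if (digits.lookingAt()) {
                // \d+ itself is still greedy, so it takes every digit from here on.
                return new String[] {text.substring(0, end), digits.group()};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "The order number is 8983";
        String[] sketch = split(text);
        Matcher real = Pattern.compile("(.*?)(\\d+)").matcher(text);
        if (real.find()) {
            // Brackets make the trailing space in group 1 visible.
            System.out.println("sketch: [" + sketch[0] + "] [" + sketch[1] + "]");
            System.out.println("regex:  [" + real.group(1) + "] [" + real.group(2) + "]");
        }
    }
}
```

Both lines show “The order number is ” and “8983”. The same reluctant pattern also handles the second input from earlier: on “The order 8983 number is” it captures “The order ” and “8983”.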

Now let’s run the program. The first group prints “The order number is ” (including the trailing space), and the second group prints the whole number:

8983

I hope this makes it clear how greedy and reluctant quantifiers work.
