Java RegEx: Part 12 — Possessive Quantifiers

Sera Ng.
Tech Training Space
4 min readOct 19, 2020

In this part, we are going to explore another advanced technique supported in a regular expression engine called possessive quantifiers.

To understand what possessive quantifiers are, let’s see an example,

I re-use the example in the last part.

In this example, as you might have worked with, we have used the reluctant quantifier technique to get the number part in the input string.

Let’s recollect how it works.

  • First, the first group pattern consumed the whole text since it matched all the characters. It then backtracked and slowly released each character that it had collected to give the second group more opportunities to match.
  • The last digit in the input string was released, that digit was matched and consumed by the second group.
  • Since the first group was reluctant which tried to match as little as possible; and the second group was greedy which tried to match as much as it could, the first group continued to release more characters so that those characters could match with the second group. As a result, all digits in the input string were consumed by the second group.

That’s how the greedy and reluctant quantifiers work.

The possessive quantifiers work similarly to the greedy technique in the first step, but differently in the second step in which it does not backtrack and release characters, it had collected for the other groups to find a match. And this may significantly change the matching results.

So, let’s try to apply possessive quantifiers in the first group.

We form a possessive quantifier by appending a plus sign to the existing greedy quantifier. Thus the star becomes start plus.

String regex = “(.*+)(\\d+)”;

After applying the possessive quantifier in the pattern, let’s run the program again.

And we see there was no match.

That’s because: at first, the first group consumed the whole input string since it matched all the characters. But since it was a possessive quantifier, it did not backtrack and release characters to give the second group a chance to match.

As a result, the second group couldn’t achieve a match and that led to the whole expression pattern did not find a match.

But what does that actually mean? Well, that means using appropriate possessive quantifiers can significantly improve the performance of the regular expression engine because it allows the engine to fail faster.

I have another example below:

In the example, I have defined a pattern to match any input strings that can start with any characters and must end with 4 digits.

In the first group of the pattern, I have used greedy quantifiers.

And I have created a very long string for demonstration purposes. And of course, this string does not match with the pattern because it does not end with 4 digits.

Before performing the matching process, I get the time in milliseconds by calling the currentTimeMillis() method in the System class.

Then processing the matching.

And after finishing the matching, I get the current time again, then minus the starting time, to get the time the matching process has taken.

Now run the program and notice the time it has taken.

In my system, it’s about 51 milliseconds.

Let’s change the pattern to use possessive quantifiers. I put another plus sign to form the possessive quantifiers.

String regex = “(.++)\\d{4}”;

Now, run the program again, and notice the time it has taken:

In my system, it’s around 11 milliseconds, which was much less than the last one.

The elapsed time in both cases may slightly change due to some other factors, but the possessive version should always produce less time than the greedy version, although they both give the same matching results.

First, let’s discuss how the greedy version works in this case:

As I have mentioned previously, the greedy quantifiers try to match as many characters as it can. So, at first, the first group matched and collected all the characters in the input string.

Next, the second pattern would try to match any digits in the input string. Since there was no character left in the input string, the first group backtracked and released each character so that it would match with the second pattern.

The first character released by the first group was the last character in the input string which was the “s” character. This character of course did not match with the digit pattern.

The first group then continued to release the second character, which as the “t” character, which also did not match with the digit pattern. And the first group kept releasing more characters until it ran out of characters.

And finally, there were no characters matched with the digit pattern.

Now for the possessive version: first, the first group also matched and consumed all the characters in the input string. Then the second digit pattern came into action but it found no matches because there was no character left in the input string. And since the first group was possessive, it did not release any characters for the digit pattern to match, the whole matching process stopped right after the digit pattern could not achieve a match.

That’s why the possessive has taken less time to produce the same result as the greedy counterpart.

And as a conclusion, if there is something that can be achieved with the possessive quantifiers, I would recommend this for matching tasks over the greedy ones.

--

--