Java RegEx: Part 7— Boundary Matcher

Sera Ng.
Tech Training Space
Oct 19, 2020

Consider the following cases:

  • We want to search for a sub-string, but only if it appears at the beginning or at the end of the string.
  • We want to search for a whole word, not a sub-string. For instance, we want to search for the word “is”, not the sub-string “is” inside the word “this”.
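To make the second case concrete, here is a small sketch (the class and method names are made up for illustration). It shows that a plain pattern “is” also matches inside “this”: find() reports the first match at index 2, inside the word “this”, rather than at index 5, where the standalone word “is” begins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainSearchDemo {
    // Returns the start index of the first match of regex in input, or -1 if there is none
    static int firstMatchIndex(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        // The plain pattern "is" matches the sub-string inside "this" first
        System.out.println(firstMatchIndex("is", "this is a string")); // prints 2
    }
}
```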

So far, we have not seen a technique that handles the above cases. This is where boundary matchers come into play.

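Boundary matchers are written with special characters, called metacharacters, that you need to be aware of:

  • ^ (caret): matches at the beginning of the input (or of each line, if the MULTILINE flag is set).
  • $ (dollar sign): matches at the end of the input (or of each line, if the MULTILINE flag is set).
  • \b: matches at a word boundary, that is, a position with a word character on one side and a non-word character, or the start or end of the input, on the other.
  • \B: matches at any position that is not a word boundary.

In a Java string literal the backslash has to be escaped, so \b and \B are written as "\\b" and "\\B".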

Java supports a few other boundary matchers, but the ones above are the most commonly used.
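Boundary matchers match positions between characters rather than characters themselves. The following sketch (hypothetical class and helper names) calls Matcher.find() in a loop to list every position in “is this” where each metacharacter matches:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositionsDemo {
    // Collects every index at which the (zero-width) regex matches in input
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "is this";
        for (String regex : new String[] {"^", "$", "\\b", "\\B"}) {
            System.out.println(regex + " -> " + positions(regex, input));
        }
        // ^ -> [0]
        // $ -> [7]
        // \b -> [0, 2, 3, 7]
        // \B -> [1, 4, 5, 6]
    }
}
```

Because each match is zero-width, every position in the input is reported by exactly one of \b and \B.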

Let’s see an example:

Pay attention to the defined pattern:

String searchPattern = "^is$";

In this pattern, I start with the caret sign (^), followed by “is”, and then the dollar sign ($):

  • The caret sign (^) matches at the beginning, so the input must start with “is”.
  • The dollar sign ($) matches at the end, so the input must end with “is”.

Therefore, the whole pattern requires the input to be exactly the string “is”. This is equivalent to calling the Matcher.matches() method with the pattern “is”.
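A quick way to check this claim is to compare find() with the anchored pattern against matches() with the bare pattern “is”. The sketch below (hypothetical names) does that for the three inputs used in this section. One subtlety: by default, $ in Java also matches just before a final line terminator, so “is\n” is found by ^is$ but rejected by matches(). For single-line input, such as a line read from the console, the two behave the same.

```java
import java.util.regex.Pattern;

class AnchorsVsMatchesDemo {
    // find() with ^...$ anchors: the match must start at the beginning and end at the end
    static boolean anchoredFind(String input) {
        return Pattern.compile("^is$").matcher(input).find();
    }

    // matches() requires the entire input to match, so no anchors are needed
    static boolean wholeMatch(String input) {
        return Pattern.compile("is").matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"this is a string", "is a string", "is"}) {
            // Prints false / false, false / false, then true / true
            System.out.println(input + ": " + anchoredFind(input) + " / " + wholeMatch(input));
        }
    }
}
```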

Also note that we need to invoke the find() method, which scans through the input string looking for a subsequence that matches the pattern.
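The program itself is not reproduced in the text. A minimal sketch that produces the same prompt and messages might look like the following; the class name BoundaryMatcherDemo and the search helper are assumptions made for illustration, and the loop simply keeps prompting until the input ends. To try the later patterns, change the value of searchPattern.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryMatcherDemo {
    // Hypothetical helper: builds the message printed for one line of input
    static String search(String searchPattern, String input) {
        Matcher matcher = Pattern.compile(searchPattern).matcher(input);
        // find() scans the input for the next subsequence that matches the pattern
        if (matcher.find()) {
            return "Found: " + matcher.group();
        }
        return "Not found in the input string!";
    }

    public static void main(String[] args) {
        String searchPattern = "^is$";
        try (Scanner scanner = new Scanner(System.in)) {
            System.out.print("Enter a string to search: ");
            // Read lines until the input ends (for example, Ctrl+D or Ctrl+Z)
            while (scanner.hasNextLine()) {
                System.out.println(search(searchPattern, scanner.nextLine()));
                System.out.print("Enter a string to search: ");
            }
        }
    }
}
```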

Running the program gives:

Enter a string to search: this is a string

Not found in the input string!

Enter a string to search: is a string

Not found in the input string!

Enter a string to search: is

Found: is

The string “is” was not found in “this is a string” because the input neither starts nor ends with “is”.

The string “is” was not found in “is a string” because, although the input starts with “is”, it does not end with “is”.

The string “is” was found in “is” because the input consists of exactly “is”.

Now, let me change the pattern a little bit:

String searchPattern = "^is[\\s\\w]*";

In the new pattern, I have replaced the dollar sign ($) with [\\s\\w]*.

That means the pattern matches any input that starts with “is”, followed by zero or more whitespace characters (\s) or word characters (\w, which covers the letters a to z and A to Z, the digits 0 to 9, and the underscore).
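A small sketch (hypothetical names) makes two consequences of this pattern visible. Because \w includes letters, “island” also matches, so the pattern does not require “is” to be a separate word. The match also stops at the first character that is neither whitespace nor a word character, such as “?”.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixPatternDemo {
    private static final Pattern PREFIX = Pattern.compile("^is[\\s\\w]*");

    // Returns the matched text, or null if the input does not start with "is"
    static String firstMatch(String input) {
        Matcher matcher = PREFIX.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("is a string")); // is a string
        System.out.println(firstMatch("A string"));    // null
        System.out.println(firstMatch("island"));      // island
        System.out.println(firstMatch("is it? yes"));  // is it
    }
}
```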

Running the program gives results like:

Enter a string to search: A string

Not found in the input string!

Enter a string to search: is a string

Found: is a string

“A string” did not match because it does not start with “is”.

“is a string” matched in full because it starts with “is” and the rest consists only of whitespace and word characters.

Let me change the pattern a little bit:

String searchPattern = "[\\s\\w]*is$";

In this new pattern, I have moved [\\s\\w]* to the beginning, followed by the string “is”, and ended it with the dollar sign ($). Note that the caret sign (^) has been removed.

As you can guess, the pattern matches any input that ends with “is”.
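The sketch below (hypothetical names) checks this behaviour. Note that “This” on its own also matches, since it ends with “is”. And because there is no caret, the match does not have to start at the beginning of the input: for “Hello, this”, find() returns “ this” (starting at the space), because the comma cannot be matched by [\s\w].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixPatternDemo {
    private static final Pattern SUFFIX = Pattern.compile("[\\s\\w]*is$");

    // Returns the matched text, or null if the input does not end with "is"
    static String firstMatch(String input) {
        Matcher matcher = SUFFIX.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("This is"));          // This is
        System.out.println(firstMatch("This is a string")); // null
        System.out.println(firstMatch("This"));             // This
        System.out.println(firstMatch("Hello, this"));      // " this" (with a leading space)
    }
}
```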

Run the program:

Enter a string to search: This is a string

Not found in the input string!

Enter a string to search: This is

Found: This is

“This is a string” did not match because it does not end with “is”.

“This is” ends with “is”, so it matched in full.

Let me change the pattern another time:

String searchPattern = "\\bis\\b";

As you can see, I have prepended and appended \\b to the string “is”. Since \\b matches at a word boundary, the input must contain “is” as an independent word, not as part of a longer word such as “this” or “island”.
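Since find() can be called repeatedly, a loop can collect every standalone “is” in the input. In this sketch (hypothetical names), “is” inside “this” and at the start of “island” is skipped, because one side of it is not a word boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    private static final Pattern WORD_IS = Pattern.compile("\\bis\\b");

    // Returns the start index of every occurrence of "is" as a standalone word
    static List<Integer> wordPositions(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = WORD_IS.matcher(input);
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordPositions("This does not have the one")); // []
        System.out.println(wordPositions("This is a string"));           // [5]
        System.out.println(wordPositions("is this island? It is."));     // [0, 19]
    }
}
```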

Let’s run the program:

Enter a string to search: This does not have the one

Not found in the input string!

Enter a string to search: This is a string

Found: is

“This does not have the one”: there was no match because it does not contain “is” as an independent word. Note that the sub-string “is” in the word “This” was not counted, because the pattern has \\b before and after “is”.

“This is a string”: a match was found because it contains the word “is”.

Now if I change the search pattern as follows:

String searchPattern = "\\Bis\\b";

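I have replaced the prepended \\b (lower case) with \\B (upper case).

That means the pattern matches “is” at the end of a word that has at least one word character before it, such as the “is” in “This”. A standalone word “is” does not match, because it is preceded by a word boundary.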
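The same loop, with \\B in front, shows the difference (names are again hypothetical). In “This is his”, the matches are the “is” inside “This” (index 2) and at the end of “his” (index 9), while the standalone word at index 5 is skipped. The input “is” on its own produces no match at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonWordBoundaryDemo {
    private static final Pattern WORD_ENDING_IS = Pattern.compile("\\Bis\\b");

    // Returns the start index of every "is" that ends a word but does not start it
    static List<Integer> endPositions(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher matcher = WORD_ENDING_IS.matcher(input);
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(endPositions("This does not have the one")); // [2]
        System.out.println(endPositions("This is his"));                // [2, 9]
        System.out.println(endPositions("is"));                         // []
    }
}
```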

Let’s run the program:

Enter a string to search: This does not have the one

Found: is

“This does not have the one”: a match was found because the word “This” ends with “is”. Note that the match itself is only the two characters “is”, which is why the program prints “Found: is”.
