Java RegEx: Part 5— matches(), lookingAt(), find()
In the previous part, we used regular expressions and the String.matches() method to check the validity of an input string against pre-defined patterns.
The String.matches() method only validates the entire input string, which is not always what we need in practice. In many cases, we just want to validate or search for substrings rather than the whole text. Examples include the search and find-and-replace features in text editors such as Microsoft Word, Notepad, and many others.
To get more flexible and powerful tools for searching data, checking for a substring in a string, confirming that a certain substring exists before performing matching, and so on, we need the regular expression engine in Java, which is located in the java.util.regex package.
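In this lecture, we are going to discuss the 2 main classes in the regular expression engine, Pattern and Matcher, and the three foundation methods in the Matcher class: matches(), lookingAt(), and find().
The following list provides a description of each method:
- matches(): returns true only if the entire input string matches the pattern
- lookingAt(): returns true if the beginning of the input string matches the pattern; the rest of the string is ignored
- find(): searches for the next substring that matches the pattern, anywhere in the input string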
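To make the difference between the three methods concrete, here is a minimal sketch that applies all of them to the same input. The class name MatcherMethodsDemo, the helper method names, and the sample strings are assumptions made up for this illustration. A fresh Matcher is created for each check, because find() would otherwise continue from where an earlier successful match ended.

```java
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Compiled once and reused for every check below.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the whole input must consist of digits.
    static boolean wholeInputMatches(String input) {
        return DIGITS.matcher(input).matches();
    }

    // lookingAt(): the input must begin with digits.
    static boolean startMatches(String input) {
        return DIGITS.matcher(input).lookingAt();
    }

    // find(): digits may appear anywhere in the input.
    static boolean containsMatch(String input) {
        return DIGITS.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "2024 report";
        System.out.println(wholeInputMatches(input)); // false
        System.out.println(startMatches(input));      // true
        System.out.println(containsMatch(input));     // true
    }
}
```

For an input such as "report 2024", only the find() check would succeed, since the digits are neither the whole string nor at its start.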
First, let’s see the Pattern and Matcher classes and the matches() method in action.
The program prompts users to input their name and validates the inputted name against the following pattern:
String namePattern = "[a-zA-Z\\s]+";
The pattern requires at least one character, and every character must be a lower/upper case letter or a whitespace character.
And instead of using the String.matches() method, I used the matches() method in the Matcher class.
Pattern pattern;
Matcher matcher;
Before applying the pattern to perform text matching, we need to pass the pattern as a parameter to the compile() method:
pattern = Pattern.compile(namePattern);
As the name suggests, the compile() method checks whether the pattern is syntactically valid and turns it into a Pattern object. If the pattern contains invalid syntax, the compile() method throws a PatternSyntaxException, so the problem surfaces before any matching or searching is attempted.
The compiled Pattern can also be reused for many inputs. This is a significant difference from the String.matches() method, which compiles the pattern again on every call.
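The syntax check performed by compile() can be observed directly. The sketch below is only an illustration: the class name CompileCheckDemo and the helper isValidRegex are hypothetical, and the broken pattern is simply the name pattern with its closing bracket removed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileCheckDemo {
    // Returns true if the regex compiles, false if compile() rejects it.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Thrown by compile() when the pattern syntax is invalid.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-zA-Z\\s]+")); // true
        System.out.println(isValidRegex("[a-zA-Z\\s+"));  // false: unclosed character class
    }
}
```

PatternSyntaxException is an unchecked exception, so catching it is optional; if it is not caught, it propagates up the call stack like any other runtime exception.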
Once the pattern is checked and compiled successfully, the inputted string is ready to be matched:
matcher = pattern.matcher(name);
Finally, the matches() method is called to perform the matching:
flag = matcher.matches();
Note that, by default, the matches() method checks the entire string against the defined pattern and returns true or false accordingly.
The rest of the program is similar to the previous example.
The output should look like this if we run the program:
Enter your name: David 33 Karan
Invalid Input!
Enter your name: David Karan
Valid input
- David 33 Karan: invalid because it contains the digits 33
- David Karan: completely matched the pattern
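The steps above can be condensed into a minimal sketch. It is not the original listing: the class name NameCheck and the method isValidName are assumptions, and the two sample names are hard-coded instead of being read from the keyboard so that the example stays self-contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameCheck {
    // Letters and whitespace only, at least one character.
    private static final String NAME_PATTERN = "[a-zA-Z\\s]+";

    static boolean isValidName(String name) {
        Pattern pattern = Pattern.compile(NAME_PATTERN); // syntax check + compile
        Matcher matcher = pattern.matcher(name);         // bind the input to the pattern
        return matcher.matches();                        // the entire input must match
    }

    public static void main(String[] args) {
        // Hard-coded stand-ins for the names a user would type.
        String[] samples = {"David 33 Karan", "David Karan"};
        for (String name : samples) {
            String verdict = isValidName(name) ? "Valid input" : "Invalid Input!";
            System.out.println(name + ": " + verdict);
        }
    }
}
```

In an interactive version, the array would be replaced by a loop that reads each name with a Scanner and repeats the prompt until isValidName returns true.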
Next, let’s see how the lookingAt() method is applied.
As mentioned earlier, this method checks whether a string starts with a certain substring or sequence of characters.
For instance, suppose I want the inputted name to start with the first name “David”, followed by any other names. Then, I can apply the lookingAt() method as follows:
First, I defined the name pattern as “David”:
String namePattern = "David";
Then, I just needed to change the invocation of the matches() method to the lookingAt() method:
flag = matcher.lookingAt();
The lookingAt() method checks whether the inputted name starts with the substring “David” and returns true or false accordingly.
The output should look like this if we execute the program:
Enter your name: Gorge Karan
Invalid Input!
Enter your name: David Karan
Valid input
- Gorge Karan: invalid because the name started with “Gorge”
- David Karan: valid because the name started with “David”
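Here is a small sketch of the lookingAt() check described above, with matches() shown next to it for contrast. The class name FirstNameCheck and both method names are hypothetical, and the sample names are hard-coded rather than read from the user.

```java
import java.util.regex.Pattern;

class FirstNameCheck {
    private static final Pattern FIRST_NAME = Pattern.compile("David");

    // lookingAt(): only the beginning of the input has to match.
    static boolean startsWithDavid(String name) {
        return FIRST_NAME.matcher(name).lookingAt();
    }

    // matches(): the entire input has to be exactly "David".
    static boolean isExactlyDavid(String name) {
        return FIRST_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithDavid("Gorge Karan")); // false
        System.out.println(startsWithDavid("David Karan")); // true
        System.out.println(isExactlyDavid("David Karan"));  // false
    }
}
```

Note that lookingAt() only anchors the match at the start of the input, so a name such as "Mr David" would be rejected even though it contains "David".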
Next, let’s explore the find() method with a program that searches the following text:
String text = "I love you so much! However, I cannot marry you because you are not a human!";
The task I want to achieve is to count the number of appearances of the word “not”.
Therefore, I need to define the keyword I want to find:
String searchString = "not";
Then, I use the find() method to search the text.
Each call to find() looks for the next match, starting at the beginning of the text on the first call and continuing from the end of the previous match on later calls. It returns false once no further match exists, so a while loop is needed to make sure the entire text is scanned:
while (matcher.find()) {
    count++;
}
The find() method returns true each time the word “not” is found, including inside a longer word such as “cannot”, and returns false when it reaches the end of the text without another match.
Each time the word “not” is found, we increase the count variable by 1.
If we run the program, we will have the output:
The word was found: 2 times
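The counting loop can be wrapped in a small helper, sketched below. The class name WordCounter and the method countMatches are assumptions for this illustration; the text and the search string are the ones used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCounter {
    // Counts how many times the regex matches somewhere in the text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) { // each call moves on to the next match
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "I love you so much! However, I cannot marry you because you are not a human!";
        System.out.println("The word was found: " + countMatches("not", text) + " times");
    }
}
```

The count is 2 because the pattern also matches the last three letters of "cannot". If only the standalone word should be counted, one option is to add word boundaries, as in countMatches("\\bnot\\b", text), which yields 1 for this text.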
Instead of using a specific word as the pattern like in the above example, we can of course define a more general regular expression.
In the next program, the text contains some digits:
String text = "I love you 34 so much! However, 24 I cannot marry 45 you because 4 you are not a human!";
And I want to find out if there are any digits in the text or not.
Therefore, I need to define the following simple pattern for searching digits:
String searchString = "\\d";
That makes the find() method search for digits in the text.
And here is the result if we run the program:
Found: 7 times
There were 7 digits found: 3, 4, 2, 4, 4, 5, 4
If we want the find() method to treat 2 or more adjacent digits as a single number, the result should be 4 numbers: 34, 24, 45, and 4. To get that, we just need to apply the plus (+) quantifier as we did previously:
String searchString = "\\d+";
Run the program and we’ve got the expected result:
Found: 4 times
Finally, if we want to know which numbers the find() method has found, we can invoke the group() method in the Matcher class:
while (matcher.find()) {
    count++;
    System.out.println("Found: " + matcher.group());
}
Run the program and we got the found numbers:
Found: 34
Found: 24
Found: 45
Found: 4
Found: 4 times
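As a recap of the digit search, the sketch below collects every match into a list instead of printing it, which makes the difference between the two patterns easy to compare. The class name NumberFinder and the method findAll are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberFinder {
    // Returns the text of every match, in the order find() reports them.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group()); // group() is the substring just matched
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "I love you 34 so much! However, 24 I cannot marry 45 you because 4 you are not a human!";
        System.out.println(findAll("\\d", text).size()); // 7 single digits
        System.out.println(findAll("\\d+", text));       // [34, 24, 45, 4]
    }
}
```

Keep in mind that group() may only be called after a successful find(), matches(), or lookingAt(); otherwise it throws an IllegalStateException.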