How to Build a Regular Expression in Java
In this blog post, I’ll explain in detail how to build a regular expression in Java, as well as provide some sample regular expressions.
Introduction
If we want to represent a group of strings that follow a particular pattern, we should use a regular expression.
For example, we can write a regular expression to represent all valid email addresses, or to validate phone numbers, and so on.
The most important application areas for regular expressions are:
- To develop a validation framework like Hibernate Validator.
- To develop a pattern matching tool or application, like Ctrl + F or the grep command in Linux.
- To develop digital circuits.
- To develop translators like assemblers, compilers, and interpreters.
- To develop communication protocols like TCP/IP, UDP, etc.
To work with regular expressions in Java, we can use the java.util.regex package, which includes the following classes:
- Pattern: A compiled representation of a regular expression. It defines the pattern to be used in a search.
- Matcher: Performs match operations on an input string, that is, searches it for the pattern.
- PatternSyntaxException: An unchecked exception that indicates a syntax error in a regular expression pattern.
Here is a sample code snippet that uses a regular expression:
import java.util.regex.*;

public class RegularExpression {
    public static void main(String[] args) {
        int count = 0;
        Pattern pattern = Pattern.compile("ab");
        Matcher matcher = pattern.matcher("abcbcbcababacb");
        while (matcher.find()) {
            ++count;
            System.out.println(matcher.group() + "...... found at: " + matcher.start());
        }
        System.out.println("The Total number of occurrence is " + count);
    }
}
/****
Output-
ab...... found at: 0
ab...... found at: 7
ab...... found at: 9
The Total number of occurrence is 3
****/
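The third class in the package, PatternSyntaxException, does not appear in the snippet above. As a minimal sketch, the helper below tries to compile a string and reports whether it is a syntactically valid pattern. The class name RegexSyntaxCheck and the method isValidRegex are hypothetical names I chose for illustration; only Pattern.compile(), getDescription(), and getIndex() come from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexSyntaxCheck {

    // Hypothetical helper: returns true if the string compiles as a regex.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() explains the error; getIndex() gives its
            // approximate position in the pattern, or -1 if unknown.
            System.out.println("Invalid pattern: " + e.getDescription()
                    + " (index " + e.getIndex() + ")");
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // well-formed character class
        System.out.println(isValidRegex("[a-z"));   // unclosed bracket
    }
}
```

Because PatternSyntaxException is unchecked, the compiler does not force us to catch it, so a check like this is mainly useful when the pattern comes from user input or configuration.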
Pattern
A Pattern object is the compiled representation of a regular expression. We create one with the static compile() method of the Pattern class, which has the following signature:
public static Pattern compile(String regex)
Below is an example of the compile() method:
Pattern pattern = Pattern.compile("ab");
An overloaded version, compile(String regex, int flags), also accepts flags that change how the search is performed. Here are a few of them:
- CASE_INSENSITIVE: The case of letters is ignored when performing a search. By default, this applies only to US-ASCII letters.
- UNICODE_CASE: Used together with the CASE_INSENSITIVE flag to also ignore the case of letters outside of the English alphabet.
- LITERAL: Special characters in the pattern lose their special meaning and are treated as ordinary characters when performing a search.
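The flags described in this section are passed through the two-argument overload compile(String regex, int flags). The sketch below shows the effect of each flag by counting matches. The class PatternFlagsDemo and the helper countMatches are hypothetical names used only for this illustration, and the sample strings are my own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternFlagsDemo {

    // Hypothetical helper: counts how often the regex matches in the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java java JAVA";
        System.out.println(countMatches("java", 0, text));                        // 1
        System.out.println(countMatches("java", Pattern.CASE_INSENSITIVE, text)); // 3

        // The pattern starts with a lowercase e-acute (U+00E9),
        // the input with its uppercase form (U+00C9).
        int both = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(countMatches("\u00e9cole", Pattern.CASE_INSENSITIVE, "\u00c9COLE")); // 0
        System.out.println(countMatches("\u00e9cole", both, "\u00c9COLE"));                     // 1

        // Without LITERAL the dot matches any character; with it, only a dot.
        System.out.println(countMatches("a.b", 0, "a.b axb"));               // 2
        System.out.println(countMatches("a.b", Pattern.LITERAL, "a.b axb")); // 1
    }
}
```

Flags are plain int constants, so several of them can be combined with the bitwise OR operator, as done for CASE_INSENSITIVE and UNICODE_CASE here.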
Matcher
A Matcher object is used to search the target string for the specified pattern. We create one by calling the matcher() method of the Pattern class, which has the following signature:
public Matcher matcher(CharSequence input)
Here's an example of the matcher() method:
Matcher matcher = pattern.matcher("abcbcbcababacb");
The Matcher class is present in the java.util.regex package. The following are some of its most important methods:
- boolean find(): Attempts to find the next match and returns true if one is found.
- int start(): Returns the start index of the most recent match.
- int end(): Returns the index just after the last character of the most recent match.
- String group(): Returns the text of the most recent match.
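To make the relationship between start(), end(), and group() concrete, here is a small sketch. The class MatcherMethodsDemo and the helper describeMatches are hypothetical names for this illustration. It relies on the fact that end() is exclusive, so input.substring(start, end) yields the same text as group().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {

    // Hypothetical helper: lists every match as "text[start,end)".
    static List<String> describeMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end(); // index just past the last matched character
            // Sanity check: the substring between start and end is the match itself.
            if (!input.substring(start, end).equals(matcher.group())) {
                throw new IllegalStateException("unexpected match bounds");
            }
            result.add(matcher.group() + "[" + start + "," + end + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("ab", "abcbcbcababacb"));
        // prints [ab[0,2), ab[7,9), ab[9,11)]
    }
}
```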
Note: The Pattern and Matcher classes are present in the java.util.regex package and were introduced in Java 1.4.
Character classes
[abc] - Either ‘a’, ‘b’, or ‘c’
[^abc] - Any character except ‘a’, ‘b’, and ‘c’
[a-z] - Any lowercase alphabet symbol from a to z
[A-Z] - Any uppercase alphabet symbol from A to Z
[a-zA-Z] - Any alphabet symbol
[0-9] - Any digit from 0 to 9
[a-zA-Z0-9] - Any alphanumeric symbol
[^a-zA-Z0-9] - Any character except an alphanumeric symbol (special characters, including whitespace)
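Here is a quick sketch that checks single characters against some of the character classes above. The class CharacterClassDemo and the helper matchesWhole are hypothetical names for this illustration. The helper uses Matcher.matches(), which succeeds only if the entire input matches the pattern.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {

    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("[abc]", "b"));        // true
        System.out.println(matchesWhole("[abc]", "d"));        // false
        System.out.println(matchesWhole("[^abc]", "d"));       // true
        System.out.println(matchesWhole("[a-zA-Z0-9]", "7"));  // true
        System.out.println(matchesWhole("[a-zA-Z0-9]", "#"));  // false
        System.out.println(matchesWhole("[^a-zA-Z0-9]", "#")); // true
    }
}
```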
Predefined character classes
. - Any character except a line terminator
\d - A digit (0-9)
\D - Any character that is not a digit
\w - A word character (a-z, A-Z, 0-9, _)
\W - Any character that is not a word character
\s - A whitespace character (space, tab, newline)
\S - Any character that is not whitespace
\b - A word boundary
\B - Not a word boundary
\uxxxx - The Unicode character specified by the hexadecimal number xxxx
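In a Java string literal the backslash itself must be escaped, so \d is written as "\\d". The sketch below collects all matches for a few predefined classes. The class PredefinedClassDemo, the helper findAll, and the sample strings are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassDemo {

    // Hypothetical helper: returns the text of every match, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Runs of digits
        System.out.println(findAll("\\d+", "Order 66 shipped on 2023-05-21"));
        // prints [66, 2023, 05, 21]

        // Runs of word characters; the underscore counts, the hyphen does not
        System.out.println(findAll("\\w+", "a_b c-d"));
        // prints [a_b, c, d]

        // "cat" as a whole word only; the "cat" inside "concatenate" is skipped
        System.out.println(findAll("\\bcat\\b", "cat concatenate cat."));
        // prints [cat, cat]
    }
}
```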
Quantifiers
We can specify the number of occurrences to match using quantifiers.
* - 0 or more
+ - 1 or more
? - 0 or 1
{3} - Exactly 3
{3,4} - A range (minimum 3, maximum 4)
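The following sketch tries each quantifier against a few short inputs. The class QuantifierDemo and the helper matchesWhole are hypothetical names for this illustration. The helper requires the whole input to match, which makes the effect of the minimum and maximum counts easy to see.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("a*", ""));            // true  (0 or more)
        System.out.println(matchesWhole("a+", ""));            // false (needs at least 1)
        System.out.println(matchesWhole("a+", "aaa"));         // true
        System.out.println(matchesWhole("colou?r", "color"));  // true  (u is optional)
        System.out.println(matchesWhole("colou?r", "colour")); // true
        System.out.println(matchesWhole("\\d{3}", "123"));     // true  (exactly 3)
        System.out.println(matchesWhole("\\d{3}", "1234"));    // false
        System.out.println(matchesWhole("\\d{3,4}", "1234"));  // true  (3 to 4)
        System.out.println(matchesWhole("\\d{3,4}", "12345")); // false
    }
}
```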
split()
To split the target string according to a specific pattern, we can use the split() method of the Pattern class. It has the following signature:
public String[] split(CharSequence input)
Here is a code snippet that uses the Pattern class's split() method:
import java.util.regex.Pattern;

public class RegularExpression {
    public static void main(String[] args) {
        // Split the input at every single whitespace character
        Pattern pattern = Pattern.compile("\\s");
        String[] splitString = pattern.split("Pattern class is present in java.util.regex");
        for (String text : splitString) {
            System.out.println(text);
        }
    }
}
/***
Output -
Pattern
class
is
present
in
java.util.regex
***/
In the above example, the string was split at each whitespace character (\\s).
split()
The String class also contains a split() method, which splits the target string according to a regular expression passed as an argument.
public class RegularExpression {
    public static void main(String[] args) {
        String text = "This is example of String class split() method";
        String[] strings = text.split("\\s");
        for (String s : strings) {
            System.out.println(s);
        }
    }
}
/***
Output
This
is
example
of
String
class
split()
method
***/
In the above example, the string was again split at each whitespace character (\\s).
Note: The split() method of the Pattern class takes the target string as an argument, whereas the split() method of the String class takes the pattern as an argument.
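To put the two split() variants side by side, here is a minimal sketch. The class SplitComparison, its two helper methods, and the constant WHITESPACE_RUN are hypothetical names for this illustration. It also shows a detail the examples above do not hit: with "\\s" every single whitespace character is a separator, so consecutive spaces produce empty strings, while "\\s+" treats a run of whitespace as one separator.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitComparison {

    // Compiled once; handy when the same pattern is applied to many strings.
    static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Pattern.split(): the target string is the argument.
    static String[] splitWithPattern(String text) {
        return WHITESPACE_RUN.split(text);
    }

    // String.split(): the pattern is the argument.
    static String[] splitWithString(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "a  b   c"; // two spaces, then three spaces

        System.out.println(Arrays.toString(text.split("\\s")));
        // prints [a, , b, , , c]
        System.out.println(Arrays.toString(splitWithPattern(text)));
        // prints [a, b, c]
        System.out.println(Arrays.toString(splitWithString(text)));
        // prints [a, b, c]
    }
}
```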
StringTokenizer
StringTokenizer is a class created specifically for tokenization tasks. It is part of the java.util package.
import java.util.StringTokenizer;

public class StringTokenizerDemo {
    public static void main(String[] args) {
        // No delimiter argument: tokens are separated by whitespace
        StringTokenizer tokenizer = new StringTokenizer("StringTokenizer class present in java.util package");
        while (tokenizer.hasMoreTokens()) {
            System.out.println(tokenizer.nextToken());
        }
    }
}
/***
Output -
StringTokenizer
class
present
in
java.util
package
***/
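Note: StringTokenizer does not use regular expressions. By default, it splits on whitespace characters (space, tab, newline, carriage return, and form feed).
We can also pass our own delimiters as a second argument; each character of that string is treated as a separate delimiter. Here is a code snippet that splits a date on hyphens: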
StringTokenizer tokenizer = new StringTokenizer("05-21-2023", "-");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
/***
Output
05
21
2023
***/
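The second constructor argument of StringTokenizer is a set of delimiter characters, not a regular expression. The sketch below illustrates the difference. The class TokenizerDelimiterDemo, the helper tokenize, and the sample inputs are hypothetical names and values chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDelimiterDemo {

    // Hypothetical helper: collects all tokens into a list.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Each of '-', 'T' and ':' acts as a delimiter on its own.
        System.out.println(tokenize("2023-05-21T10:15", "-T:"));
        // prints [2023, 05, 21, 10, 15]

        // For StringTokenizer the dot is just a character ...
        System.out.println(tokenize("a.b.c", "."));
        // prints [a, b, c]

        // ... while for String.split() it is a regex that matches any character,
        // so every piece is empty and the resulting array has length 0.
        System.out.println("a.b.c".split(".").length);
        // prints 0
    }
}
```

To split on a literal dot with String.split(), the dot has to be escaped as "\\.".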
I’ve included some regular expressions that are commonly used in programming: