How to Build a Regular Expression in Java

TARA RAM GOYAL
5 min read · May 29, 2023

In this blog post, I’ll explain in detail how to build a regular expression in Java, as well as provide some sample regular expressions.

Introduction

A regular expression describes a group of strings that follow a particular pattern. Whenever we need to represent such a group, a regular expression is the tool to use.

For example, we can write a regular expression to represent all valid email addresses, or to validate phone numbers, and so on.

The most important application areas for regular expressions are:

  1. To develop a validation framework like the Hibernate Validator.
  2. To develop a pattern matching tool or application like Ctrl + F or the grep command in Linux.
  3. To develop digital circuits.
  4. To develop translators like assemblers, compilers, and interpreters.
  5. To develop communication protocols like TCP/IP and UDP.

To work with regular expressions in Java, we can utilize the java.util.regex package, which includes the following classes:

  1. Pattern: a compiled representation of a regular expression. It defines the pattern to be used in a search.
  2. Matcher: the object that performs match operations on an input string. It searches the input for the pattern.
  3. PatternSyntaxException: an unchecked exception that indicates a syntax error in a regular expression pattern.

Here is a sample code snippet that uses a regular expression:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {

    public static void main(String[] args) {
        int count = 0;
        // Compile the regular expression once, then match it against the target string
        Pattern pattern = Pattern.compile("ab");
        Matcher matcher = pattern.matcher("abcbcbcababacb");
        // find() moves to the next match and returns false when none is left
        while (matcher.find()) {
            ++count;
            System.out.println(matcher.group() + "...... found at: " + matcher.start());
        }

        System.out.println("The total number of occurrences is " + count);
    }
}

/****
Output-
ab...... found at: 0
ab...... found at: 7
ab...... found at: 9
The total number of occurrences is 3
****/
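
The third class in the java.util.regex package, PatternSyntaxException, is thrown by Pattern.compile() when the pattern is malformed. The sketch below is one possible way to handle it; the class name PatternSyntaxDemo and the helper isValidRegex are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternSyntaxDemo {

    // Hypothetical helper: returns true if the text compiles as a regular expression
    public static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // getDescription() explains the error, getIndex() gives its approximate position
            System.out.println("Invalid pattern: " + e.getDescription()
                    + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("ab"));   // well-formed pattern: true
        System.out.println(isValidRegex("[ab"));  // unclosed character class: false
    }
}
```

Because PatternSyntaxException is unchecked, the compiler does not force us to catch it, so a check like this is mainly useful when the pattern comes from user input.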

Pattern

A Pattern object is the compiled version of a regular expression, that is, the Java equivalent of a pattern. We can create a Pattern object by using the compile() method of the Pattern class. The signatures of the compile() method are as follows:

public static Pattern compile(String regex)
public static Pattern compile(String regex, int flags)

Flags: the flags passed to the two-argument compile() method change how the search is performed. Here are a few of them:

  1. CASE_INSENSITIVE: the case of letters is ignored when performing a search. By default this applies only to US-ASCII letters.
  2. UNICODE_CASE: use it together with the CASE_INSENSITIVE flag to also ignore the case of letters outside of the English alphabet.
  3. LITERAL: special characters in the pattern have no special meaning and are treated as ordinary characters when performing a search.

Below is an example of the compile() method of the Pattern class:

Pattern pattern = Pattern.compile("ab");
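
To make the effect of the flags concrete, here is a small sketch that compiles the same kind of pattern with and without them. The class name PatternFlagsDemo and the helper matches are assumptions made for this example; the flag constants and the two-argument compile() method come from the Pattern class itself.

```java
import java.util.regex.Pattern;

public class PatternFlagsDemo {

    // Hypothetical helper: true if the whole input matches the regex under the given flags
    public static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Without flags, the case of letters matters
        System.out.println(matches("java", 0, "JAVA"));                          // false
        System.out.println(matches("java", Pattern.CASE_INSENSITIVE, "JAVA"));   // true

        // The strings below spell "ecole" with an accented first letter (lower and upper case)
        String lower = "\u00e9cole";
        String upper = "\u00c9COLE";
        // CASE_INSENSITIVE alone only folds the case of ASCII letters
        System.out.println(matches(lower, Pattern.CASE_INSENSITIVE, upper));     // false
        System.out.println(matches(lower,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upper));        // true

        // With LITERAL, the dot is an ordinary character instead of "any character"
        System.out.println(matches("a.b", 0, "axb"));                            // true
        System.out.println(matches("a.b", Pattern.LITERAL, "axb"));              // false
        System.out.println(matches("a.b", Pattern.LITERAL, "a.b"));              // true
    }
}
```

Flags are plain int constants, so several of them can be combined with the bitwise OR operator, as in the UNICODE_CASE line.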

Matcher

A Matcher object is used to match a pattern against a target string. We create one by calling the Pattern class’s matcher() method, which has the following signature:

public Matcher matcher(CharSequence input)

Here’s an example of the Pattern class’s matcher() method:

Matcher matcher = pattern.matcher("abcbcbcababacb");

The Matcher class is present in the java.util.regex package. The following are some of the Matcher class’s most important methods:

  • boolean find(): attempts to find the next match and returns true if one is available.
  • int start(): returns the start index of the current match.
  • int end(): returns the index just after the last character of the current match.
  • String group(): returns the text that was matched.

Note: The Pattern and Matcher classes are present in the java.util.regex package and were introduced in Java 1.4.
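
The four Matcher methods above are usually used together in a loop. The following sketch records the matched text and its position for every match; the class name MatcherMethodsDemo and the helper describeMatches are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {

    // Hypothetical helper: one "text[start,end)" entry per match found in the input
    public static List<String> describeMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // end() is exclusive: it points just past the last matched character
            result.add(matcher.group() + "[" + matcher.start() + "," + matcher.end() + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("ab", "abcbcbcababacb"));
        // prints [ab[0,2), ab[7,9), ab[9,11)]
    }
}
```

Since end() is exclusive, input.substring(matcher.start(), matcher.end()) always equals matcher.group().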

Character classes

[abc]: either ‘a’, ‘b’, or ‘c’
[^abc]: any character except ‘a’, ‘b’, and ‘c’
[a-z]: any lowercase alphabet symbol from a to z
[A-Z]: any uppercase alphabet symbol from A to Z
[a-zA-Z]: any alphabet symbol
[0-9]: any digit from 0 to 9
[a-zA-Z0-9]: any alphanumeric symbol
[^a-zA-Z0-9]: any character except an alphanumeric symbol (special characters and whitespace)
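
As a quick check of the character classes above, the sketch below counts how many characters of a sample string each class matches. The class name CharacterClassDemo, the helper countMatches, and the sample string are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassDemo {

    // Hypothetical helper: counts the matches of the regex in the input
    public static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a7b@Z#k9"; // sample text: 4 letters, 2 digits, 2 special characters
        System.out.println(countMatches("[abc]", input));         // 2 (a, b)
        System.out.println(countMatches("[^abc]", input));        // 6
        System.out.println(countMatches("[a-z]", input));         // 3 (a, b, k)
        System.out.println(countMatches("[A-Z]", input));         // 1 (Z)
        System.out.println(countMatches("[a-zA-Z]", input));      // 4
        System.out.println(countMatches("[0-9]", input));         // 2 (7, 9)
        System.out.println(countMatches("[a-zA-Z0-9]", input));   // 6
        System.out.println(countMatches("[^a-zA-Z0-9]", input));  // 2 (@, #)
    }
}
```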

Predefined character classes

. : any character except a newline
\d : a digit (0 to 9)
\D : not a digit
\w : a word character (a-z, A-Z, 0-9, _)
\W : not a word character
\s : whitespace (space, tab, newline)
\S : not whitespace
\b : a word boundary
\B : not a word boundary
\uxxxx : the Unicode character specified by the hexadecimal number xxxx
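
The predefined classes can be tried out the same way. In a Java string literal each backslash has to be doubled, so \d is written as "\\d". The sketch below is only an illustration; the class name PredefinedClassDemo, the helper countMatches, and the sample strings are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassDemo {

    // Hypothetical helper: counts the matches of the regex in the input
    public static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Hi 42_x!"; // sample text with 8 characters
        System.out.println(countMatches(".", input));    // 8
        System.out.println(countMatches("\\d", input));  // 2 (4, 2)
        System.out.println(countMatches("\\D", input));  // 6
        System.out.println(countMatches("\\w", input));  // 6 (H, i, 4, 2, _, x)
        System.out.println(countMatches("\\W", input));  // 2 (space, !)
        System.out.println(countMatches("\\s", input));  // 1
        System.out.println(countMatches("\\S", input));  // 7

        // A word boundary keeps "cat" from matching inside "concat" or "cats"
        System.out.println(countMatches("cat", "cat concat cats"));        // 3
        System.out.println(countMatches("\\bcat\\b", "cat concat cats"));  // 1

        // The regex escape for code point 0041 (hexadecimal) matches the letter A
        System.out.println(countMatches("\\u0041", "ABA"));                // 2
    }
}
```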

Quantifiers

We can specify the number of occurrences to match using quantifiers.

* : 0 or more
+ : 1 or more
? : 0 or 1
{3} : exactly 3 times
{3,4} : a range of occurrences (minimum 3, maximum 4)
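
To see the quantifiers in action, the following sketch checks whole strings against small patterns using the static Pattern.matches() method. The class name QuantifierDemo and the sample inputs are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Thin wrapper around Pattern.matches(), which tests the ENTIRE input against the regex
    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab*", "a"));          // true: zero b's are allowed
        System.out.println(matches("ab*", "abbb"));       // true
        System.out.println(matches("ab+", "a"));          // false: at least one b is required
        System.out.println(matches("ab+", "ab"));         // true
        System.out.println(matches("ab?", "ab"));         // true
        System.out.println(matches("ab?", "abb"));        // false: at most one b is allowed
        System.out.println(matches("\\d{3}", "123"));     // true
        System.out.println(matches("\\d{3}", "1234"));    // false: exactly three digits
        System.out.println(matches("\\d{3,4}", "12"));    // false
        System.out.println(matches("\\d{3,4}", "1234"));  // true
        System.out.println(matches("\\d{3,4}", "12345")); // false
    }
}
```

A quantifier applies to the element directly before it, so in "ab*" only the b is repeated, not the whole "ab".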

  • split(): To split the target string according to a specific pattern, we can use the split() method of the Pattern class. It has the following signature:

public String[] split(CharSequence input)

I’ve included a code snippet of the Pattern class’s split() method for your convenience:

import java.util.regex.Pattern;

public class RegularExpression {

    public static void main(String[] args) {
        // "\\s" matches a single whitespace character
        Pattern pattern = Pattern.compile("\\s");
        String[] splitString = pattern.split("Pattern class is present in java.util.regex");
        for (String text : splitString) {
            System.out.println(text);
        }
    }
}

/***
Output -
Pattern
class
is
present
in
java.util.regex
***/

In the example above, the string was split on whitespace (\\s).

  • split(): The String class also contains a split() method. It splits the target string according to the pattern passed to it.

public class RegularExpression {

    public static void main(String[] args) {
        String text = "This is example of String class split() method";
        // Here the pattern is the argument and the target string is the receiver
        String[] strings = text.split("\\s");
        for (String s : strings) {
            System.out.println(s);
        }
    }
}

/***
Output
This
is
example
of
String
class
split()
method
***/

Here too, the string was split on whitespace (\\s).

Note: The Pattern class’s split() method takes the target string as its argument, whereas the String class’s split() method takes the pattern as its argument.
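
The two split() methods can be compared side by side. The sketch below also shows one thing to watch for: "\\s" matches a single whitespace character, so two consecutive spaces produce an empty string in the result, while "\\s+" treats a run of whitespace as one separator. The class name SplitComparisonDemo, its helper methods, and the sample text are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitComparisonDemo {

    // Pattern.split(): the pattern is the receiver, the target string is the argument
    public static String[] splitWithPattern(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    // String.split(): the target string is the receiver, the pattern is the argument
    public static String[] splitWithString(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String text = "one  two three"; // two spaces between "one" and "two"
        System.out.println(Arrays.toString(splitWithPattern("\\s", text)));  // [one, , two, three]
        System.out.println(Arrays.toString(splitWithString("\\s", text)));   // [one, , two, three]
        System.out.println(Arrays.toString(splitWithString("\\s+", text)));  // [one, two, three]
    }
}
```

When the same pattern is used many times, compiling it once with Pattern.compile() and reusing the Pattern object avoids recompiling it on every call.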

StringTokenizer

StringTokenizer is a class created specifically for tokenization tasks. It is present in the java.util package.

import java.util.StringTokenizer;

public class StringTokenizerDemo {

    public static void main(String[] args) {
        // With no delimiter argument, tokens are separated by whitespace
        StringTokenizer tokenizer = new StringTokenizer("StringTokenizer class present in java.util package");
        while (tokenizer.hasMoreTokens()) {
            System.out.println(tokenizer.nextToken());
        }
    }
}

/***
Output -
StringTokenizer
class
present
in
java.util
package
***/

Note: StringTokenizer does not use regular expressions. By default it splits on whitespace characters (space, tab, newline, carriage return, and form feed).

We can also pass our own delimiter characters as the second constructor argument. Here is a code snippet for better understanding:

StringTokenizer tokenizer = new StringTokenizer("05-21-2023", "-");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
/***
Output
05
21
2023
***/
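
The second constructor argument of StringTokenizer is a set of delimiter characters, and each character in it is treated literally. The sketch below illustrates this and contrasts it with String.split(), which interprets its argument as a regular expression. The class name DelimiterDemo and the helper tokens are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterDemo {

    // Hypothetical helper: collects all tokens produced for the given delimiter characters
    public static List<String> tokens(String input, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Every character in "-/" is a delimiter on its own
        System.out.println(tokens("05-21/2023", "-/"));  // [05, 21, 2023]

        // For StringTokenizer the dot is just a dot
        System.out.println(tokens("a.b.c", "."));        // [a, b, c]

        // For String.split() an unescaped dot means "any character"
        System.out.println("a.b.c".split(".").length);    // 0
        System.out.println("a.b.c".split("\\.").length);  // 3
    }
}
```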

I’ve included some regular expressions that are commonly used in programming:
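
A minimal sketch of a few such patterns follows. The three expressions, the class name CommonPatternsDemo, and the method names are simplified assumptions for illustration; real-world validators for email addresses and phone numbers are usually stricter and depend on the exact requirements.

```java
import java.util.regex.Pattern;

public class CommonPatternsDemo {

    // Assumption: a phone number is exactly ten digits
    private static final Pattern PHONE = Pattern.compile("\\d{10}");

    // Assumption: a simplified email shape, name@domain.tld
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

    // Assumption: a date written as MM-dd-yyyy; only the shape is checked, not the calendar
    private static final Pattern DATE = Pattern.compile("\\d{2}-\\d{2}-\\d{4}");

    public static boolean isPhone(String input) {
        return PHONE.matcher(input).matches();
    }

    public static boolean isEmail(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static boolean isDate(String input) {
        return DATE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhone("9876543210"));        // true
        System.out.println(isPhone("98765"));             // false
        System.out.println(isEmail("user@example.com"));  // true
        System.out.println(isEmail("user@example"));      // false
        System.out.println(isDate("05-21-2023"));         // true
        System.out.println(isDate("5-21-2023"));          // false
    }
}
```

Each pattern is compiled once into a static constant and uses matches(), so the whole input has to fit the pattern, not just a part of it.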
