Beginner’s Guide to Java Regular Expressions (Regex)

Alexander Obregon
8 min read · Mar 23, 2024

Introduction

Regular expressions (regex) are a powerful tool for processing text. They allow you to search, manipulate, and edit strings based on specific patterns. In Java, the java.util.regex package provides classes for matching character sequences against patterns specified by regular expressions. This beginner's guide will introduce you to Java regular expressions, teaching you how to use them through simple, easy-to-understand explanations and plenty of code examples.

Understanding Regex in Java

Java provides the java.util.regex package, which contains classes like Pattern and Matcher to perform regex operations. The Pattern class is used to define a pattern (the regex itself), while the Matcher class is used to search for the pattern within a string.

Before diving into the examples, let’s understand some basic regex components:

  • Literals: These are the simplest form of pattern matching. For instance, the regex dog matches the string "dog".
  • Character Classes: Denoted by square brackets [], they match any one of the characters contained within them. For example, [abc] matches "a", "b", or "c".
  • Predefined Character Classes: Java regex offers predefined character classes like \d for digits, \s for whitespace, and \w for word characters (letters, digits, and underscores).
  • Quantifiers: Specify the number of occurrences to match. For example, + means one or more times, * means zero or more times, and ? means zero or one time.
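
A quick way to get a feel for these building blocks is to test each one against a small input. The sketch below is illustrative only: the class name RegexComponentsDemo and the helper fullMatch are made up for this example. It relies on Pattern.matches, which succeeds only when the entire input matches the pattern.

```java
import java.util.regex.Pattern;

class RegexComponentsDemo {
    // Hypothetical helper: true only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("dog", "dog"));        // literal: true
        System.out.println(fullMatch("[abc]", "b"));        // character class: true
        System.out.println(fullMatch("[abc]", "d"));        // 'd' is not in the class: false
        System.out.println(fullMatch("\\d", "7"));          // predefined class, one digit: true
        System.out.println(fullMatch("\\w+", "hello_42"));  // one or more word characters: true
        System.out.println(fullMatch("colou?r", "color"));  // ? makes the 'u' optional: true
        System.out.println(fullMatch("ab*", "a"));          // * allows zero 'b' characters: true
    }
}
```

Note the doubled backslashes: inside a Java string literal, \d has to be written as "\\d".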

Compiling Patterns and Finding Matches

To use regular expressions in Java, you first compile the pattern using Pattern.compile(), then create a matcher for your input string with that pattern, and finally, use the matcher to find matches.

Here’s a simple example:

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog.";
        String patternString = "quick";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        boolean matches = matcher.find();
        System.out.println("Does the text contain 'quick'? " + matches);
    }
}

This code checks whether the word “quick” is present in the given text, printing true if it is.
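
One detail that often trips up beginners is the difference between find() and matches() on a Matcher: find() looks for the pattern anywhere in the input, while matches() requires the pattern to cover the whole input. The following sketch contrasts the two; the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    // True if the pattern occurs anywhere in the text.
    static boolean containsMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find();
    }

    // True only if the pattern covers the text from start to end.
    static boolean isEntireMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.matches();
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog.";
        System.out.println(containsMatch("quick", text));     // true
        System.out.println(isEntireMatch("quick", text));     // false
        System.out.println(isEntireMatch("quick", "quick"));  // true
    }
}
```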

Grouping and Capturing

Groups are created in regex by enclosing part of the regex in parentheses (). This is not only useful for applying quantifiers to part of the regex but also for extracting information from strings. For example, (\d\d) matches a two-digit number and captures it as a group.

Let’s modify the previous example to find and extract the first word from the text:

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog.";
        String patternString = "(\\w+)"; // Matches and captures the first word

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        if (matcher.find()) {
            System.out.println("The first word is: " + matcher.group(1)); // Outputs "The"
        }
    }
}
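
Groups are numbered from left to right by their opening parenthesis, and group() with no argument (or group(0)) returns the entire match. As a small sketch of this numbering, the example below splits a time such as "14:35" into hours and minutes using two (\d\d) groups. The input string and the names GroupNumberingDemo and splitTime are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns {whole match, group 1, group 2}, or an empty array if nothing matches.
    static String[] splitTime(String text) {
        Matcher matcher = Pattern.compile("(\\d\\d):(\\d\\d)").matcher(text);
        if (!matcher.find()) {
            return new String[0];
        }
        return new String[] { matcher.group(), matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        // Prints [14:35, 14, 35]
        System.out.println(Arrays.toString(splitTime("Meet at 14:35 today")));
    }
}
```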

Character Classes and Quantifiers in Detail

Expanding on character classes and quantifiers, you can create more complex patterns. For example, the regex [a-zA-Z]+ matches one or more letters of any case. This flexibility allows you to tailor your search pattern to the precise requirements of your application.

Consider searching for a word followed by whitespace and a number:

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        String text = "The quick brown fox jumps over 13 lazy dogs.";
        // A word, one whitespace character, then one or more digits
        String patternString = "(\\w+)\\s(\\d+)";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(1) + " followed by number " + matcher.group(2));
        }
    }
}

This code identifies and extracts words followed by numbers, showcasing the power of using groups and quantifiers together.
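
To see how a custom character class such as [a-zA-Z]+ differs from the predefined \w+, it helps to run both against the same input. This is a minimal sketch; the sample text and the helper findAll are made up for the comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LettersVersusWordChars {
    // Hypothetical helper: collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "abc123 def_4";
        System.out.println(findAll("[a-zA-Z]+", text)); // [abc, def]
        System.out.println(findAll("\\w+", text));      // [abc123, def_4]
    }
}
```

The letters-only class stops at the first digit or underscore, while \w+ keeps going because digits and underscores count as word characters.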

Basic Patterns and Matching

Mastering basic patterns and understanding how to match them against strings are fundamental skills when working with Java regular expressions. This section covers common patterns and demonstrates how to use them for matching operations.

Commonly Used Patterns

  1. Digits: The \d pattern matches any digit character. To match a specific number of digits (e.g., a 5-digit zip code), you can use \d{5}.
  2. Word Characters: The \w pattern matches any word character (letters, digits, and underscores). For example, \w+ matches one or more word characters.
  3. Whitespace: The \s pattern matches any whitespace character (spaces, tabs, and line breaks). To find sequences of one or more whitespace characters, use \s+.
  4. Literal Characters: Sometimes, you need to match specific words or characters exactly as they appear. For instance, cat matches the string "cat" exactly.
  5. Wildcard Character: The . (dot) pattern matches any single character except newline characters. It's often used in patterns where you want to match 'any character here'.
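
The patterns in this list can be tried out directly with String.matches and String.split, both of which accept a regex. The snippet below is only a sketch with invented method names and sample inputs.

```java
import java.util.Arrays;

class CommonPatternsDemo {
    // \d{5}: exactly five digits, such as a US zip code.
    static boolean isFiveDigits(String input) {
        return input.matches("\\d{5}");
    }

    // \s+: split on runs of spaces, tabs, or line breaks.
    static String[] splitOnWhitespace(String input) {
        return input.trim().split("\\s+");
    }

    // The dot stands for any single character except a line terminator.
    static boolean looksLikeCxt(String input) {
        return input.matches("c.t");
    }

    public static void main(String[] args) {
        System.out.println(isFiveDigits("90210"));   // true
        System.out.println(isFiveDigits("9021"));    // false
        // Prints [one, two, three, four]
        System.out.println(Arrays.toString(splitOnWhitespace("one  two\tthree\nfour")));
        System.out.println(looksLikeCxt("cat"));     // true
        System.out.println(looksLikeCxt("c-t"));     // true
        System.out.println(looksLikeCxt("ct"));      // false: the dot needs one character
    }
}
```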

Matching Digits

Let’s start with a simple example to find and extract all 3-digit numbers from a string.

import java.util.regex.*;

public class DigitsExample {
    public static void main(String[] args) {
        String text = "There are 123 apples, 456 oranges, and 78 bananas.";
        String patternString = "\\b\\d{3}\\b"; // \b is a word boundary

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Found a 3-digit number: " + matcher.group());
        }
    }
}

Finding Words

To match specific words within a string, simply use the word as the pattern. If you want to match a word regardless of its case (e.g., “Java” or “java”), you can use the Pattern.CASE_INSENSITIVE flag.

import java.util.regex.*;

public class WordExample {
    public static void main(String[] args) {
        String text = "Java is fun. java is powerful. JAVA is everywhere.";
        String patternString = "java";

        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Found 'java': " + matcher.group());
        }
    }
}

Using the Wildcard for Matching

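The wildcard character . matches any character in a given position. For example, the pattern \b\w.\w\b matches a three-character word whose middle character can be anything. The example below finds three-letter words ending in "an"; it uses the character class [a-z] rather than . for the first letter, so that a space or punctuation mark cannot match in that position.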

import java.util.regex.*;

public class WildcardExample {
    public static void main(String[] args) {
        String text = "Can man fan ran pan";
        String patternString = "\\b[a-z]an\\b"; // Matches any three-letter word ending with 'an'

        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Match found: " + matcher.group());
        }
    }
}
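
For comparison, here is a sketch that uses the dot itself. It shows why the dot can be broader than expected: it also accepts a space. The sample text and the helper findAll are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardDotDemo {
    // Hypothetical helper: collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "hat hot hit h t heat";
        // The dot accepts any character, including the space in "h t".
        System.out.println(findAll("h.t", text));    // [hat, hot, hit, h t]
        // \w restricts the middle position to word characters.
        System.out.println(findAll("h\\wt", text));  // [hat, hot, hit]
    }
}
```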

These examples show the power and flexibility of regular expressions in Java for matching patterns within strings. By understanding and applying these basic patterns, you can begin to harness the full potential of regex in your Java applications.

Advanced Pattern Matching

Advanced pattern matching in Java regular expressions introduces more complex concepts such as lookahead and lookbehind assertions, non-capturing groups, and backreferences. These features enable intricate matching scenarios that go beyond basic pattern matching capabilities.

Lookahead and Lookbehind Assertions

Lookahead and lookbehind assertions allow you to include or exclude certain patterns based on what comes before (lookbehind) or after (lookahead) your match without including those patterns in the match itself.

  • Positive Lookahead (?=pattern): Succeeds only if pattern matches immediately after the current position; the text it checks is not included in the match.
  • Negative Lookahead (?!pattern): Succeeds only if pattern does not match immediately after the current position.
  • Positive Lookbehind (?<=pattern): Succeeds only if pattern matches immediately before the current position; the text it checks is not included in the match.
  • Negative Lookbehind (?<!pattern): Succeeds only if pattern does not match immediately before the current position.
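
The four assertions can be sketched side by side as follows. The sample sentences, the class name LookaroundDemo, and the helper firstMatch are all invented for this illustration. The word boundaries in the negative cases are there to stop the engine from backtracking into part of a number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String sale = "Save 20% on 3 items";
        String price = "Price: $45, qty 10";

        // Positive lookahead: digits that are followed by '%'.
        System.out.println(firstMatch("\\d+(?=%)", sale));            // 20
        // Negative lookahead: a whole number that is not followed by '%'.
        System.out.println(firstMatch("\\b\\d+\\b(?!%)", sale));      // 3
        // Positive lookbehind: digits that are preceded by '$'.
        System.out.println(firstMatch("(?<=\\$)\\d+", price));        // 45
        // Negative lookbehind: a number that is not preceded by '$'.
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", price));     // 10
    }
}
```

In every case the '%' or '$' is checked but never becomes part of the returned match.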

Example: Excluding Specific Words

Suppose you want to find occurrences of “cat” not followed by “s”.

import java.util.regex.*;

public class LookaheadExample {
    public static void main(String[] args) {
        String text = "cats scatter for the cat.";
        String patternString = "cat(?!s)";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Found 'cat' not followed by 's': " + matcher.group());
        }
    }
}

Here we search for occurrences of "cat" that are not immediately followed by an "s". The regex cat(?!s) uses a negative lookahead assertion (?!s) so that "cat" is matched only when the next character is not "s". The match succeeds when "cat" appears at the end of the input or before any character other than "s", and it skips "cats". Note that the pattern has no word boundaries, so it also matches the "cat" inside "scatter", and the program prints two matches for this text; to match only the standalone word, use \bcat\b. Lookaheads are useful whenever a condition must hold for the characters that follow without those characters becoming part of the match.
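
Because cat(?!s) contains no word boundaries, it can match inside longer words as well. The sketch below reports the start index of every match so the difference from a whole-word pattern is visible; the helper matchStarts is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: start index of every match of the regex in the text.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "cats scatter for the cat.";
        // Matches inside "scatter" (index 6) and the final "cat" (index 21).
        System.out.println(matchStarts("cat(?!s)", text));   // [6, 21]
        // Word boundaries on both sides keep only the standalone word.
        System.out.println(matchStarts("\\bcat\\b", text));  // [21]
    }
}
```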

Non-Capturing Groups

Non-capturing groups (?:pattern) allow you to group parts of your regex pattern without storing the matched text. They're useful for applying quantifiers to a portion of your regex or for structuring your regex without the overhead of capturing.

Example: Grouping Without Capturing

To match dates in the format “dd-mm-yyyy” while grouping the repeated "two digits and a dash" part without capturing it:

import java.util.regex.*;

public class NonCapturingGroupExample {
    public static void main(String[] args) {
        String text = "Today's date is 22-03-2024.";
        String patternString = "\\b(?:\\d{2}-){2}\\d{4}\\b";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Found a date: " + matcher.group());
        }
    }
}

This example matches a date while using a group purely for repetition. The regex \b(?:\d{2}-){2}\d{4}\b (written with doubled backslashes in the Java string literal) contains a non-capturing group (?:\d{2}-) that matches two digits followed by a dash. The quantifier {2} repeats that group exactly twice, covering the day and month, and \d{4} then matches the four-digit year. The dashes are still part of the overall match returned by matcher.group(), which is 22-03-2024; what the non-capturing group avoids is storing "22-" or "03-" as a numbered capture group. This keeps group numbering simple when the grouped text is not needed for further processing.
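
One way to observe the difference is Matcher.groupCount(), which reports how many capturing groups a pattern defines. The sketch below compares the non-capturing pattern with a capturing variant, and also shows the opposite case where capturing is wanted, pulling out day, month, and year. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Number of capturing groups defined by the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns {day, month, year} from the first dd-mm-yyyy date, or an empty array.
    static String[] dateParts(String text) {
        Matcher matcher = Pattern.compile("\\b(\\d{2})-(\\d{2})-(\\d{4})\\b").matcher(text);
        if (!matcher.find()) {
            return new String[0];
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(groupCount("\\b(?:\\d{2}-){2}\\d{4}\\b")); // 0
        System.out.println(groupCount("\\b(\\d{2}-){2}\\d{4}\\b"));   // 1
        // Prints [22, 03, 2024]
        System.out.println(Arrays.toString(dateParts("Today's date is 22-03-2024.")));
    }
}
```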

Backreferences

Backreferences, written \n where n is the number of a capturing group (for example \1, not to be confused with the newline escape), match the same text that the group matched earlier. They are useful for finding repeated words or patterns.

Example: Matching Repeated Words

To find immediate repeated words in a string:

import java.util.regex.*;

public class BackreferenceExample {
    public static void main(String[] args) {
        String text = "This is is a test test sentence.";
        String patternString = "\\b(\\w+) \\1\\b";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Found repeated words: " + matcher.group());
        }
    }
}

In this example, we find immediate repeated words in a text. The regex \\b(\\w+) \\1\\b uses a capturing group (\\w+) to match and remember a word, followed by a space and a backreference \\1 that matches the exact same text as captured by the first group. This pattern effectively identifies instances where a word is followed by itself with a space in between. Backreferences are powerful tools for identifying patterns that involve repetition or mirroring within the text, enabling more complex search and replace operations.
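
As an example of the search-and-replace use mentioned above, the same pattern can be passed to String.replaceAll, where $1 in the replacement string refers to the first captured group. This is a minimal sketch; the class name and the method collapseRepeats are invented, and it handles only a word that is doubled once, with a single space in between.

```java
class DuplicateWordFixer {
    // Replaces "word word" with "word" using the captured group ($1).
    static String collapseRepeats(String text) {
        return text.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    public static void main(String[] args) {
        // Prints: This is a test sentence.
        System.out.println(collapseRepeats("This is is a test test sentence."));
    }
}
```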

Conclusion

In this guide, we covered the fundamentals and some advanced aspects of Java regular expressions. Starting with the basics, we explored how to compile patterns, find matches, and use character classes and quantifiers. Moving into more complex territory, we touched on advanced pattern matching techniques such as lookahead and lookbehind assertions, non-capturing groups, and backreferences. These concepts improve your ability to manipulate and analyze strings in Java and support efficient data processing and validation. Whether you’re a beginner or looking to refine your regex skills, Java regular expressions are a versatile tool to have in your programming arsenal.

