Java RegEx: Part 1 — Introduction

Sera Ng.
Tech Training Space
Oct 17, 2020

A regular expression is a string pattern that can be used to search for, match, or extract text from a string.

Here are some real-world uses of regular expressions:

  • Checking whether a phone number is in a valid format.
  • Checking whether an email address is in a valid format.
  • Checking whether a string contains digits or special characters.
  • Validating password strength: for example, the password must contain at least one uppercase letter, at least one special character, and at least one digit.

Regular expressions are useful in many other cases as well.

Many programming languages support regular expressions, including Java, C#, PHP, and JavaScript.

Although the regular expression engine in each language may be implemented slightly differently, the basic syntax and usage are largely the same across languages.

Regular expressions follow a set of syntax rules that you need to know in order to use them.

Here are some common matching symbols:
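As a quick illustration of some common matching symbols, the sketch below checks a few of them with String.matches(), which returns true only when the whole string matches. The class name MatchingSymbolsDemo is hypothetical, and the symbols shown are a common selection rather than a complete list.

```java
// Hypothetical demo class: each line tests one common matching symbol.
// String.matches() succeeds only if the ENTIRE string matches the pattern.
class MatchingSymbolsDemo {
    public static void main(String[] args) {
        // .         any single character (except a line terminator, by default)
        System.out.println("a".matches("."));        // true
        System.out.println("ab".matches("."));       // false: two characters
        // [abc]     one character that is a, b, or c
        System.out.println("b".matches("[abc]"));    // true
        // [^abc]    one character that is NOT a, b, or c
        System.out.println("d".matches("[^abc]"));   // true
        // [a-d1-7]  one character in the range a-d or 1-7
        System.out.println("5".matches("[a-d1-7]")); // true
        System.out.println("9".matches("[a-d1-7]")); // false
        // X|Z       X or Z
        System.out.println("Z".matches("X|Z"));      // true
        // XZ        X directly followed by Z
        System.out.println("XZ".matches("XZ"));      // true
    }
}
```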

Character classes:
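The predefined character classes below each match exactly one character. This is a small sketch; the class name CharacterClassDemo is hypothetical.

```java
// Hypothetical demo class: predefined character classes, each matching ONE character.
class CharacterClassDemo {
    public static void main(String[] args) {
        System.out.println("7".matches("\\d")); // \d  a digit [0-9]               -> true
        System.out.println("x".matches("\\D")); // \D  a non-digit                 -> true
        System.out.println(" ".matches("\\s")); // \s  a whitespace character      -> true
        System.out.println("x".matches("\\S")); // \S  a non-whitespace character  -> true
        System.out.println("_".matches("\\w")); // \w  a word character [a-zA-Z_0-9] -> true
        System.out.println("@".matches("\\W")); // \W  a non-word character        -> true
        System.out.println("@".matches("\\w")); // '@' is not a word character     -> false
    }
}
```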

Quantifiers

Quantifiers specify how many times a character (or group of characters) must appear in a pattern.
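The most common quantifiers can be tried out the same way. In this sketch (the class name QuantifierDemo is hypothetical), each comment names the quantifier being tested.

```java
// Hypothetical demo class: common quantifiers, checked with String.matches().
class QuantifierDemo {
    public static void main(String[] args) {
        System.out.println("".matches("a*"));              // *      zero or more  -> true
        System.out.println("aaa".matches("a*"));           //                      -> true
        System.out.println("".matches("a+"));              // +      one or more   -> false
        System.out.println("aa".matches("a+"));            //                      -> true
        System.out.println("color".matches("colou?r"));    // ?      zero or one   -> true
        System.out.println("colour".matches("colou?r"));   //                      -> true
        System.out.println("123".matches("\\d{3}"));       // {n}    exactly n     -> true
        System.out.println("12".matches("\\d{3,}"));       // {n,}   at least n    -> false
        System.out.println("12345".matches("\\d{2,4}"));   // {n,m}  n to m        -> false
    }
}
```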

Matching text for validation

Note: All of the following code was tested on JDK 8.

Now let’s use some simple regular expressions to validate user input:

  • Check whether the user enters an integer.
  • Check whether the user enters an integer with a fixed number of digits. For instance, the user must enter exactly 4 digits, such as 2018.
  • Check whether the user enters an integer within a minimum and maximum number of digits. For instance, the user must enter at least 2 and at most 10 digits.
  • Check whether the user enters a string that starts with certain characters followed by digits. For instance, the user must enter a string such as ISBN-12345.
  • Check whether the user enters a string containing special characters such as !, #, $, and %.

Case 1: Checking for an integer

Here is the sample code we use to check whether the user’s input is a non-negative integer.

In the code, I use a do..while loop to prompt the user for a value. If the user enters an invalid value (not an integer), we can ask again without rerunning the program. That’s why we need a boolean flag variable that decides whether to keep looping or to exit once the input is valid.

I use the java.util.Scanner class to get input from users. There are many ways to read input from the console, but Scanner is one of the most convenient.

I use the following pattern to check for an integer:

String digit = "\\d";

Note that you need to write a double backslash (\\). The regular expression syntax is \d, but the backslash (\) is the escape character in Java string literals, so you must write \\ to put a single backslash into the string. Writing "\d" directly in Java source is a compilation error (illegal escape character).
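A minimal sketch of the escaping rule; the class name EscapeDemo is hypothetical, and the commented-out line is kept out of compilation on purpose because it would not compile.

```java
// Hypothetical demo class: what the Java literal "\\d" actually contains.
class EscapeDemo {
    public static void main(String[] args) {
        // In source code we write two backslashes; the String holds ONE backslash.
        String digit = "\\d";
        System.out.println(digit);          // prints \d
        System.out.println(digit.length()); // prints 2: the backslash and the 'd'
        // To match a literal backslash, the regex needs \\ , so the Java literal needs four.
        System.out.println("\\".matches("\\\\")); // true
        // String bad = "\d";  // does not compile: illegal escape character
    }
}
```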

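In a regular expression, \d matches a single digit from 0 to 9. Because matches() (explained below) requires the entire input to match, this pattern accepts exactly one digit, so it only works for single-digit integers. To accept a non-negative integer of any length, you would add the + quantifier (one or more), as in \\d+.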

Once the user enters a value, the next() method reads it and stores it in the input variable.

Note that we should not use the nextInt() method to read the value: if the user enters a character or a word, nextInt() throws an InputMismatchException, which crashes the program unless it is caught.
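To see why nextInt() is risky, the sketch below feeds a Scanner from a string instead of the console, so it runs without user input. The class NextIntDemo and its helper readsAsInt are hypothetical names used only for this illustration.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

// Hypothetical demo class: shows when Scanner.nextInt() throws InputMismatchException.
class NextIntDemo {
    // Returns true if nextInt() can read the text, false if it throws InputMismatchException
    static boolean readsAsInt(String text) {
        Scanner sc = new Scanner(text); // reads from the String, not from System.in
        try {
            sc.nextInt();
            return true;
        } catch (InputMismatchException e) {
            return false;
        } finally {
            sc.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(readsAsInt("42"));  // true
        System.out.println(readsAsInt("abc")); // false: nextInt() would have thrown
    }
}
```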

The String class in Java provides a method called matches(). It takes a pattern as its parameter and returns true if the entire string matches that pattern, and false otherwise.
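One detail worth seeing in code is that matches() checks the whole string, whereas Matcher.find() from java.util.regex looks for the pattern anywhere in it. A small sketch (the class name MatchesVsFindDemo is hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical demo class: whole-string matching vs. searching inside a string.
class MatchesVsFindDemo {
    public static void main(String[] args) {
        String text = "a12b";
        // matches(): the ENTIRE string must match the pattern
        System.out.println(text.matches("\\d+"));                          // false
        // Matcher.find(): looks for the pattern ANYWHERE in the string
        System.out.println(Pattern.compile("\\d+").matcher(text).find()); // true
        // String.matches(regex) gives the same result as Pattern.matches(regex, input)
        System.out.println(Pattern.matches("\\d+", "12"));                // true
    }
}
```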

flag = input.matches(digit);

The result of matches() is stored in the flag variable, which then determines whether the loop repeats.

Pay attention to the loop condition: if matches() returns false, meaning the user has entered an invalid value, flag is false. Since we want to prompt the user again in that case, we negate the flag with the NOT (!) operator, writing while (!flag), so that the condition becomes true and the do..while loop runs again. As you may know, a do..while loop (like any other loop) keeps running only while its condition is true.
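Putting the pieces described above together, the Case 1 program might look roughly like this sketch. The class name IntegerInput and the helper isDigit are hypothetical; the prompt and messages follow the sample output.

```java
import java.util.Scanner;

// A sketch of the Case 1 program, reconstructed from the description above.
// Class and method names are hypothetical.
class IntegerInput {
    // Same pattern as in the article: exactly ONE digit, 0-9
    static final String DIGIT = "\\d";

    static boolean isDigit(String input) {
        return input.matches(DIGIT);
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        boolean flag;
        do {
            System.out.print("Input an integer: ");
            String input = scanner.next(); // reads the token as text, so letters cause no InputMismatchException
            flag = isDigit(input);
            if (!flag) {
                System.out.println("You must enter a number!");
            }
        } while (!flag); // repeat while the input is invalid
        System.out.println("Valid data");
        scanner.close();
    }
}
```

Typing a and then 3 at the prompts should print the error message once and then Valid data.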

Now it’s time to run the program:

Input an integer: a

You must enter a number!

Input an integer: abc

You must enter a number!

Input an integer: 1a

You must enter a number!

Input an integer: 3

Valid data

Run the program again:

Input an integer: -1

You must enter a number!

Input an integer: 12

You must enter a number!

Input an integer:

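As you can see from the output, the negative number -1 was rejected because \d matches only a digit from 0 to 9, not the minus sign. The input 12 was rejected as well: \d matches exactly one digit, and matches() requires the whole input to match, so any number with two or more digits fails.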
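To compare the pattern used above with two alternatives, the following sketch runs \d, \d+ (one or more digits), and -?\d+ (an optional minus sign followed by digits) against the same inputs. The class IntegerPatternDemo and its check helper are hypothetical.

```java
// Hypothetical demo class: three candidate integer patterns on the same inputs.
class IntegerPatternDemo {
    // Applies the pattern to each input and reports "input=result" pairs
    static String check(String pattern, String... inputs) {
        StringBuilder sb = new StringBuilder(pattern).append(':');
        for (String in : inputs) {
            sb.append(' ').append(in).append('=').append(in.matches(pattern));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {"3", "12", "-1", "1a"};
        System.out.println(check("\\d", inputs));    // only single digits
        System.out.println(check("\\d+", inputs));   // one or more digits
        System.out.println(check("-?\\d+", inputs)); // optional minus sign, then digits
    }
}
```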

Case 2: Checking a fixed number of digits

In this case, I also want to prompt users to input an integer but with a fixed number of digits.

For instance, I want the input to be a 4-digit year such as 2017, 2018, and so on.

I can use the following code to achieve the task:

The code structure is similar to the previous one. I’ve just changed the pattern:

String yearPattern = "\\d{4}";

Note that I have placed the number 4 in braces ({}) right after \\d, with no whitespace in between. The 4 means that the user must enter exactly 4 digits, no more, no less.
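A minimal sketch of the Case 2 check, separated from the input loop so it can be tried directly. The names YearCheck and isYear are hypothetical; the inputs come from the sample run, plus a 5-digit value.

```java
// Hypothetical class: the Case 2 pattern applied without the console loop.
class YearCheck {
    static final String YEAR_PATTERN = "\\d{4}"; // exactly four digits

    static boolean isYear(String input) {
        return input.matches(YEAR_PATTERN);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"12", "123", "fgd", "2015", "20150"}) {
            System.out.println(input + " -> " + (isYear(input) ? "Valid data" : "Invalid data!"));
        }
    }
}
```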

Let’s run and check:

Input a year [4 digits]: 12

Invalid data!

Input a year [4 digits]: 123

Invalid data!

Input a year [4 digits]: fgd

Invalid data!

Input a year [4 digits]: 2015

Valid data

From the outputs:

12: invalid because it has only 2 digits

123: invalid because it has only 3 digits

“fgd”: obviously invalid because it is not an integer

2015: valid because it contains exactly 4 digits

Case 3: Checking an integer with min and max number of digits

In this case, we want to give users some flexibility by accepting an integer with a minimum and maximum number of digits.

For instance, we want to ask for the user’s age, which must be at least 10 and at most 100.

I have defined a pattern as follows:

String agePattern = "\\d{2,3}";

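In the braces, 2 is the minimum and 3 is the maximum number of allowed digits, separated by a comma (,) with no whitespace in between. Keep in mind that the pattern only restricts the number of digits: inputs such as 05 or 999 also match, so enforcing the actual range of 10 to 100 requires a numeric comparison after the pattern check.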
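Because \\d{2,3} only limits the number of digits, a sketch of the full age check could apply the pattern first and then compare the numeric value. AgeCheck and isValidAge are hypothetical names, and the 10 to 100 range comes from the example above.

```java
// Hypothetical class: digit-count check with the regex, then a numeric range check.
class AgeCheck {
    static final String AGE_PATTERN = "\\d{2,3}"; // 2 to 3 digits

    static boolean isValidAge(String input) {
        if (!input.matches(AGE_PATTERN)) {
            return false; // wrong number of digits, or not digits at all
        }
        int age = Integer.parseInt(input); // safe: at most 3 digits
        return age >= 10 && age <= 100;    // the regex alone cannot enforce this range
    }

    public static void main(String[] args) {
        for (String input : new String[] {"1", "1001", "33", "05", "999"}) {
            System.out.println(input + " matches pattern: " + input.matches(AGE_PATTERN)
                    + ", valid age: " + isValidAge(input));
        }
    }
}
```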

Let’s run the program:

Input your age: 1

Invalid data!

Input your age: 1001

Invalid data!

Input your age: 33

Valid data

From the outputs:

1: invalid because the pattern requires at least 2 digits

1001: invalid because the pattern allows a maximum of 3 digits

33: valid because it matches the defined pattern

Case 4: Checking a string starting with certain characters

In this case, we will ask users to input a string that starts with certain characters followed by a certain number of digits.

Let’s pick ISBN as an example.

Suppose we want users to input a book ISBN with the following pattern:

  • Starting with ISBN, all upper case
  • Followed by a dash (-) character
  • Followed by 5 digits

Some examples: ISBN-12345, ISBN-98765

We can use the following code to achieve the task:

This is the pattern we need:

String isbnPattern = "ISBN-\\d{5}";

We start the pattern with the uppercase letters ISBN, which means the user must type exactly those letters in uppercase. Next comes a literal dash (-). Finally, \\d{5} requires exactly 5 digits.
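A minimal sketch of the ISBN check (IsbnCheck and isIsbn are hypothetical names), tried against inputs with too few digits, too many digits, and an extra leading character:

```java
// Hypothetical class: the Case 4 pattern applied to a few edge cases.
class IsbnCheck {
    // "ISBN" and "-" are literal characters here; \d{5} means exactly five digits
    static final String ISBN_PATTERN = "ISBN-\\d{5}";

    static boolean isIsbn(String input) {
        return input.matches(ISBN_PATTERN);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"ISBN-12345", "ISBN-1234", "ISBN-123456", "XISBN-12345"}) {
            System.out.println(input + " -> " + isIsbn(input));
        }
    }
}
```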

We can run and check the results:

Input ISBN: ISBN12345

Invalid data!

Input ISBN: isbn-12345

Invalid data!

Input ISBN: ISBN-12345

Valid data

From the outputs:

ISBN12345: invalid because there was no dash (-) character

isbn-12345: invalid because isbn is in lowercase, and matching is case-sensitive by default

ISBN-12345: valid because it matched the pattern
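If you ever needed to accept the prefix in any letter case, one option is the CASE_INSENSITIVE flag of java.util.regex.Pattern, or the equivalent inline flag (?i). This is a sketch beyond the original requirement; IsbnIgnoreCase and isIsbnAnyCase are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical class: a case-insensitive variant of the ISBN pattern.
class IsbnIgnoreCase {
    // CASE_INSENSITIVE lets "isbn", "Isbn", etc. match the letters in the pattern
    static final Pattern ISBN_ANY_CASE =
            Pattern.compile("ISBN-\\d{5}", Pattern.CASE_INSENSITIVE);

    static boolean isIsbnAnyCase(String input) {
        return ISBN_ANY_CASE.matcher(input).matches(); // whole-string match, like String.matches()
    }

    public static void main(String[] args) {
        System.out.println("isbn-12345".matches("ISBN-\\d{5}"));     // false: case-sensitive by default
        System.out.println(isIsbnAnyCase("isbn-12345"));             // true
        System.out.println("isbn-12345".matches("(?i)ISBN-\\d{5}")); // true: inline flag form
    }
}
```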

Case 5: Checking a string with no special characters such as !, @, #, $

Input validation often needs to reject strings that contain special characters, for security reasons. A typical example is validating user names in an account registration feature.

For instance:

Valid user names would be: user1234, user9adj

Invalid user names: user@!123

Let’s take a look at the following code:

To achieve the required task, I have defined a simple pattern as follows:

String usernamePattern = "\\w+";

\\w matches a single word character: a letter (a-z, A-Z), a digit (0-9), or an underscore (_). Including the underscore suits this task because most user name registration forms allow underscores.

If you do not want to allow underscores, you can replace \\w with the following character class:

[a-zA-Z0-9]

Back to our pattern: the plus (+) sign right after \\w is a quantifier meaning “one or more”, so the user must enter at least one character, and every character must be a word character.
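A minimal sketch comparing the two user name patterns discussed above; UsernameCheck and isValid are hypothetical names, and the inputs extend the sample run with an underscore and an empty string.

```java
// Hypothetical class: \w+ (allows underscore) vs. [a-zA-Z0-9]+ (letters and digits only).
class UsernameCheck {
    static final String WITH_UNDERSCORE = "\\w+";       // letters, digits, underscore; one or more
    static final String NO_UNDERSCORE = "[a-zA-Z0-9]+"; // letters and digits only; one or more

    static boolean isValid(String input, String pattern) {
        return input.matches(pattern);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"user@", "123#user", "user1234", "user_1", ""}) {
            System.out.println("\"" + input + "\" -> \\w+: " + isValid(input, WITH_UNDERSCORE)
                    + ", [a-zA-Z0-9]+: " + isValid(input, NO_UNDERSCORE));
        }
    }
}
```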

Let’s test our program:

Input user name: user@

Invalid data!

Input user name: 123#user

Invalid data!

Input user name: user1234

Valid data

From the outputs:

user@: invalid because it contained the @ character

123#user: invalid because it contained the # character

user1234: valid because it completely matches the pattern
