Published in Nerd For Tech

Regular Expressions cheat sheet

A regular expression (or regex) is a pattern (or filter) that describes a set of strings; a string belongs to the set if it matches the pattern.

Regular expressions are used to search strings for a specific pattern or to validate user input, for example an email address entered by the user.


Why do we use regular expressions?

To perform a certain action on a given string based on the characters present in it. The action can be validating the string or changing certain parts of it.

Types of Regular expression:

1. Character classes:

A character class allows you to match any symbol from a certain character set. A character class is also called a character set.

\d for digits (0–9)
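. for any character (by default, every character except line terminators)
\s for white spaces (\w matches word characters: letters, digits, and _)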
\t for tabs

2. Boundary matches:

Boundary matchers check boundary conditions, such as whether a string starts or ends with a given substring.

^ to check the prefix boundary condition
$ to check the suffix boundary condition

Examples:

1. Changing a certain character in a string

We can use a literal character directly as the regex. For example, using ‘h’ as the regex and ‘H’ as the replacement changes every ‘h’ in the string to a capital ‘H’.

NOTE: Regular expressions are case-sensitive by default.
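A minimal sketch of this replacement, assuming Java's String.replaceAll and a made-up sample string (the class and method names are only for illustration):

```java
class ReplaceLiteralDemo {
    // Replaces every lowercase 'h' with 'H'. An existing 'H' is not
    // matched by the regex "h", because matching is case-sensitive.
    static String capitalizeH(String input) {
        return input.replaceAll("h", "H");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the article.
        System.out.println(capitalizeH("hello hi Hh")); // prints "Hello Hi HH"
    }
}
```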

2. Changing all the characters of a string

‘.’ matches any character in a string (upper case, lower case, digits, white spaces, etc.). So replacing ‘.’ with X changes every character in the string to X.
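As a rough illustration, assuming a made-up input string, the dot can be used like this:

```java
class MaskAllDemo {
    // "." matches any single character (except line terminators by default),
    // so every character is replaced by one 'X'.
    static String maskAll(String input) {
        return input.replaceAll(".", "X");
    }

    public static void main(String[] args) {
        // Hypothetical sample: letters, a space, digits, and punctuation.
        System.out.println(maskAll("Hi 42!")); // prints "XXXXXX"
    }
}
```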

3. Changing the first few characters of a string.

Here we use a boundary matcher (^) called the caret. With the regex ^abcDeee, the string is checked to see whether it starts with ‘abcDeee’, and if so that prefix is replaced with ‘YYY’. Note that the number of characters in the regex need not equal the number of characters in the replacement string.
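A small sketch of the prefix replacement, using the ‘abcDeee’ prefix from the text and an otherwise made-up sample string:

```java
class ReplacePrefixDemo {
    // "^abcDeee" matches "abcDeee" only at the very start of the input.
    static String replacePrefix(String input) {
        return input.replaceAll("^abcDeee", "YYY");
    }

    public static void main(String[] args) {
        // The second "abcDeee" is not at the start, so it is left untouched.
        System.out.println(replacePrefix("abcDeeeF12abcDeee")); // prints "YYYF12abcDeee"
    }
}
```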

4. Checking a string starts with a certain substring

The check returns a boolean value that indicates whether the string starts with the given substring.
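One possible way to write this check is with String.matches. Because matches has to cover the whole string, the sketch below appends ".*" after the prefix; the prefix "Hello" and the sample inputs are assumptions made for illustration.

```java
class StartsWithDemo {
    // "^Hello" anchors the prefix; ".*" consumes the rest of the string,
    // which is required because String.matches tests the entire input.
    static boolean startsWithHello(String input) {
        return input.matches("^Hello.*");
    }

    public static void main(String[] args) {
        System.out.println(startsWithHello("Hello World")); // prints "true"
        System.out.println(startsWithHello("Say Hello"));   // prints "false"
    }
}
```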

5. Checking a string ends with a certain substring

Here $ is a boundary matcher used to check the end of the string.
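The suffix check can be sketched the same way, again assuming String.matches and made-up sample strings:

```java
class EndsWithDemo {
    // ".*" consumes everything before the suffix; "$" anchors the end.
    static boolean endsWithWorld(String input) {
        return input.matches(".*World$");
    }

    public static void main(String[] args) {
        System.out.println(endsWithWorld("Hello World")); // prints "true"
        System.out.println(endsWithWorld("World Hello")); // prints "false"
    }
}
```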

6. Replacing some characters in a string with a new string

The regex [aei] replaces all the a’s, e’s, and i’s with ‘X’. The square brackets act as a kind of OR operator: if any one of the listed characters is found, it is replaced with the given string.
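A short sketch of this OR behaviour, with a hypothetical input string:

```java
class ReplaceAnyOfDemo {
    // "[aei]" matches a single character that is 'a', 'e', or 'i'.
    static String replaceAei(String input) {
        return input.replaceAll("[aei]", "X");
    }

    public static void main(String[] args) {
        System.out.println(replaceAei("Maddie is here")); // prints "MXddXX Xs hXrX"
    }
}
```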

7. Replacing some characters only if they are followed by certain characters.

The regex checks whether any character from the first square bracket is immediately followed by any character from the second square bracket. Where the condition matches, the matched pair of characters is replaced with the given string (‘X’).
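To make the two-bracket idea concrete, here is a sketch that assumes the character sets [aei] and [Fj] and a made-up input. Note that both characters of each matching pair are replaced by a single X.

```java
class ReplacePairDemo {
    // Matches one of 'a', 'e', 'i' immediately followed by 'F' or 'j'.
    // The whole two-character match is replaced by "X".
    static String replacePairs(String input) {
        return input.replaceAll("[aei][Fj]", "X");
    }

    public static void main(String[] args) {
        // "eF" and "ij" match; "ek" does not, because 'k' is not in [Fj].
        System.out.println(replacePairs("eF-ij-ek")); // prints "X-X-ek"
    }
}
```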

8. Handling case sensitivity using square bracket

Listing both the upper-case and lower-case form of a character in square brackets, such as [Hh], checks whether a string starts with that character irrespective of its case.
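A minimal sketch of this check, assuming the letter ‘h’ and a few made-up inputs:

```java
class StartsWithEitherCaseDemo {
    // "[Hh]" accepts either case for the first character;
    // ".*" lets the rest of the string be anything.
    static boolean startsWithH(String input) {
        return input.matches("^[Hh].*");
    }

    public static void main(String[] args) {
        System.out.println(startsWithH("Hello")); // prints "true"
        System.out.println(startsWithH("hello")); // prints "true"
        System.out.println(startsWithH("Jello")); // prints "false"
    }
}
```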

9. Replacing all the characters except the few characters.

To achieve that, we use the caret (^) as the first character inside the square brackets, where it acts as a NOT operator.
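For instance, assuming we want to keep only the characters ‘e’ and ‘j’ (an arbitrary choice for illustration), the negated set could look like this:

```java
class ReplaceAllExceptDemo {
    // "[^ej]" matches any single character that is NOT 'e' or 'j'.
    static String replaceAllExceptEj(String input) {
        return input.replaceAll("[^ej]", "X");
    }

    public static void main(String[] args) {
        System.out.println(replaceAllExceptEj("eject")); // prints "ejeXX"
    }
}
```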

10. Replacing characters of a range

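If we want to replace a run of consecutive digits or letters with a given character, one way is to list every character inside the square brackets, for example [abcdef345678].

But we can write it in a better way using ranges, for example [a-f3-8].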
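The two spellings can be compared side by side. The patterns [abcdef345678] and [a-f3-8] and the input below are assumptions chosen for illustration; both patterns describe the same set of characters.

```java
class RangeDemo {
    // Every character listed explicitly.
    static String replaceListed(String input) {
        return input.replaceAll("[abcdef345678]", "X");
    }

    // The same set written as two ranges: a to f and 3 to 8.
    static String replaceRanges(String input) {
        return input.replaceAll("[a-f3-8]", "X");
    }

    public static void main(String[] args) {
        String sample = "abcxyz123789"; // hypothetical input
        System.out.println(replaceListed(sample)); // prints "XXXxyz12XXX9"
        System.out.println(replaceRanges(sample)); // prints "XXXxyz12XXX9"
    }
}
```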

11. Handling case sensitivity in range regex

One way to do it is to put (?i) before the regex, which makes the match case-insensitive. We can also do it the traditional way by listing both cases in the range: [a-fA-F1-6].
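A sketch comparing the two approaches on a made-up input. The flagged pattern (?i)[a-f1-6] is an assumption built to mirror the [a-fA-F1-6] range from the text.

```java
class CaseInsensitiveRangeDemo {
    // (?i) turns on case-insensitive matching for the rest of the pattern.
    static String withFlag(String input) {
        return input.replaceAll("(?i)[a-f1-6]", "X");
    }

    // Same effect without the flag: both cases are listed explicitly.
    static String withBothCases(String input) {
        return input.replaceAll("[a-fA-F1-6]", "X");
    }

    public static void main(String[] args) {
        String sample = "aBcG19"; // hypothetical input
        System.out.println(withFlag(sample));      // prints "XXXGX9"
        System.out.println(withBothCases(sample)); // prints "XXXGX9"
    }
}
```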

12. Replacing all the numbers in a string

We already know one way to do it (using a range in square brackets, such as [0-9]). Another way is to use the predefined character class (\\d).

The extra backslash is needed because \ is itself an escape character inside a Java string literal, so the regex \d has to be written as "\\d".
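A brief sketch with a made-up input, showing the doubled backslash in the Java string literal:

```java
class ReplaceDigitsDemo {
    // The Java literal "\\d" is the two-character regex \d (any digit).
    static String maskDigits(String input) {
        return input.replaceAll("\\d", "X");
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Room 101, floor 3")); // prints "Room XXX, floor X"
    }
}
```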

13. Replacing all the non-numeric characters in a string

(\\D) is the negation of (\\d): it matches every character that is not a digit.
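The negated class can be sketched in the same way, again with a hypothetical input:

```java
class ReplaceNonDigitsDemo {
    // \D matches any character that is not a digit, including spaces.
    static String maskNonDigits(String input) {
        return input.replaceAll("\\D", "X");
    }

    public static void main(String[] args) {
        System.out.println(maskNonDigits("Room 101")); // prints "XXXXX101"
    }
}
```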

14. Replacing all the numeric, alphabets, and _ in a string

(\\w) matches word characters: all digits, letters, and _.
(\\W) matches all the remaining characters (special characters and white spaces).

There are many other character classes available; for those, please refer to:
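Both classes can be tried on the same made-up input to see that they are complements of each other:

```java
class WordCharDemo {
    // \w matches letters, digits, and underscore.
    static String maskWordChars(String input) {
        return input.replaceAll("\\w", "X");
    }

    // \W matches everything that \w does not.
    static String maskNonWordChars(String input) {
        return input.replaceAll("\\W", "X");
    }

    public static void main(String[] args) {
        String sample = "user_1@mail.com"; // hypothetical input
        System.out.println(maskWordChars(sample));    // prints "XXXXXX@XXXX.XXX"
        System.out.println(maskNonWordChars(sample)); // prints "user_1XmailXcom"
    }
}
```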

https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

Quantifier

It specifies how often an element in a regular expression can occur.

15. Writing regex for string having duplicate characters

If we are targeting a string that has consecutive duplicate characters, we can write the character once and put its number of occurrences in curly brackets.

For example, to replace the string Hiii, which has 3 consecutive i’s, we can write the regex either as Hiii or as Hi{3}. For such a short string there is not much difference, but as the run gets longer the second form becomes more readable.
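The two spellings can be sketched as follows. The replacement text "Hello" and the input are assumptions made for illustration.

```java
class ExactCountDemo {
    // The three i's written out literally.
    static String replaceLiteral(String input) {
        return input.replaceAll("Hiii", "Hello");
    }

    // The same pattern with a quantifier: 'H' followed by exactly three 'i'.
    static String replaceQuantified(String input) {
        return input.replaceAll("Hi{3}", "Hello");
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("Hiii there"));    // prints "Hello there"
        System.out.println(replaceQuantified("Hiii there")); // prints "Hello there"
    }
}
```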

16. Adding condition for one or more characters

‘+’ is the match-one-or-more operator. It checks that the preceding character is present at least once.

For example, Hi+ matches every substring made of one H followed by one or more i’s, and each such substring is replaced.
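A small sketch of the one-or-more quantifier, assuming the pattern Hi+ and a made-up input:

```java
class OneOrMoreDemo {
    // 'H' followed by at least one 'i'; a lone 'H' does not match.
    static String replaceHi(String input) {
        return input.replaceAll("Hi+", "Y");
    }

    public static void main(String[] args) {
        System.out.println(replaceHi("H Hi Hiiii")); // prints "H Y Y"
    }
}
```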

17. Setting limit on character count

We can set a minimum and maximum count using curly brackets.

For example, with Hi{2,5} the string is updated only where H is followed by 2–5 i’s (2, 3, 4, or 5). A longer run of i’s still matches on its first five.
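A sketch of the bounded quantifier, assuming the pattern Hi{2,5} and made-up inputs. The second call shows what happens when the run is longer than the upper bound.

```java
class BoundedCountDemo {
    // 'H' followed by at least 2 and at most 5 'i' characters.
    static String replaceBounded(String input) {
        return input.replaceAll("Hi{2,5}", "Y");
    }

    public static void main(String[] args) {
        // "Hi" has only one 'i', so it is left alone.
        System.out.println(replaceBounded("Hi Hii Hiiiii")); // prints "Hi Y Y"
        // Seven i's: the first five are consumed, two are left over.
        System.out.println(replaceBounded("Hiiiiiii"));      // prints "Yii"
    }
}
```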

18. Matching a character 0 or more times

Just as we use the ‘+’ quantifier for one or more occurrences, we can use ‘*’ for zero or more. It makes the preceding character optional.

Please refer to the Pattern class to learn more about quantifiers:
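For comparison with ‘+’, here is a sketch using the assumed pattern Hi* on a made-up input. This time a lone H matches as well.

```java
class ZeroOrMoreDemo {
    // 'H' followed by zero or more 'i' characters.
    static String replaceHiStar(String input) {
        return input.replaceAll("Hi*", "Y");
    }

    public static void main(String[] args) {
        System.out.println(replaceHiStar("H Hi Hiii")); // prints "Y Y Y"
    }
}
```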

https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

Pattern Matching

19. HTML pattern matching

Let's suppose we have a large piece of HTML code, and we want to check how many <h2> tags it has and extract the data present in those tags.

For that, we need to learn about Pattern and Matcher classes.

Suppose we want to find the number of <h2> tags in a piece of HTML code. How can we achieve this?

First, we need to check if there are any <h2> tags present in the given code.

NOTE: A Matcher object keeps track of its position in the input, so after it has been used for one scan it does not start over by itself. To reuse it, we need to reset it with the matcher.reset() method. As we already used it to check whether any <h2> tags are present, we need to reset it before using it to find the indexes where the tags occur.
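The steps above can be sketched roughly as follows, assuming the java.util.regex Pattern and Matcher classes. The sample HTML is adapted from the practice task at the end of this article, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class H2CountDemo {
    // Sample HTML adapted from the practice task in this article.
    static final String SAMPLE_HTML =
            "<h1>My Heading</h1>"
            + "<h2>Sub-heading</h2>"
            + "<p>This is a paragraph about something.</p>"
            + "<h2>Summary</h2>"
            + "<p>Here is the summary.</p>";

    static int countH2(String html) {
        Matcher matcher = Pattern.compile("<h2>").matcher(html);

        // First use: is there at least one <h2> tag?
        if (!matcher.find()) {
            return 0;
        }

        // find() moved the matcher past the first tag, so rewind it.
        // Without reset() the loop below would miss that first tag.
        matcher.reset();

        int count = 0;
        while (matcher.find()) {
            count++;
            System.out.println("Occurrence " + count + ": index "
                    + matcher.start() + " to " + matcher.end());
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countH2(SAMPLE_HTML)); // last line printed is "2"
    }
}
```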

Group Pattern

20. We can use a group pattern to extract the data present in the <h2> tags.

With a single group around the whole element, we get the tags along with the data. To omit the tags and get only the data, we need to change our group pattern. Instead of one group, we create multiple groups, for example (<h2>)(.*?)(</h2>).

The second group then points to the data between the <h2> tags.
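Both variants can be sketched with Matcher.group. The patterns (<h2>.*?</h2>) and (<h2>)(.*?)(</h2>) are assumptions consistent with the description above; the reluctant .*? keeps each match inside a single pair of tags. The sample HTML is adapted from the practice task at the end of this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class H2GroupDemo {
    static final String SAMPLE_HTML =
            "<h1>My Heading</h1>"
            + "<h2>Sub-heading</h2>"
            + "<p>This is a paragraph about something.</p>"
            + "<h2>Summary</h2>"
            + "<p>Here is the summary.</p>";

    // Collects the text captured by the given group for every match.
    static List<String> extract(String html, String regex, int group) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(html);
        while (matcher.find()) {
            result.add(matcher.group(group));
        }
        return result;
    }

    public static void main(String[] args) {
        // One group: the tags are part of the captured text.
        System.out.println(extract(SAMPLE_HTML, "(<h2>.*?</h2>)", 1));
        // prints "[<h2>Sub-heading</h2>, <h2>Summary</h2>]"

        // Three groups: group 2 holds only the data between the tags.
        System.out.println(extract(SAMPLE_HTML, "(<h2>)(.*?)(</h2>)", 2));
        // prints "[Sub-heading, Summary]"
    }
}
```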

A few tasks for practice

TASK 1: Change every ‘m’ in the given string to ‘A’

String str = "My name is maddy";

TASK 2: Check whether the given string starts with ‘he’

String str = "Hello World";

TASK 3: Check whether the given string ends with ‘ld’

String str = "Hello World";

TASK 4: Replace the first 5 characters of the given string with ‘Hi’

String str = "Hello World";

TASK 5: Change all the vowels in the given string to X

String str = "Hello World";

TASK 6: Check whether the string contains exactly 6 digits

String str = "712632";

TASK 7: Check whether the given string contains no digits

String str = "Hello T0M";

TASK 8: Extract the data present in the <p> tags using the Pattern and Matcher classes

StringBuilder htmlText = new StringBuilder("<h1>My Heading</h1>");
htmlText.append("<h2>Sub-heading</h2>");
htmlText.append("<p>This is a paragraph about something.</p>");
htmlText.append("<p>This is another paragraph about something else.</p>");
htmlText.append("<h2>Summary</h2>");
htmlText.append("<p>Here is the summary.</p>");
