Regex: Exploring Patterns and Character Classes, Quantifiers, Groups, Look-ahead, and Look-behind

Monica Pérez Nogueras
4 min read · Jun 12, 2023

Regular expressions, commonly known as regexes, are powerful tools for pattern matching in strings. They provide a concise and flexible way to search, extract, and manipulate text data. In this article, we will delve into the basics of regex, including patterns and character classes, quantifiers, groups, look-ahead, and look-behind.

Introduction to Regex

Regular expressions are written in a compact pattern syntax and are used to match patterns in strings. They are widely used in data science applications for tasks such as data cleaning and text processing. Understanding regexes lets data scientists and programmers manipulate text data efficiently.

Getting Started with Regex in Python

To work with regular expressions in Python, we need to import the re module, which provides the functions for regex processing. The re module includes search(), which looks for a pattern anywhere in a string; match(), which only matches at the beginning of a string; findall(), which extracts all non-overlapping occurrences of a pattern; and split(), which splits a string wherever the pattern matches.

Let’s take a look at some examples:

import re

# Searching for a pattern
text = "I love regex!"
if re.search("regex", text):
    print("Found a match!")
else:
    print("No match found.")

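# Splitting a string on commas or whitespace
# (a raw string keeps \s from being treated as a string escape;
# the ", " after "Hello" produces one empty string in the result)
text = "Hello, World! How are you today?"
print(re.split(r",|\s", text))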

# Finding all occurrences of a pattern
text = "Python is a powerful language. It is also easy to learn and use."
print(re.findall("is", text))

In the above code, we demonstrate searching for a pattern using search(), splitting a string using split(), and finding all occurrences of a pattern using findall().
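
The difference between search() and match() is easy to miss, so here is a minimal sketch of it. The variable names are only illustrative and are not part of the original example.

```python
import re

text = "I love regex!"

# search() scans the whole string and returns the first match found anywhere.
found_anywhere = re.search(r"regex", text)

# match() only succeeds if the pattern matches at the very start of the string.
found_at_start = re.match(r"regex", text)
starts_with_i = re.match(r"I", text)

print(found_anywhere is not None)  # True
print(found_at_start is not None)  # False
print(starts_with_i.group())       # I
```

Both functions return a match object on success and None otherwise, which is why they can be used directly in an if statement.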

Patterns and Character Classes

A fundamental aspect of regex is the ability to define patterns using character classes. Character classes allow us to match specific characters or sets of characters. We can use square brackets (`[]`) to define a character class.

For example:

text = "The fast brown fox jumps over the 5 lazy #dogs."

# Matching lowercase vowels
print(re.findall("[aeiou]", text))

# Matching uppercase consonants
print(re.findall("[ABCDFGHJKLMNPQRSTVWXYZ]", text))

# Matching both uppercase and lowercase letters
print(re.findall("[a-zA-Z]", text))

# Matching digits
print(re.findall("[0-9]", text))

# Matching non-alphanumeric characters
print(re.findall("[^a-zA-Z0-9]", text))

In the above code, we showcase different uses of character classes. We demonstrate matching lowercase vowels, uppercase consonants, both uppercase and lowercase letters, digits, and non-alphanumeric characters.
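
As a small complement, the sketch below reuses the same sentence to show that the shorthand classes \d and \s can stand in for explicit ranges, and that they can also be used inside square brackets. The variable names are chosen only for illustration.

```python
import re

text = "The fast brown fox jumps over the 5 lazy #dogs."

# \d is shorthand for a digit; for this ASCII text it matches the same as [0-9].
digits_explicit = re.findall(r"[0-9]", text)
digits_shorthand = re.findall(r"\d", text)

# A leading ^ negates the class: anything that is not a letter, digit, or whitespace.
symbols = re.findall(r"[^a-zA-Z0-9\s]", text)

# A range such as A-Z covers every uppercase ASCII letter.
uppercase = re.findall(r"[A-Z]", text)

print(digits_explicit)   # ['5']
print(digits_shorthand)  # ['5']
print(symbols)           # ['#', '.']
print(uppercase)         # ['T']
```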

Quantifiers

Quantifiers in regex specify how many times the preceding element may repeat. The general form is {m,n}, where m is the minimum number of repetitions and n is the maximum. The shorthands *, +, and ? are equivalent to {0,}, {1,}, and {0,1}. By default, quantifiers are greedy: they match as many repetitions as possible.

import re

# Match 3 to 5 consecutive digits
pattern = r'\d{3,5}'

text = 'The code is 12345. Please enter the code.'

matches = re.findall(pattern, text)

print(matches) # Output: ['12345']
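
To compare the different quantifier forms side by side, here is a short sketch on a made-up string (the sample text and variable names are assumptions for illustration only). The \b anchors mark word boundaries, so each pattern has to match a whole word.

```python
import re

text = "a ab abb abbb abbbb"

# {2}: exactly two "b" characters after the "a"
exactly_two = re.findall(r"\bab{2}\b", text)

# {2,}: two or more "b" characters
two_or_more = re.findall(r"\bab{2,}\b", text)

# ?: zero or one "b" (same as {0,1})
zero_or_one = re.findall(r"\bab?\b", text)

# +: one or more "b" (same as {1,})
one_or_more = re.findall(r"\bab+\b", text)

# *: zero or more "b" (same as {0,})
zero_or_more = re.findall(r"\bab*\b", text)

print(exactly_two)   # ['abb']
print(two_or_more)   # ['abb', 'abbb', 'abbbb']
print(zero_or_one)   # ['a', 'ab']
print(one_or_more)   # ['ab', 'abb', 'abbb', 'abbbb']
print(zero_or_more)  # ['a', 'ab', 'abb', 'abbb', 'abbbb']
```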

Groups

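Groups in regex allow us to treat multiple characters as a single unit and to capture the text they match. They are defined by enclosing part of the pattern within parentheses (). After a successful match, group(1) returns the text captured by the first group, group(2) the second, and so on.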

text = "Hello, my name is Jack Black. I am 25 years old."

# Extracting the name
print(re.search("name is ([A-Za-z ]+)", text).group(1))

# Extracting the age
print(re.search(r"am (\d+) years old", text).group(1))

In the above code, we use groups to extract specific information from a text. We showcase extracting the name and age from a given text.
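
The two extractions can also be combined into a single pattern. The sketch below uses named groups, written (?P<name>...), so each captured value can be read by name. The group names name and age are assumptions chosen for this illustration.

```python
import re

text = "Hello, my name is Jack Black. I am 25 years old."

# Two named groups in one pattern; \. matches the literal period after the name.
pattern = r"name is (?P<name>[A-Za-z ]+)\. I am (?P<age>\d+) years old"

m = re.search(pattern, text)
if m:
    print(m.group("name"))  # Jack Black
    print(m.group("age"))   # 25
    print(m.groups())       # ('Jack Black', '25')
```

Checking that the match is not None before calling group() avoids an AttributeError when the text does not have the expected shape.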

Look-ahead and Look-behind

Look-ahead and look-behind assertions in regex allow us to specify conditions before or after a pattern without including them in the match. Look-ahead uses (?=pattern) syntax for positive assertions and (?!pattern) for negative assertions. Look-behind uses (?<=pattern) syntax for positive assertions and (?<!pattern) for negative assertions. In Python's re module, a look-behind pattern must match strings of a fixed length.

text = "Hello, World! How are you today?"

# Matching "World" preceded by "Hello, "
print(re.search("(?<=Hello, )World", text))

# Matching "World" followed by "!"
print(re.search("World(?=!)", text))

In the above code, we demonstrate using look-ahead and look-behind assertions to match patterns. We showcase matching “World” preceded by “Hello, ” and matching “World” followed by “!”.
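
The negative forms work the same way, but succeed only when the condition is absent. Here is a minimal sketch on the same sentence. The variable names are illustrative, and the last pattern combines both positive assertions to show that only “World” itself ends up in the match.

```python
import re

text = "Hello, World! How are you today?"

# Negative look-ahead: "World" NOT followed by "?" (it is followed by "!", so this matches)
not_before_question = re.search(r"World(?!\?)", text)

# Negative look-ahead: "World" NOT followed by "!" (no such occurrence, so no match)
not_before_bang = re.search(r"World(?!!)", text)

# Negative look-behind: "World" NOT preceded by "Hello, " (no such occurrence, so no match)
not_after_hello = re.search(r"(?<!Hello, )World", text)

# Both positive assertions together; the surrounding text is checked but not consumed
both = re.search(r"(?<=Hello, )World(?=!)", text)

print(not_before_question.group())  # World
print(not_before_bang)              # None
print(not_after_hello)              # None
print(both.group(), both.span())    # World (7, 12)
```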

Conclusions

Regular expressions are a powerful tool for pattern matching and text manipulation. Understanding the basics of regex, including patterns and character classes, quantifiers, groups, look-ahead, and look-behind, provides data scientists and programmers with a versatile way to process and extract information from text data. With practice, you can leverage regex to efficiently handle various text-processing tasks in your projects.

If you would like to explore the topic further, you can consult the article 15 Examples for Text Processing using Regex.

Let’s connect on LinkedIn!

Monica Pérez Nogueras

Automation Developer | Data Analyst | Business Intelligence Analyst | The Dow Chemical Company