Regular Expressions: Character Classes & findall() Method

A series of tutorials on Regular Expressions using Python

Zohaib Shahzad
Jul 1, 2020

If you’ve stumbled across this article and are new to this series of tutorials on regular expressions, feel free to take a look at the rest of the series (in order):

  1. Regular Expressions: Basics
  2. Regular Expressions: Grouping & the Pipe Character
  3. Regular Expressions: Repetition & Greedy/Non-Greedy Matching
  4. Regular Expressions: Character Classes & findall() Method
  5. Regular Expressions: Dot-Star and the Caret/Dollar Characters
  6. Regular Expressions: sub() Method and Verbose Mode

Before we dive into character classes for regular expressions, I want to clarify the difference between the search() and findall() methods in Python's re module.

search(): finds the first match and returns it as a match object (or None if there is no match).
findall(): finds all non-overlapping matches of the regex pattern and returns them as a list of strings (when the pattern contains no groups).
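To make the difference concrete, here is a minimal sketch. The phone-number pattern, the sample text, and the variable names are made up for illustration:

```python
import re

# Hypothetical pattern for phone numbers written like 415-555-1234
phoneRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
text = 'Call 415-555-1234 or 212-555-9876 today.'

# search() stops at the first match and returns a match object
mo = phoneRegex.search(text)
print(mo.group())               # 415-555-1234

# findall() returns every non-overlapping match as a list of strings
print(phoneRegex.findall(text))  # ['415-555-1234', '212-555-9876']

# search() returns None when nothing matches
print(phoneRegex.search('no numbers here'))  # None
```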

Character Classes

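In the previous regex tutorials, we learned that \d matches any single numeric digit. You could also say that \d is shorthand for the regex (0|1|2|3|4|5|6|7|8|9). (Strictly speaking, for str patterns in Python 3, \d also matches non-ASCII decimal digits unless you pass the re.ASCII flag.) In addition, there are several other shorthand character classes, shown in the table below:

\d: any numeric digit from 0 to 9
\D: any character that is not a numeric digit from 0 to 9
\w: any letter, numeric digit, or the underscore character (think of this as matching "word" characters)
\W: any character that is not a letter, numeric digit, or the underscore character
\s: any whitespace character, such as a space, tab, or newline (think of this as matching "space" characters)
\S: any character that is not whitespace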
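A quick sketch of a few shorthand classes on a made-up sample string. The last three lines illustrate the Unicode caveat for \d, using the Arabic-Indic digit three (U+0663) as an example:

```python
import re

sample = 'Order #42: 3 items_in stock'

print(re.findall(r'\d', sample))       # ['4', '2', '3']
print(re.findall(r'\w+', sample))      # ['Order', '42', '3', 'items_in', 'stock']
print(re.findall(r'\S+', sample))      # ['Order', '#42:', '3', 'items_in', 'stock']
print(len(re.findall(r'\s', sample)))  # 4

# For str patterns, \d also matches non-ASCII decimal digits; [0-9] does not
print(re.findall(r'\d', '\u0663'))            # ['٣']
print(re.findall(r'[0-9]', '\u0663'))         # []
print(re.findall(r'\d', '\u0663', re.ASCII))  # []
```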

Character classes are nice for shortening regular expressions. The character class [0-5] matches only the digits 0 to 5, which is much shorter than typing (0|1|2|3|4|5). Note that while \d matches digits and \w matches digits, letters, and the underscore, there is no shorthand character class that matches only letters. However, you can write your own character class, such as [a-zA-Z], which we'll explore soon.
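Here is a minimal sketch of both points, using made-up sample strings: [0-5] and (0|1|2|3|4|5) find the same digits, and [a-zA-Z]+ picks out runs of letters while skipping digits and underscores:

```python
import re

text = 'Scores: 3, 7, 5, 9, 0'
# The character class and the alternation match the same single digits
print(re.findall(r'[0-5]', text))          # ['3', '5', '0']
print(re.findall(r'(0|1|2|3|4|5)', text))  # ['3', '5', '0']

# Letters only: the underscore and digits are not included
print(re.findall(r'[a-zA-Z]+', 'user_42 logged in'))  # ['user', 'logged', 'in']
```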

Enter the following into the interactive shell:
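Here is a minimal sketch using a made-up shopping list; the variable names and the text are illustrative assumptions:

```python
import re

# Hypothetical sample text: counts followed by words
shopping = '3 apples, 12 eggs, 1 loaf and 2 bottles of milk'
countRegex = re.compile(r'\d+\s\w+')
print(countRegex.findall(shopping))
# ['3 apples', '12 eggs', '1 loaf', '2 bottles']
```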

The regex \d+\s\w+ will match text that has one or more numeric digits (\d+), followed by a white space character (\s), followed by one or more letter/digit/underscore characters (\w+). The findall() method returns all matching strings of the regex pattern in a list.

Making Your Own Character Classes

There are times when you want to match a set of characters but the shorthand character classes (\d, \w, \s, and so on) are too broad. You can define your own character class using square brackets.

Square brackets are used to specify character sets. Use a hyphen inside a character set to specify a range of characters. The order of the characters inside the square brackets doesn't matter. For example, the regular expression [Tt]he means: an uppercase T or lowercase t, followed by the letter h, followed by the letter e.

[Tt]he => The car parked in the garage. (matches "The" and "the")
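The same pattern in Python, as a short sketch. The second call swaps the order inside the brackets to show that it makes no difference:

```python
import re

sentence = 'The car parked in the garage.'
# [Tt] matches either an uppercase T or a lowercase t
print(re.findall(r'[Tt]he', sentence))  # ['The', 'the']
# Reordering the characters inside the brackets gives the same result
print(re.findall(r'[tT]he', sentence))  # ['The', 'the']
```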


Outside a character set, a period matches any character except a newline. Inside a character set, however, it means a literal period. The regular expression ar[.] means: a lowercase a, followed by a lowercase r, followed by a period (.) character.

ar[.] => A garage is a good place to park a car. (matches only "ar." at the end of "car.")
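A short sketch comparing the bracketed period with a bare period on the same sentence:

```python
import re

sentence = 'A garage is a good place to park a car.'
# Inside brackets the period is literal, so only "ar." at the end matches
print(re.findall(r'ar[.]', sentence))  # ['ar.']
# Outside brackets, . matches any character except a newline
print(re.findall(r'ar.', sentence))    # ['ara', 'ark', 'ar.']
```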

Test the regular expression

Example 1: Matching any lowercase vowel

import re
vowelRegex = re.compile(r'[aeiou]')
# equivalent to r'(a|e|i|o|u)'
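To check the claimed equivalence, here is a small sketch that runs both forms on a made-up string. Note that the uppercase I is not matched, because the class lists only lowercase vowels:

```python
import re

vowelRegex = re.compile(r'[aeiou]')
alternationRegex = re.compile(r'(a|e|i|o|u)')

text = 'Ice cream'
print(vowelRegex.findall(text))        # ['e', 'e', 'a']
print(alternationRegex.findall(text))  # ['e', 'e', 'a']
```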

Example 2: Matching any lowercase letter of the alphabet

lowercaseRegex = re.compile(r'[a-z]')
# any single lowercase letter from a to z
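A quick sketch on a made-up string, with a + added (an extension not in the example above) so that consecutive lowercase letters come back as one match. Uppercase letters and digits are skipped:

```python
import re

lowercaseRegex = re.compile(r'[a-z]+')
print(lowercaseRegex.findall('RoboCop 2020 rocks'))  # ['obo', 'op', 'rocks']
```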

Example 3: Matching both lowercase & uppercase vowels

Another example is the character class [aeiouAEIOU]. This character class will match any vowel, lowercase or uppercase. Enter the following into the interactive shell:

# Finding vowels in a string
vowelRegex = re.compile(r'[aeiouAEIOU]')  # equivalent to r'(a|e|i|o|u|A|E|I|O|U)'
vowelRegex.findall('Robocopy eats baby food')
# returns: ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o']

You can also include ranges of letters or numbers by using a hyphen. For example, the character class [a-zA-Z0-9] will match all lowercase letters, uppercase letters, and digits.
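A minimal sketch on a made-up string, with + added so that runs of letters and digits are grouped. Hyphens, spaces, and punctuation split the matches:

```python
import re

alnumRegex = re.compile(r'[a-zA-Z0-9]+')
print(alnumRegex.findall('R2-D2 & C-3PO!'))  # ['R2', 'D2', 'C', '3PO']
```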

NOTE:

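Inside the square brackets, the normal regex symbols are not interpreted as such. This means you do not need to escape the ., ?, *, or () characters with a preceding backslash. For example, the character class [0-5.] will match the digits 0 to 5 and a period. You do not need to write it as [0-5\.]. (The exceptions are the backslash, the closing bracket ], a caret ^ placed first, and a hyphen between two characters, which keep their special meanings.)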
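A small sketch of this note, using made-up strings: the unescaped and escaped periods behave the same, and ?, *, and () are treated as literal characters inside the brackets:

```python
import re

version = 'v4.7.2'
# The period needs no backslash inside the brackets (7 is outside 0-5)
print(re.findall(r'[0-5.]', version))   # ['4', '.', '.', '2']
print(re.findall(r'[0-5\.]', version))  # ['4', '.', '.', '2']

# ? * ( ) lose their special meaning inside a character class
print(re.findall(r'[?*()]', 'Why (not)? 2*3'))  # ['(', ')', '?', '*']
```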

Example 4: Matching Uppercase/Lowercase with Quantifier

doubleVowelRegex = re.compile(r'[aeiouAEIOU]{2}')
# two vowels in a row
doubleVowelRegex.findall('Robocopy eats baby food')
# returns: ['ea', 'oo']

In the regex above, we’re looking for two vowels (lowercase/uppercase) in a row.

Negative Character Classes

consonantsRegex = re.compile(r'[^aeiouAEIOU]')
# match everything not specified in the pattern
consonantsRegex.findall('Robocop eats baby food.')
# returns: ['R', 'b', 'c', 'p', ' ', 't', 's', ' ', 'b', 'b', 'y', ' ', 'f', 'd', '.']

By placing a caret character (^) just after the character class’s opening bracket, you can make a negative character class. A negative character class will match all the characters that are not in the character class.

In the regex above, the pattern matches any character except the vowels listed in the class, which is why the spaces and the final period appear in the result alongside the consonants.
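If you only want the consonant letters, one option (a sketch, not the only approach) is to add whitespace and the period to the negated class as well. This only covers the characters in this particular sample; digits or other punctuation would still match:

```python
import re

# \s and a literal period are also excluded, so for this sample
# string only consonant letters remain
consonantLetterRegex = re.compile(r'[^aeiouAEIOU\s.]')
print(consonantLetterRegex.findall('Robocop eats baby food.'))
# ['R', 'b', 'c', 'p', 't', 's', 'b', 'b', 'y', 'f', 'd']
```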
