Regular Expressions: Sub() Method and Verbose Mode

A series of tutorials on Regular Expressions using Python

If you’ve stumbled across this article and are new to this series of tutorials on regular expressions, feel free to take a look at the rest of the series (in order):

  1. Regular Expressions: Basics

Substituting Strings with the Sub() Method

Regular expressions can not only find text patterns but can also substitute new text in place of those patterns. The sub() method of a regex object takes two arguments. The first is the replacement string that is substituted for each match. The second is the string to search. The sub() method returns a new string with the substitutions applied.

Let’s take a look at the following example:

Example 1: Using the sub() method
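
A minimal sketch of this example, assuming the regex object is named namesRegex and uses the pattern Agent \w+ (the pattern is an assumption, chosen to be consistent with the results discussed in this section):

```python
import re

# Assumed pattern: the word "Agent", a space, then one or more word characters.
namesRegex = re.compile(r'Agent \w+')

text = 'Agent Zohaib gave the secret documents to Agent Bob.'

# findall() returns every non-overlapping match as a list of strings.
matches = namesRegex.findall(text)
print(matches)  # ['Agent Zohaib', 'Agent Bob']
```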

In this example, the regex matches the word “Agent” followed by a single word. Running the findall() method on the sentence “Agent Zohaib gave the secret documents to Agent Bob.” returns “Agent Zohaib” and “Agent Bob”.

Now maybe we want to redact some words from a string.

namesRegex.sub('REDACTED', 'Agent Zohaib gave the secret documents to Agent Bob.')
# 'REDACTED gave the secret documents to REDACTED.'

The word “REDACTED” was substituted for every string that matched the regex pattern within the string.

Example 2: Using Groups with the sub() Method

Sometimes you may need to use the matched text itself as part of the substitution. In the first argument to sub(), you can type \1, \2, \3, and so on, to mean “Enter the text of group 1, 2, 3, and so on, in the substitution.”

For example, say you want to censor the names of the secret agents by showing just the first letters of their names. To do this, you could use the regex Agent (\w)\w* and pass r'\1****' as the first argument to sub(). The \1 in that string will be replaced by whatever text was matched by group 1, that is, the (\w) group of the regular expression.
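
Here is a short sketch of that idea. The variable name agentNamesRegex is hypothetical, and the sample sentence is reused from Example 1:

```python
import re

# Group 1, (\w), captures only the first letter of the name;
# the trailing \w* consumes the rest of the name without capturing it.
agentNamesRegex = re.compile(r'Agent (\w)\w*')

text = 'Agent Zohaib gave the secret documents to Agent Bob.'

# \1 in the replacement is filled in with the text captured by group 1.
censored = agentNamesRegex.sub(r'\1****', text)
print(censored)  # Z**** gave the secret documents to B****.
```

The replacement is written as a raw string so that \1 reaches sub() as a backslash followed by 1, rather than being interpreted by Python as an escape sequence.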

Managing Complex Regular Expressions Using VERBOSE Mode

Regular expressions are fine if the text pattern you need to match is simple. But matching complicated text patterns might require long, convoluted regular expressions. You can mitigate this by telling the re.compile() function to ignore whitespace and comments inside the regular expression string. This verbose mode is enabled by passing the flag re.VERBOSE as the second argument to re.compile().

Example 3: Implementing VERBOSE Mode

Take a look at the following:

phoneRegex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext.)\s*\d{2,5})?)')

That looks really messy and it’s hard to read.

We can actually use VERBOSE mode and spread the regex over multiple lines with comments like this:
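
A sketch of the same phone number regex in verbose form follows. The pieces are identical to the one-line version above; the comment labels (area code, separator, and so on) are my reading of each piece, and the sample text is made up for illustration:

```python
import re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code (optional)
    (\s|-|\.)?                    # separator (optional)
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext.)\s*\d{2,5})?  # extension (optional)
    )''', re.VERBOSE)

# Hypothetical sample text, used only to show the pattern still matches.
match = phoneRegex.search('Call 415-555-1234 x99 today.')
print(match.group())  # 415-555-1234 x99
```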

Now this looks much cleaner and easier to read.

Note how this example uses a triple quote syntax (r''') instead of the usual single quote raw string (r'). We use the triple quote syntax so we can spread the regex over many lines, making it more legible.

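The comment rules inside the regex string are the same as regular Python code: the # symbol and everything following it to the end of the line are ignored. Moreover, the extra spaces inside the multi-line string for the regex are not considered part of the text pattern to be matched. The exception is whitespace that is escaped with a backslash or placed inside a character class, which is still matched literally. Overall, verbose mode helps organize your regex.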
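
One consequence of ignoring whitespace is worth seeing in action: a literal space in a verbose pattern has to be written explicitly. The following is a small sketch with hypothetical variable names:

```python
import re

text = 'Agent Bob'

# In verbose mode the literal space in the pattern is dropped,
# so this behaves like the pattern Agent\w+ and does not match.
ignored_space = re.compile(r'Agent \w+', re.VERBOSE)
print(ignored_space.search(text))  # None

# Escaping the space (or writing \s or [ ]) keeps it in the pattern.
escaped_space = re.compile(r'Agent\ \w+', re.VERBOSE)
print(escaped_space.search(text).group())  # Agent Bob
```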

This article concludes this series of tutorials on regular expressions. I really hope I did a service in putting this series out there and that you learned something from it.

Stay tuned for future tutorials and give this article a clap if you liked it!