Getting Started with Python re — Regular Expression Operations

Gregory DeSantis
Jun 20, 2019 · 4 min read

re of Fortune

Using re in Python is like buying a vowel on Wheel of Fortune: it makes solving the puzzle much simpler!

What is re?

Although re is the actual module name in Python, it is short for regex, or regular expression. Simply put, a regular expression is a sequence of characters that forms a search pattern, which can be used to check text for matches. Regular expressions can be extremely powerful: they help filter data, especially in larger datasets, and they can significantly reduce the amount of code needed.
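As a rough illustration of that code savings, the sketch below pulls every dollar amount out of a sentence with a single call. The sample text and the pattern are invented for this example and are not from a real dataset.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Coffee cost $3.50, lunch $12.25 and parking $8.00."

# One pattern: a literal dollar sign, one or more digits, a dot, two digits.
prices = re.findall(r"\$\d+\.\d{2}", text)
print(prices)  # ['$3.50', '$12.25', '$8.00']
```

Doing the same without re would typically mean looping over the words and checking each character by hand.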

Why should we re?

Regular expression syntax is essentially a small language of its own, and it is quite extensive. This allows for a great deal of flexibility in your searches but also requires research and practice. Below are links to a book on regular expressions, an online regular expression tester, a YouTube tutorial on the re module, and the documentation of re in Python.

re is widely used in real-world work. Although the possibilities are endless, its core job is searching for patterns within text, which can be extremely useful for industries, such as finance, that need to make timely decisions based on information.

Importing re

To work with re in Python, we must first import the module.
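The import is a single line. re ships with the Python standard library, so no separate install should be needed:

```python
import re  # standard-library module, available in any normal Python install
```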

Raw Strings

A raw string in Python is a string literal prefixed with an “r”. The prefix tells Python not to treat \ as the start of an escape sequence, so in a raw string \n stays a backslash followed by the letter n instead of becoming a newline.

This is important because we want the re module to interpret the backslashes in our patterns, rather than having Python process them first.
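A minimal sketch of the difference follows. The variable names and sample strings are made up for illustration.

```python
import re

# Regular string: Python converts \n into one newline character.
regular = "a\nb"
# Raw string: the backslash and the n stay as two separate characters.
raw = r"a\nb"

print(len(regular))  # 3
print(len(raw))      # 4

# Why it matters for re: in a regular string, \b is a backspace character,
# so the pattern never reaches re as the "word boundary" it was meant to be.
print(re.findall("\bcat\b", "cat concat cat"))   # []
print(re.findall(r"\bcat\b", "cat concat cat"))  # ['cat', 'cat']
```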

re Functions

The re module offers a set of functions that allows us to search a string for a match.

re.findall — Returns a list containing all matches.

re.search — Returns a match object if there is a match anywhere in the string. Only the first occurrence of the match will be returned.

re.split — Returns a list where the string has been split at each match.

re.sub — Replaces one or many matches with a string.
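The four functions above can be sketched on one sample sentence. The sentence and the variable names are assumptions chosen for this example.

```python
import re

# Hypothetical sample sentence used for all four calls.
phrase = "The rain in Spain stays mainly in the plain"

# findall: every non-overlapping match, returned as a list of strings.
found = re.findall(r"ai", phrase)
print(found)  # ['ai', 'ai', 'ai', 'ai']

# search: a match object for the first occurrence, or None if nothing matches.
match = re.search(r"Spain", phrase)
print(match.group(), match.start())  # Spain 12
print(re.search(r"Portugal", phrase))  # None

# split: cut the string at each whitespace character (\s).
words = re.split(r"\s", phrase)
print(words)
# ['The', 'rain', 'in', 'Spain', 'stays', 'mainly', 'in', 'the', 'plain']

# sub: replace each whitespace character with an underscore.
joined = re.sub(r"\s", "_", phrase)
print(joined)  # The_rain_in_Spain_stays_mainly_in_the_plain
```

Because re.search returns None when nothing matches, it is worth checking the result before calling methods such as group() on it.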

Special Sequences

A special sequence is a \ followed by a character, and it has a special meaning, much like a wildcard. For example, \s, which is handy with functions such as split and sub, matches any whitespace character.

\A – Returns a match if the specified characters are at the beginning of the string.

\Z – Returns a match if the specified characters are at the end of the string.

\d – Returns a match where the string contains digits (numbers from 0-9).
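Here is a short sketch of these three behaviors, assuming the sequences meant are \A (beginning of string), \Z (end of string), and \d (digit), which is how Python's re defines them. The sample text is invented.

```python
import re

# Hypothetical sample text.
text = "The year 2019 had 365 days"

# \A anchors the pattern to the very beginning of the string.
print(re.findall(r"\AThe", text))   # ['The']
print(re.findall(r"\Ayear", text))  # []  ("year" is not at the start)

# \Z anchors the pattern to the very end of the string.
print(re.findall(r"days\Z", text))  # ['days']

# \d matches one digit; \d+ matches a run of one or more digits.
print(re.findall(r"\d", text))      # ['2', '0', '1', '9', '3', '6', '5']
print(re.findall(r"\d+", text))     # ['2019', '365']
```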

Metacharacters

Metacharacters are characters with a special meaning.

[] – A set of characters

^ – Starts with

$ – Ends with

| – Either or
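The four metacharacters can be tried out as follows, assuming they are [] (character set), ^ (starts with), $ (ends with), and | (either or), as in Python's re syntax. The sample string is made up.

```python
import re

# Hypothetical sample text.
text = "hello world"

# []: a set of characters; here, any lowercase letter from a to e.
in_set = re.findall(r"[a-e]", text)
print(in_set)  # ['e', 'd']

# ^: the string must start with the pattern.
print(bool(re.search(r"^hello", text)))  # True
print(bool(re.search(r"^world", text)))  # False

# $: the string must end with the pattern.
print(bool(re.search(r"world$", text)))  # True

# |: match either alternative.
either = re.findall(r"hello|planet", text)
print(either)  # ['hello']
```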

Conclusion

This lesson introduced only 4 functions, 3 special sequences, and 4 metacharacters, yet that was enough to see how capable re is. If you are working with large datasets, regular expressions can be highly effective at returning or manipulating data with a single line of code. I hope this introduction taught you some fundamentals. To practice your coding with re, try PyRegex.

re Test

Create a new variable called ripped_money: a list made by splitting the variable hundred_million at every comma and period. Omit the commas and the period from the list. The output should look like: ['100', '000', '000', '00']
