3 Minutes Python | Regex (Part 1)

Jay Shi
Published in One Bit At A Time
4 min read · Jan 9, 2020

What is a regular expression, and how do we use it?

Photo by Evgeni Tcherkasski on Unsplash

What is Regex and why use it

Regex is short for regular expression. A regex is a pattern, written as a string, that describes the text you want to find inside other strings.

Regular expressions are useful in the following scenarios:

  1. find a specific string
  2. substitute a specific string
  3. split a string at a certain location
  4. check if a string matches a certain pattern

Since there is a lot to cover, I will split regex into two posts. This post focuses on some simple regex patterns, and the next post will focus on how to use them in Python.

Simple Regex patterns

Validate 10-digit phone number

Let’s say you made a form where users need to enter their 10-digit phone number in the format XXXXXXXXXX (e.g. 1231231234). How do you make sure users always enter a valid phone number? One approach is to check two things by hand: that every character entered is a digit, and that the length is 10. But we can do it in a more elegant way using a regular expression.

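The pattern \d{10} checks for ten digits in a row. Here, \d represents a digit, essentially 0–9, and {10} says how many times the element before it must repeat. Since {10} follows \d, the pattern means exactly 10 consecutive digits. Note that the pattern by itself can also match 10 digits inside a longer string, so for validation the whole input has to match it, not just a part.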
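
Here is a minimal sketch of that check in Python. The helper name is_ten_digit_phone is hypothetical, and re.ASCII is an added assumption so that \d means only 0–9 (by default, Python's \d also accepts digits from other scripts).

```python
import re

def is_ten_digit_phone(value: str) -> bool:
    """Return True only if the whole string is exactly ten digits (0-9)."""
    # fullmatch requires the entire string to fit the pattern.
    return re.fullmatch(r"\d{10}", value, flags=re.ASCII) is not None

print(is_ten_digit_phone("1231231234"))   # True
print(is_ten_digit_phone("123123123"))    # False: only nine digits
print(is_ten_digit_phone("12312312345"))  # False: eleven digits
print(is_ten_digit_phone("123123123a"))   # False: contains a letter

# By contrast, search only needs the pattern to appear somewhere,
# so eleven digits still "contain" ten digits in a row.
print(re.search(r"\d{10}", "12312312345") is not None)  # True
```

The difference between fullmatch and search is why validation usually matches the whole string.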

Now let’s say we need users to enter their phone number in the format XXX-XXX-XXXX. We can use the regular expression \d{3}-\d{3}-\d{4}, which reads as three digits, a dash, three digits, a dash, and four digits.
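
The dashed format can be sketched the same way. The names DASHED_PHONE and is_dashed_phone are made up for this example.

```python
import re

# Three digits, a dash, three digits, a dash, four digits.
DASHED_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_dashed_phone(value: str) -> bool:
    """Return True only if the whole string looks like XXX-XXX-XXXX."""
    return DASHED_PHONE.fullmatch(value) is not None

print(is_dashed_phone("123-123-1234"))  # True
print(is_dashed_phone("1231231234"))    # False: dashes are missing
print(is_dashed_phone("123-1231-234"))  # False: digits grouped wrongly
```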

Validate email address

Let’s say we now want users to enter an email address. For simplicity, let’s assume every email address has the format XXX...@gmail.com (e.g. dragonfruit@gmail.com). Here, X represents either a letter or a digit.

How do we express that as a regular expression?

The answer is [a-zA-Z0-9]+@gmail\.com

Here [a-zA-Z0-9] represents a single character that is a letter (a–z or A–Z) or a digit (0–9). To clarify, [a-z] represents a lowercase letter from a to z, [A-Z] represents an uppercase letter from A to Z, and [0-9] represents a digit from 0 to 9. We can combine these three into [a-zA-Z0-9].

The + means one or more of whatever comes before it, so [a-zA-Z0-9]+ matches one or more letters or digits. Another example is \d+, which matches one or more digits.

For the Gmail domain part, we need to escape the dot by putting a backslash in front of it. The dot . is a reserved character in regex (unescaped, it matches almost any character), so it must be escaped to match a literal dot. Other characters that need to be escaped include \ + * ? ^ $ | [ ( ) {
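
To see why the escape matters, here is a small sketch comparing the escaped and unescaped dot. The names GMAIL, GMAIL_UNESCAPED and is_simple_gmail are hypothetical, and the pattern only covers the simplified address format assumed in this post, not real-world email validation.

```python
import re

GMAIL = re.compile(r"[a-zA-Z0-9]+@gmail\.com")           # dot escaped
GMAIL_UNESCAPED = re.compile(r"[a-zA-Z0-9]+@gmail.com")  # dot NOT escaped

def is_simple_gmail(value: str) -> bool:
    """Return True if the whole string is letters/digits followed by @gmail.com."""
    return GMAIL.fullmatch(value) is not None

print(is_simple_gmail("dragonfruit@gmail.com"))  # True
print(is_simple_gmail("@gmail.com"))             # False: + needs at least one character
print(is_simple_gmail("dragonfruit@gmailxcom"))  # False: a literal dot is required

# Without the backslash, the dot matches any character, including "x".
print(GMAIL_UNESCAPED.fullmatch("dragonfruit@gmailxcom") is not None)  # True

# re.escape adds the backslashes for you when you want literal text.
print(re.escape("gmail.com"))  # gmail\.com
```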

Search sentence

Let’s say we are trying to find someone in the phone book, and we know his first name starts with D and his last name starts with F (e.g. Derek Frank).

How do we represent this as a regular expression?

^D[a-z]+\sF[a-z]+$

The caret ^ anchors the pattern to the start of the string, so the string has to begin with the character that follows the caret, which is D. The dollar sign $ anchors the pattern to the end of the string, so the string has to end with whatever comes right before the dollar sign, which is one or more lowercase letters in this case. Together, they guarantee that the whole string fits the pattern: it starts with the letter D, ends with a lowercase letter, and has nothing extra before or after.

\s represents a single whitespace character, such as a space (it also matches tabs and newlines). We need this since there is a space between the first name and last name.
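
A minimal sketch of the name search in Python follows. The names NAME and is_d_f_name are made up for illustration.

```python
import re

# ^ and $ pin the pattern to the start and end of the string.
NAME = re.compile(r"^D[a-z]+\sF[a-z]+$")

def is_d_f_name(value: str) -> bool:
    """Return True for 'D... F...' names such as 'Derek Frank'."""
    return NAME.search(value) is not None

print(is_d_f_name("Derek Frank"))     # True
print(is_d_f_name("Mr Derek Frank"))  # False: ^ requires the string to start with D
print(is_d_f_name("Derek Frank Jr"))  # False: $ requires the string to end after the last name
print(is_d_f_name("derek frank"))     # False: D and F must be uppercase
print(is_d_f_name("Derek\tFrank"))    # True: \s also matches a tab
```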

Recap

To search for a 10-digit phone number, we can use \d{10}. \d represents a digit from 0 to 9, and {10} means that there are exactly 10 such digits in a row.

To validate a Gmail address, we can use [a-zA-Z0-9]+@gmail\.com. [a-zA-Z0-9] is a combination of [a-z], [A-Z] and [0-9], meaning that it represents a letter (a–z or A–Z) or a digit (0–9), and + means one or more of them. We also need to escape the following symbols when we want to match them literally: \ . + * ? ^ $ | [ ( ) {

To search for someone’s name in the format {first name starting with D} {last name starting with F}, we can use the regex ^D[a-z]+\sF[a-z]+$. ^ and $ mean "starts with" and "ends with" respectively. \s represents a whitespace character such as a space.

Jay Shi
One Bit At A Time

Software Engineer @ Google. I write about Python tutorials and stuff that can help you become a better software engineer.