Learn Python Fundamentals in 30 Days: Day 16 (Regular Expressions, Part 1)

Regular expressions can be thought of as a mini-language for specifying text patterns.

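re.compile(): creates a regex object from a pattern
re.search(): finds the first occurrence of a pattern anywhere in a string
re.match(): checks whether the pattern matches at the beginning of the string (use re.fullmatch() to require that the entire string conforms to the pattern)
re.findall(): finds all non-overlapping matches in the string and returns them as a list, not just the first match
match.group(): returns the matched text; group() is a method of the match object returned by re.search() or re.match(), not a function of the re module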
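The difference between these calls is easiest to see side by side. The snippet below is a minimal sketch; the sample string and variable names are made up for illustration. Note that re.match() only anchors at the start of the string, while re.fullmatch() requires the pattern to cover the whole string.

```python
import re

text = "Python 3 and Python 2"

# search() scans the whole string and stops at the first hit
first = re.search(r"\d", text)

# match() only succeeds if the pattern matches at position 0
at_start = re.match(r"Python", text)
not_at_start = re.match(r"\d", text)  # None: the string does not start with a digit

# fullmatch() requires the pattern to cover the entire string
whole = re.fullmatch(r"Python \d and Python \d", text)

# findall() returns every non-overlapping match as a list of strings
all_digits = re.findall(r"\d", text)

print(first.group())      # 3
print(at_start.group())   # Python
print(not_at_start)       # None
print(whole is not None)  # True
print(all_digits)         # ['3', '2']
```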

Searching with Regex

match = re.search(pattern,string)

Pattern Types (Character Classes)

\w: any single word character: a letter, digit, or underscore [a-zA-Z0-9_] (plus Unicode letters and digits for str patterns, unless re.ASCII is set)
\d: any numeric digit [0-9]
\s: any whitespace character (space, newline, tab)
\D: any character that is NOT a numeric digit
\W: any character that is NOT a letter, digit, or underscore
\S: any character that is NOT whitespace (space, tab, or newline)
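As a quick check of the classes above, here is a small sketch that runs each one over the same string. The sample text and variable names are invented for illustration; the + used in three of the patterns means "one or more" and is described in the next list.

```python
import re

# "\t" in this normal (non-raw) string is a real tab character
sample = "Room 42, Floor_3\tEast"

digits = re.findall(r"\d", sample)           # each single digit
spaces = re.findall(r"\s", sample)           # each whitespace character
non_word = re.findall(r"\W", sample)         # anything that is not a letter, digit, or _
words = re.findall(r"\w+", sample)           # runs of word characters
non_space_runs = re.findall(r"\S+", sample)  # runs of non-whitespace
non_digit_runs = re.findall(r"\D+", sample)  # runs of non-digits

print(digits)          # ['4', '2', '3']
print(spaces)          # [' ', ' ', '\t']
print(non_word)        # [' ', ',', ' ', '\t']
print(words)           # ['Room', '42', 'Floor_3', 'East']
print(non_space_runs)  # ['Room', '42,', 'Floor_3', 'East']
print(non_digit_runs)  # ['Room ', ', Floor_', '\tEast']
```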

Repetition and Other Special Characters

+: 1 or more
*: 0 or more
?: 0 or 1
{k}: exactly k occurrences
{m,n}: from m to n occurrences, inclusive
.: matches any character except a newline (\n)
^: start of the string
$: end of the string
\: escape character
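The following sketch exercises each symbol from the list above on tiny inputs. The test strings and variable names are arbitrary, chosen only to make the differences visible.

```python
import re

text = "a ab abb"
one_or_more = re.findall(r"ab+", text)   # b must appear at least once
zero_or_more = re.findall(r"ab*", text)  # b may be missing entirely
zero_or_one = re.findall(r"ab?", text)   # at most one b is consumed

exactly_three = re.findall(r"\d{3}", "12 123 1234")
two_to_three = re.findall(r"\d{2,3}", "1 12 123 1234")

# "." matches any character except a newline, so "c\nt" is skipped
any_char = re.findall(r"c.t", "cat cot c\nt")

starts = re.search(r"^Hello", "Hello world")        # anchored at the start
ends = re.search(r"world$", "Hello world")          # anchored at the end
not_at_start = re.search(r"^world", "Hello world")  # None

# A backslash turns the special "." into a literal dot
literal_dots = re.findall(r"\.", "a.b.c")

print(one_or_more)    # ['ab', 'abb']
print(zero_or_more)   # ['a', 'ab', 'abb']
print(zero_or_one)    # ['a', 'ab', 'ab']
print(exactly_three)  # ['123', '123']
print(two_to_three)   # ['12', '123', '123']
print(any_char)       # ['cat', 'cot']
print(not_at_start)   # None
print(literal_dots)   # ['.', '.']
```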

Example

# The re module contains all the regular expression functions
>>> import re
>>> example = "Welcome to the world of Python"
>>> pattern = r'Python'
>>> match = re.search(pattern, example)
>>> print(match)
<_sre.SRE_Match object; span=(24, 30), match='Python'>
>>> if match:
...     print("found", match.group())
... else:
...     print("No match found")
...
found Python

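NOTE: The r prefix creates a raw string. Regex patterns use backslashes heavily (\w, \d), and in a raw string Python passes them to the regex engine unchanged instead of treating them as string escape sequences, so patterns are usually written as raw strings (r'\d').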
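To make the note about raw strings concrete, the sketch below compares raw and normal string literals. The path value and variable names are hypothetical and exist only for this illustration.

```python
import re

# A raw string keeps the backslash; a normal string needs it doubled
same_pattern = (r"\d" == "\\d")

# r"\n" is two characters (backslash + n); "\n" is a single newline
raw_length = len(r"\n")
normal_length = len("\n")

# Matching one literal backslash: two characters in a raw pattern,
# four in a normal string literal
path = r"C:\temp"
with_raw = re.findall(r"\\", path)
without_raw = re.findall("\\\\", path)

print(same_pattern)               # True
print(raw_length, normal_length)  # 2 1
print(with_raw == without_raw)    # True
print(len(with_raw))              # 1
```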

The most popular example is finding a phone number :-)

>>> import re
>>> message = "my number is 510-123-4567"
# Here we create a regex object, which defines the pattern we are looking for
>>> myregex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# Then we search for the pattern in the string
>>> match = myregex.search(message)
# group() returns the actual matched text
>>> print(match.group())
510-123-4567
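The repeated \d in this pattern can also be written with the {k} quantifier from the repetition list. The sketch below assumes the same message as above and shows that both spellings find the same text; the names long_form and short_form are just illustrative.

```python
import re

message = "my number is 510-123-4567"

# Same pattern written two ways: repeated \d versus \d{k}
long_form = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")
short_form = re.compile(r"\d{3}-\d{3}-\d{4}")

long_result = long_form.search(message).group()
short_result = short_form.search(message).group()

print(long_result)   # 510-123-4567
print(short_result)  # 510-123-4567
```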

If the string contains multiple phone numbers, use findall

>>> import re
>>> message = "my number is 510-123-4567 and my office number is 510-987-1234"
>>> myregex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# Find every occurrence of the pattern in the string and return them as a list
>>> print(myregex.findall(message))
['510-123-4567', '510-987-1234']

Let's use groups to separate the area code from the rest of the phone number. Here parentheses have a special meaning: they mark where a group starts and where it ends.

>>> import re
>>> myregex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> match = myregex.search("My number is 510-123-4567")
>>> match
<_sre.SRE_Match object; span=(13, 25), match='510-123-4567'>
# This returns the full matching string
>>> match.group()
'510-123-4567'
# This returns only the first group (the area code)
>>> match.group(1)
'510'
# This returns the second group (the rest of the phone number, without the area code)
>>> match.group(2)
'123-4567'
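Two related behaviours are worth knowing once a pattern contains groups: match.groups() returns all groups at once as a tuple, and findall() returns a list of tuples instead of whole matches. The sketch below reuses the grouped pattern; the variable names are only for illustration.

```python
import re

phone_regex = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)")
message = "my number is 510-123-4567 and my office number is 510-987-1234"

match = phone_regex.search(message)

# groups() returns every captured group as a tuple, ready for unpacking
area_code, local_number = match.groups()

# group(0) is the same as group(): the whole match
whole = match.group(0)

# With groups in the pattern, findall() returns one tuple per match
all_numbers = phone_regex.findall(message)

print(area_code)     # 510
print(local_number)  # 123-4567
print(whole)         # 510-123-4567
print(all_numbers)   # [('510', '123-4567'), ('510', '987-1234')]
```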

To match literal parentheses in a string, escape them with a backslash: \( and \)

>>> myregex = re.compile(r'\(\d\d\d\)-(\d\d\d-\d\d\d\d)')
>>> match = myregex.search("My number is (510)-123-4567")
>>> match.group()
'(510)-123-4567'
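Escaped and unescaped parentheses can be combined in one pattern. In the sketch below (a variation on the example above, not taken from the original), the escaped pair matches the literal "(" and ")" while an inner unescaped pair still captures the area code.

```python
import re

# \( and \) match literal parentheses; the inner ( ) still create groups
phone_regex = re.compile(r"\((\d\d\d)\)-(\d\d\d-\d\d\d\d)")
match = phone_regex.search("My number is (510)-123-4567")

whole = match.group()
area_code = match.group(1)
local_number = match.group(2)

print(whole)         # (510)-123-4567
print(area_code)     # 510
print(local_number)  # 123-4567
```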

The pipe character (|) matches one of several possible alternatives

>>> lang = re.compile(r'Pyt(hon|con|mon)')
>>> match = lang.search("Python is a wonderful language")
>>> match.group()
'Python'
>>> match = lang.search("Pytcon is a wonderful language")
>>> match.group()
'Pytcon'
>>> match = lang.search("Pytmon is a wonderful language")
>>> match.group()
'Pytmon'
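Because the alternatives sit inside parentheses, they also form a group, so group(1) reports which alternative matched. The sketch below also uses a non-capturing group, (?:...), which is standard re syntax but goes slightly beyond what this post covers; treat it as an optional aside.

```python
import re

lang = re.compile(r"Pyt(hon|con|mon)")
match = lang.search("Pytcon is a wonderful language")

whole = match.group()    # the full match
suffix = match.group(1)  # only the alternative that matched

# With a capturing group, findall() returns just the group contents
suffixes = lang.findall("Python Pytcon Pytmon")

# A non-capturing group (?:...) makes findall() return the whole matches
full_words = re.findall(r"Pyt(?:hon|con|mon)", "Python Pytcon Pytmon")

print(whole)       # Pytcon
print(suffix)      # con
print(suffixes)    # ['hon', 'con', 'mon']
print(full_words)  # ['Python', 'Pytcon', 'Pytmon']
```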

If the regular expression cannot find the pattern, search() returns None. To verify that:

>>> match = lang.search("Pytut is a wonderful language")
>>> match is None
True
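Since a failed search returns None, calling .group() on the result directly raises an AttributeError. One way to guard against that is sketched below; find_language is a hypothetical helper written for this example, not part of the re module.

```python
import re

lang = re.compile(r"Pyt(hon|con|mon)")


def find_language(text):
    """Return the matched word, or None when the pattern is absent."""
    match = lang.search(text)
    if match is None:
        # No match: avoid calling .group() on None
        return None
    return match.group()


found = find_language("Python is a wonderful language")
missing = find_language("Pytut is a wonderful language")

print(found)    # Python
print(missing)  # None
```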

So this is the end of Day 16. In case you are facing any issues, this is the link to the Python Slack channel: https://devops-myworld.slack.com

Please send me your details

  • First name
  • Last name
  • Email address

to devops.everyday.challenge@gmail.com, so that I can add you to this Slack channel

HAPPY CODING!!!