Learn Python Fundamentals in 30 Days: Day 17 (Regular Expressions, Part 2)

? : zero or one time

>>> import re
# Here "ho" is optional: it may occur zero times or one time
>>> myexpr = re.compile(r'Pyt(ho)?n')
>>> match = myexpr.search("Python a wonderful language")
>>> match.group()
'Python'
>>> match = myexpr.search("Pytn a wonderful language")
>>> match.group()
'Pytn'

If "ho" appears more than once, the match fails: search() returns None, so calling group() on the result raises an error

>>> match = myexpr.search("Pythohon a wonderful language")
>>> match.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> match is None
True
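
Since search() returns None when nothing matches, it is safer to check the result before calling group(). The sketch below shows one way to do that; the helper name first_match is hypothetical and only used for illustration.

```python
import re

myexpr = re.compile(r'Pyt(ho)?n')

def first_match(text):
    """Return the matched text, or None when the pattern is not found.

    first_match is a hypothetical helper, not part of the re module.
    """
    match = myexpr.search(text)
    if match is None:
        return None
    return match.group()

print(first_match("Pytn a wonderful language"))      # Pytn
print(first_match("Pythohon a wonderful language"))  # None
```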

In the same way, we can make the area code optional in the phone number example from the previous part

>>> myphone = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
>>> match = myphone.search("My phone number is 123-4567")
>>> match.group()
'123-4567'
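
To see the optional group at work, the sketch below runs the same pattern against a number with an area code and one without. The phone numbers are made up for illustration. When the optional group does not take part in the match, group(1) returns None.

```python
import re

# The area code group "(\d\d\d-)" is optional because of the trailing "?".
phone_re = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

with_area = phone_re.search("Call 415-123-4567 today")
without_area = phone_re.search("Call 123-4567 today")

print(with_area.group())       # 415-123-4567
print(with_area.group(1))      # 415-
print(without_area.group())    # 123-4567
print(without_area.group(1))   # None
```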

* : zero or more times

>>> import re
>>> myexpr = re.compile(r'Pyth(on)*')
>>> match = myexpr.search("Welcome to the world of Pythononon")
>>> match.group()
'Pythononon'

+ : one or more times (must appear at least once)

>>> myexpr = re.compile(r'Pyth(on)+')
>>> match = myexpr.search("Welcome to the world of Pyth")
>>> match.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> match = myexpr.search("Welcome to the world of Python")
>>> match.group()
'Python'
>>> match = myexpr.search("Welcome to the world of Pythonononon")
>>> match.group()
'Pythonononon'
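
The difference between * and + is easiest to see side by side. This is a small sketch that runs both patterns over the same inputs; the variable names are only for illustration.

```python
import re

star_re = re.compile(r'Pyth(on)*')  # "on" may repeat zero or more times
plus_re = re.compile(r'Pyth(on)+')  # "on" must appear at least once

results = {}
for text in ["Pyth", "Python", "Pythonon"]:
    star = star_re.search(text)
    plus = plus_re.search(text)
    # Store the matched text, or None when the pattern did not match.
    results[text] = (
        star.group() if star else None,
        plus.group() if plus else None,
    )

print(results["Pyth"])      # ('Pyth', None)
print(results["Python"])    # ('Python', 'Python')
print(results["Pythonon"])  # ('Pythonon', 'Pythonon')
```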

To match a specific number of repetitions, put the count in curly braces

>>> myregex = re.compile(r'(Re){3}')
>>> match = myregex.search("My matching string is ReReRe")
>>> match.group()
'ReReRe'
# Range of repetitions
>>> myregex = re.compile(r'(Re){3,5}')
>>> match = myregex.search("My matching string is ReReReRe")
>>> match.group()
'ReReReRe'
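
The curly-brace forms can be compared on a single string. The sketch below also uses the open-ended form {3,}, which is not shown above and means "three or more".

```python
import re

text = "ReReReReReRe"  # "Re" repeated six times

exactly_3 = re.compile(r'(Re){3}').search(text).group()
three_to_5 = re.compile(r'(Re){3,5}').search(text).group()
three_or_more = re.compile(r'(Re){3,}').search(text).group()

print(exactly_3)      # ReReRe
print(three_to_5)     # ReReReReRe
print(three_or_more)  # ReReReReReRe
```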

Regular expressions in Python are greedy by default, i.e. they try to match the longest possible string

# Although the first 3 digits would be enough, the greedy match takes the first 5
>>> mydigit = re.compile(r'(\d){3,5}')
>>> match = mydigit.search('123456789')
>>> match.group()
'12345'

To do a non-greedy match, put a question mark after the closing curly brace; the pattern then matches the shortest possible string

>>> mydigit = re.compile(r'(\d){3,5}?')
>>> match = mydigit.search('123456789')
>>> match.group()
'123'
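
The effect is even clearer with findall(), which returns every non-overlapping match. In this sketch the capture group is dropped (plain \d instead of (\d)) so that findall() returns the whole matches rather than the group contents.

```python
import re

digits = '123456789'

# Greedy: each match takes as many digits as allowed (up to 5).
greedy = re.findall(r'\d{3,5}', digits)

# Non-greedy: each match stops as soon as the minimum (3) is reached.
lazy = re.findall(r'\d{3,5}?', digits)

print(greedy)  # ['12345', '6789']
print(lazy)    # ['123', '456', '789']
```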

Let's look at a few more examples that involve character classes

\w : any single word character (letter, digit, or underscore), e.g. [a-zA-Z0-9_]
\d : any numeric digit [0-9]
\s : any whitespace character (space, newline, tab)

Let's say I need to match this address

>>> import re
>>> address = "123 fremont street"
>>> mypat = re.compile(r'\d+\s\w+\s\w+')
>>> mypat.findall(address)
['123 fremont street']
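
To see what each shorthand class picks up on its own, here is a small sketch that applies them one at a time to the same address. It uses the module-level re.findall() as a shortcut for compiling and searching in one call.

```python
import re

address = "123 fremont street"

digits = re.findall(r'\d', address)   # one digit per match
words = re.findall(r'\w+', address)   # runs of word characters
spaces = re.findall(r'\s', address)   # one whitespace character per match

print(digits)  # ['1', '2', '3']
print(words)   # ['123', 'fremont', 'street']
print(spaces)  # [' ', ' ']
```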

We can create our own character class

# Let's create our own character class that matches all lowercase vowels
# (to match uppercase vowels as well, use r'[aeiouAEIOU]')
>>> myregex = re.compile(r'[aeiou]')
>>> mypat = "Welcome to the world of Python"
>>> myregex.findall(mypat)
['e', 'o', 'e', 'o', 'e', 'o', 'o', 'o']

Now, if we want to match two vowels in a row

>>> myregex = re.compile(r'[aeiouAEIOU]{2}')
>>> mypat = "Welcome to the world of Python ae"
>>> myregex.findall(mypat)
['ae']

Negative character class (a ^ right after the opening bracket means match everything except the listed characters, here the vowels)

>>> myregex = re.compile(r'[^aeiouAEIOU]')
>>> mypat = "Welcome to the world of Python ae"
>>> myregex.findall(mypat)
['W', 'l', 'c', 'm', ' ', 't', ' ', 't', 'h', ' ', 'w', 'r', 'l', 'd', ' ', 'f', ' ', 'P', 'y', 't', 'h', 'n', ' ']
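
A character class and its negation split the input between them: every character is matched by exactly one of the two. The sketch below checks this on the sentence used earlier.

```python
import re

text = "Welcome to the world of Python"

vowels = re.findall(r'[aeiouAEIOU]', text)   # characters in the class
others = re.findall(r'[^aeiouAEIOU]', text)  # everything else, spaces included

# The two lists together account for every character in the string.
print(len(vowels), len(others), len(text))   # 8 22 30
```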

Let's take a look at the dot (. matches any character except the newline \n)

>>> myregex = re.compile(r'.x')
>>> mypat = "Linux Unix Minix"
>>> myregex.findall(mypat)
['ux', 'ix', 'ix']

The dot is most often used together with *

* : 0 or more

Now, if we change our regex to use both

>>> myregex = re.compile(r'.*x')
>>> mypat = "Linux Unix Minix"
>>> myregex.findall(mypat)
['Linux Unix Minix']

NOTE

.*  : always performs a greedy match (of any characters except newline)
.*? : adding ? makes the match non-greedy

Let's look at this with the help of an example

>>> mystr = '"Welcome to the world of Python" great language to learn"'
>>> mypat = re.compile(r'"(.*?)"')
# Because the match is non-greedy, it stops at the first closing " it encounters
>>> mypat.findall(mystr)
['Welcome to the world of Python']

But in the case of a greedy match

>>> mypat = re.compile(r'"(.*)"')
# The greedy match runs up to the last " in the string
>>> mypat.findall(mystr)
['Welcome to the world of Python" great language to learn']
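
The non-greedy form is what you usually want when a string holds several quoted phrases. The sketch below uses a made-up sentence to show the difference.

```python
import re

text = 'She said "hello" and then "goodbye" to everyone'

# Non-greedy: each match ends at the nearest closing quote.
lazy = re.findall(r'"(.*?)"', text)

# Greedy: a single match stretches from the first quote to the last one.
greedy = re.findall(r'"(.*)"', text)

print(lazy)    # ['hello', 'goodbye']
print(greedy)  # ['hello" and then "goodbye']
```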

As mentioned above, .* matches everything except a newline

>>> myexpr = "Welcome to the \n world of \n Python"
>>> print(myexpr)
Welcome to the 
 world of 
 Python
>>> mypat = re.compile(r'(.*)')
>>> mypat.search(myexpr)
<_sre.SRE_Match object; span=(0, 15), match='Welcome to the '>

If we want the dot to match newlines as well, pass re.DOTALL as the second argument to re.compile()

>>> mypat = re.compile(r'.*', re.DOTALL)
>>> mypat.search(myexpr)
<_sre.SRE_Match object; span=(0, 34), match='Welcome to the \n world of \n Python'>
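
Here is the same comparison as a short script, using a made-up two-line string. It passes the flag to the module-level re.search(), which accepts flags as its third argument.

```python
import re

text = "first line\nsecond line"

# Without the flag, "." stops at the newline.
without_flag = re.search(r'.*', text).group()

# With re.DOTALL, "." also matches the newline, so the match covers both lines.
with_flag = re.search(r'.*', text, re.DOTALL).group()

print(repr(without_flag))  # 'first line'
print(repr(with_flag))     # 'first line\nsecond line'
```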

The second argument to re.compile() is really useful, especially if we want to perform a case-insensitive search (re.I)

>>> import re
>>> mystr = "Why Linux Is Such An Awesome Platform"
>>> mypat = re.compile(r'[aeiou]', re.I)
>>> mypat.findall(mystr)
['i', 'u', 'I', 'u', 'A', 'A', 'e', 'o', 'e', 'a', 'o']
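
If you need more than one flag, they can be combined with the | operator. The sketch below uses a made-up string and applies re.I and re.DOTALL together.

```python
import re

text = "PYTHON is fun.\nPython is popular."

# re.I ignores case; re.DOTALL lets "." cross the newline.
both_flags = re.compile(r'python.*python', re.I | re.DOTALL)
only_ignorecase = re.compile(r'python.*python', re.I)

match = both_flags.search(text)
print(repr(match.group()))           # 'PYTHON is fun.\nPython'
print(only_ignorecase.search(text))  # None, because "." cannot cross the newline
```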

So this is the end of Day 17. If you are facing any issues, this is the link to the Python Slack channel: https://devops-myworld.slack.com

Please send me your details

  • First name
  • Last name
  • Email address

to devops.everyday.challenge@gmail.com, so that I can add you to this Slack channel

HAPPY CODING!!!