Python RegEx

2 ways to find a match

Sheng Zhuang
Analytics Vidhya
4 min read · Nov 21, 2019

RegEx, or Regular Expression, is a mini language that uses a string pattern to search for one or more substrings in a string.

After importing the re module, there are four functions we can use to make queries.

  • match()
  • search()
  • findall()
  • finditer()

All four can be called in two ways: at the module level, or from a compiled pattern object:

  • Module-level functions
  • Compilation

The two approaches differ in two aspects:

  • how to define the pattern
  • how to call the search() function

Module-Level Functions

Take the search() method as an example: we can call it directly from the re module as re.search().

The first argument is the pattern, which can be a literal pattern string or a compiled pattern. The second argument is the string to be searched. Of course, either argument can be passed as a variable.
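
As a minimal sketch of the module-level call (the pattern and the sample string here are made up for illustration, not taken from a real data set):

```python
import re

# Hypothetical sample data, chosen only for illustration.
pattern = r"\d+"                      # one or more digits
text = "Order 66 shipped in 3 days"

# Module-level call: pattern first, string second.
result = re.search(pattern, text)

# search() returns None when nothing matches, so check before using it.
found = result.group() if result else None
print(found)  # 66
```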

Compilation

Compiling a regex requires creating a pattern object first, with the re.compile() function. Then call a method such as search() on that pattern object.

One of the pros of using compile() to create the pattern is that one or more flags can be provided as its second argument to refine the search process, so the flags are stored together with the pattern. (The module-level functions such as re.search() also accept a flags argument.)
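
The same search can be sketched with a compiled pattern. The variable names and sample string are my own assumptions for illustration:

```python
import re

# Build the pattern object once; it can be reused for many searches.
digits = re.compile(r"\d+")
text = "Order 66 shipped in 3 days"   # hypothetical sample string

# Only the string is passed now; the pattern lives in the object.
result = digits.search(text)
first = result.group() if result else None
print(first)        # 66

# The other query methods are available on the same object.
all_numbers = digits.findall(text)
print(all_numbers)  # ['66', '3']
```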

  • re.A / re.ASCII
  • re.S / re.DOTALL
  • re.I / re.IGNORECASE
  • re.L / re.LOCALE
  • re.M / re.MULTILINE
  • re.X / re.VERBOSE

Check the Python docs for the full definitions of these flags.

You can combine multiple flags with the pipe operator: re.I | re.X ignores case and enables verbose mode, which allows whitespace and comments inside the pattern.

Personally, I prefer the compiling method: I always forget the order of the two arguments when I use re.search(), and I like to define the pattern first anyway, so it is more convenient to call the search() method on the pattern.
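
Here is a small sketch of combining flags. The pattern and text are hypothetical; the flags themselves (re.I, re.M, re.X) are the ones listed above:

```python
import re

# re.X lets the pattern span several lines with comments;
# re.I ignores case; re.M makes ^ match at the start of every line.
word = re.compile(
    r"""
    ^python     # the word at the start of a line
    \s+         # one or more whitespace characters
    regex       # followed by this word
    """,
    re.I | re.M | re.X,
)

text = "intro\nPython   RegEx notes"   # hypothetical sample string
m = word.search(text)
print(m.group())   # Python   RegEx

# The module-level functions take the same flags as a keyword argument.
m2 = re.search(r"^python", text, flags=re.I | re.M)
print(m2.group())  # Python
```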

Some other notes on regex

About match(), search(), findall(), finditer()

  • match() and search() return at most one match, while findall() and finditer() return all matches.
  • match() and search() return a re.Match object (or None when nothing matches), while findall() returns a list and finditer() returns an iterator of re.Match objects.
  • match() only finds a match at the start of the string, while search() scans through the whole string.
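
The differences above can be seen side by side in a short sketch (the sample string is made up for illustration):

```python
import re

text = "cat bat cat"   # hypothetical sample string

# match() only looks at the start of the string: "bat" is not there.
at_start = re.match(r"bat", text)
print(at_start)                 # None

# search() scans the whole string and stops at the first hit.
anywhere = re.search(r"bat", text)
print(anywhere.span())          # (4, 7)

# findall() returns a list of all matched strings.
words = re.findall(r"[cb]at", text)
print(words)                    # ['cat', 'bat', 'cat']

# finditer() yields one re.Match object per match.
spans = [m.span() for m in re.finditer(r"[cb]at", text)]
print(spans)                    # [(0, 3), (4, 7), (8, 11)]
```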

Use raw string

As mentioned at the beginning, regex is a mini language inside Python. Python processes backslash escapes in an ordinary string literal before the regex engine ever sees the pattern, so it is better to use a raw string (r"...") to create the pattern.
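
A quick sketch of why the raw string matters, using \b (word boundary in regex, backspace character in a normal Python string). The sample text is hypothetical:

```python
import re

text = "a word boundary test"   # hypothetical sample string

plain = "\bword\b"    # Python turns each "\b" into a backspace character
raw = r"\bword\b"     # raw string: the regex engine receives backslash + b

print(len(plain), len(raw))     # 6 8

# The plain version looks for literal backspace characters and finds nothing.
plain_result = re.search(plain, text)
print(plain_result)             # None

# The raw version is read by the regex engine as word boundaries.
raw_result = re.search(raw, text)
print(raw_result.group())       # word
```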

One special character “?”

  • optional: the preceding item may be there or not, e.g. colou?r
  • match the subpattern but don’t capture it. (?:pattern)
  • name the matched substring. (?P<name>pattern)

ps: the ? is followed by a capital P

  • lookahead assertion: require that something follows, (?=pattern), or that it does not follow, (?!pattern)
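
The four uses of “?” can be sketched as follows. All patterns and sample strings are my own illustrative assumptions:

```python
import re

# 1. Optional: the "u" may or may not be present.
optional = re.findall(r"colou?r", "color and colour")
print(optional)            # ['color', 'colour']

# 2. Non-capturing group: (?:...) groups the alternatives but is not captured.
m = re.search(r"(?:Mr|Ms)\. (\w+)", "Hello Ms. Smith")
captured = m.groups()
print(captured)            # ('Smith',)

# 3. Named group: (?P<name>...) lets you fetch the match by name.
d = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Posted 2019-11")
year = d.group("year")
print(year)                # 2019

# 4. Lookahead: check what follows without including it in the match.
followed = re.findall(r"\b\d+\b(?= USD)", "10 USD, 20 EUR")
not_followed = re.findall(r"\b\d+\b(?! USD)", "10 USD, 20 EUR")
print(followed)            # ['10']
print(not_followed)        # ['20']
```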

Methods of the re.Match object

  • .group(), .group(i), .groups(), .groupdict()

.group() == .group(0)

  • .span(), .start(), .end()
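
A short sketch of these methods on one match. The e-mail-like pattern and the sample string are hypothetical and deliberately simplistic:

```python
import re

# Hypothetical pattern with two named groups; not a real e-mail validator.
m = re.search(r"(?P<user>\w+)@(?P<host>\w+)\.com", "mail: alice@example.com")

whole = m.group()          # same as m.group(0): the entire match
print(whole)               # alice@example.com
print(m.group(1))          # alice

all_groups = m.groups()    # tuple of all captured groups
print(all_groups)          # ('alice', 'example')

named = m.groupdict()      # dict of the named groups only
print(named)               # {'user': 'alice', 'host': 'example'}

# Position of the whole match inside the searched string.
position = (m.span(), m.start(), m.end())
print(position)            # ((6, 23), 6, 23)
```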

That’s all for today.
