Intro to Regex in Python
Created: Yu-Ting Lee, Quert
Tags: python, regex
re.match(pattern, string, flags=re.M) : matches the pattern only at the beginning of the string (it stays anchored there even with re.M) and returns a match object, or None if nothing matches.
flag re.M : multiline mode; ^ and $ also match at the start and end of each line.
re.findall : finds all non-overlapping matches of the pattern in a string and returns them as a list.
re.search : scans the whole string and returns a match object for the first match, or None.
re.sub(r'old', 'new', string) : replaces every match of the pattern in the string.
re.split : splits a string on specific characters or on a regex.
re.compile() : builds a reusable pattern object; we can then call pattern.findall(texts).
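The functions above can be compared side by side on one small input. This is only a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

text = "cat bat\nrat cat"  # hypothetical sample text

# re.match only succeeds at the very start of the string.
m_start = re.match(r"cat", text)   # match object for "cat"
m_none = re.match(r"bat", text)    # None: "bat" is not at the start

# re.search returns the first match anywhere in the string.
s_first = re.search(r"bat", text)

# re.findall returns every non-overlapping match as a list.
all_at = re.findall(r"\w+at", text)          # ['cat', 'bat', 'rat', 'cat']

# With re.M, ^ also matches right after each newline.
line_starts = re.findall(r"^\w+", text, flags=re.M)   # ['cat', 'rat']

# re.sub replaces matches; re.split cuts the string at matches.
replaced = re.sub(r"cat", "dog", text)       # 'dog bat\nrat dog'
parts = re.split(r"\s+", text)               # ['cat', 'bat', 'rat', 'cat']

# re.compile builds a reusable pattern object.
pattern = re.compile(r"[cr]at")
compiled_hits = pattern.findall(text)        # ['cat', 'rat', 'cat']
```

Compiling is mostly worthwhile when the same pattern is reused many times; for one-off calls the module-level functions behave the same way.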
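- Quantifiers
{times} : the exact number of repetitions we want
{n,m} : at least n times and at most m times (write it without a space after the comma in a real pattern)
- The match objects returned by re.search() and re.match() report where the match was found through start() and end()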
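A short sketch of the counted quantifiers and of start() / end() on a match object. The sample text is hypothetical.

```python
import re

text = "id 7, id 42, id 12345"  # hypothetical sample text

# {2} means exactly two digits; \b stops it from matching inside longer numbers.
exactly_two = re.findall(r"\b\d{2}\b", text)     # ['42']

# {2,5} means at least two and at most five digits.
two_to_five = re.findall(r"\b\d{2,5}\b", text)   # ['42', '12345']

# A match object reports where the match was found.
m = re.search(r"\d{2,5}", text)
span = (m.start(), m.end())                      # (9, 11)
found = text[m.start():m.end()]                  # '42'
```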
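\d : digit
\D : non-digit
\w : word character
\W : non-word character
\b : word boundary
\s : whitespace, including \t, \n, \r, \f and \v
\S : non-whitespace
+ : one or more times
* : zero or more times
? : zero or one time
. : matches any single character (except a newline, unless re.S is set)
^ : start of the string
$ : end of the string
.+ : one or more of any character
| : represents "or"
\/ : matches a literal "/" (the backslash is optional in Python, since "/" is not a special character)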
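The symbols above can be checked quickly with re.findall. This is a minimal sketch on a made-up sample string, not an exhaustive tour.

```python
import re

text = "Call 555-0199 at 9am\tor visit /home"  # hypothetical sample text

digits = re.findall(r"\d+", text)        # ['555', '0199', '9']
words = re.findall(r"\w+", text)         # runs of word characters
no_space = re.findall(r"\S+", text)      # runs of non-whitespace characters
whole_at = re.findall(r"\bat\b", text)   # "at" as a whole word only: ['at']
either = re.findall(r"am|pm", text)      # | means "or": ['am']
slash = re.findall(r"/\w+", text)        # "/" needs no escaping: ['/home']

# ? makes the preceding "u" optional.
optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']
```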
Regex
Code example for stripping hashtags from a text file
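One possible sketch of this task is shown here. It assumes that a hashtag is "#" followed by word characters and that the whole tag should be removed; the helper name strip_hashtags, the file name and the file content are all made up for the demo.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a hashtag is "#" followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")

def strip_hashtags(text):
    """Remove hashtags, then collapse the spaces they leave behind."""
    without_tags = HASHTAG.sub("", text)
    return re.sub(r"[ \t]+", " ", without_tags).strip()

# Demo on a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "post.txt"
    path.write_text("Learning #python and #regex today", encoding="utf-8")
    cleaned = strip_hashtags(path.read_text(encoding="utf-8"))
    # cleaned == 'Learning and today'
```

If only the "#" symbol should go and the word should stay, replacing r"#(\w+)" with r"\1" would be one way to do it.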
Usage code for .
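A small sketch of how "." behaves, using made-up sample strings.

```python
import re

sample = "hat hot h\nt hit"  # hypothetical sample text

# "." matches any single character except a newline.
hits = re.findall(r"h.t", sample)                      # ['hat', 'hot', 'hit']

# With re.S (DOTALL), "." matches a newline as well.
hits_dotall = re.findall(r"h.t", sample, flags=re.S)   # adds 'h\nt'

# To match a literal dot, escape it.
literal = re.findall(r"\.", "v1.2.3")                  # ['.', '.']
```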
Usage code for ^ and $
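A minimal sketch of the two anchors, with and without re.M, on a made-up two-line string.

```python
import re

text = "first line\nsecond line"  # hypothetical sample text

# Without flags, ^ and $ anchor to the whole string.
start_only = re.findall(r"^\w+", text)               # ['first']
end_only = re.findall(r"\w+$", text)                 # ['line']

# With re.M they anchor to every line.
each_start = re.findall(r"^\w+", text, flags=re.M)   # ['first', 'second']
each_end = re.findall(r"\w+$", text, flags=re.M)     # ['line', 'line']
```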
OR operator |, [], ^
[^] negates the character class: it matches any single character that is not listed inside the brackets
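The three forms can be compared on one made-up string. Note that ^ means negation only as the first character inside brackets; outside brackets it is still the start-of-string anchor.

```python
import re

text = "gray grey grxy"  # hypothetical sample text

alternation = re.findall(r"gray|grey", text)   # ['gray', 'grey']
char_class = re.findall(r"gr[ae]y", text)      # ['gray', 'grey']
negated = re.findall(r"gr[^ae]y", text)        # ['grxy']

# Outside brackets, ^ anchors the match to the start of the string.
anchored = re.findall(r"^gr.y", text)          # ['gray']
```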
Group Characters for further processing
Use ()
Non-capturing group : (?:), e.g. (?:a|b)
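A short sketch of how capturing and non-capturing groups change what re.findall returns. The sample strings are made up.

```python
import re

dates = "2024-01-15 and 2023-12-31"  # hypothetical sample text

# With capturing groups, findall returns one tuple of groups per match.
captured = re.findall(r"(\d{4})-(\d{2})-(\d{2})", dates)
# [('2024', '01', '15'), ('2023', '12', '31')]

# A capturing group under + keeps only its last repetition.
last_only = re.findall(r"(a|b)+c", "abc bbc c")         # ['b', 'b']

# A non-capturing group only groups, so findall returns the whole match.
non_captured = re.findall(r"(?:a|b)+c", "abc bbc c")    # ['abc', 'bbc']
```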
Numbered Group
Named Group
(?P<name>regex)
Select group number in regex
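The group headings above can be illustrated roughly as follows. The sample strings and the group names user and host are made up, and reading "select group number" as group(n) plus the \1 backreference is an assumption.

```python
import re

text = "mail bob@example.com now"  # hypothetical sample text

# Numbered groups: group(0) is the whole match, group(1), group(2) the parts.
m = re.search(r"(\w+)@(\w+)\.com", text)
whole, user, host = m.group(0), m.group(1), m.group(2)

# Named groups: (?P<name>regex) lets us look a group up by name.
named = re.search(r"(?P<user>\w+)@(?P<host>\w+)\.com", text)
by_name = named.group("user")      # 'bob'
as_dict = named.groupdict()        # {'user': 'bob', 'host': 'example'}

# Selecting a group by number inside the pattern: \1 repeats group 1.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group(1)   # 'is'

# Selecting groups by number in a replacement string.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "bob@example")     # 'example at bob'
```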
Look behind & Look Ahead
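Positive lookahead (?=...) makes sure that the first part of the expression is followed by the lookahead expression, without including the lookahead text in the match.
Positive lookbehind (?<=...) matches only where the text is preceded by the specified pattern; the preceding text is not part of the match either.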
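Both assertions can be sketched on one made-up string. In each case only the digits are returned, not the text used as the condition.

```python
import re

text = "price: $30, cost: $45, 60 items"  # hypothetical sample text

# Positive lookahead: digits that are followed by " items".
before_items = re.findall(r"\d+(?= items)", text)   # ['60']

# Positive lookbehind: digits that are preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)      # ['30', '45']
```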
Type Conversion for f-string
!s : string version, same as str()
!r : string containing a printable representation (i.e. with quotes), same as repr()
!a : like !r, but with non-ASCII characters escaped, same as ascii()
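A minimal sketch of the three conversions on a made-up value that contains a non-ASCII character.

```python
name = "café"  # hypothetical sample value

as_str = f"{name!s}"     # str(name):   café
as_repr = f"{name!r}"    # repr(name):  'café'  (quotes included)
as_ascii = f"{name!a}"   # ascii(name): 'caf\xe9' (non-ASCII escaped)
```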
Format specifiers
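:e : scientific notation
:d : decimal integer (a precision such as :.2d is not allowed for integers)
:.Nf : float with N digits after the decimal point, e.g. :.2f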
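The specifiers can be tried on a couple of made-up numbers. This is a small sketch; the padded :05d form is an extra shown only for comparison.

```python
value = 12345.6789  # hypothetical sample values
count = 42

sci = f"{value:e}"           # '1.234568e+04' (six decimals by default)
sci_short = f"{value:.2e}"   # '1.23e+04'
fixed = f"{value:.2f}"       # '12345.68'
integer = f"{count:d}"       # '42'
padded = f"{count:05d}"      # '00042' (zero-padded to width 5)
```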