Regular Expression

Match/Replace/Extract with grep, perl, sed, awk, python

Vince

Published in

vswe

Sep 13, 2020

General Rule

Text content (each Result below lists the lines that contain a match)

vince1234
Vince1234
Vince
Gi!
5566

Match one character . [] [^]

Any character
$REGEXP = .
Result: ALL

Any character in the []
$REGEXP = [v5]
Result: vince1234, 5566

Any character not in the [] (note that ^ has a different meaning inside [] than it does as an anchor)
$REGEXP = [^v]ince
Result: Vince1234, Vince
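The three patterns above can be checked against the sample lines with a few lines of Python. This is a minimal sketch: `LINES` and `matching_lines` are names made up for the illustration, and `re.search` is used because it reports a match anywhere in the line, which is how the Results above read.

```python
import re

# Sample lines from the "Text content" section.
LINES = ["vince1234", "Vince1234", "Vince", "Gi!", "5566"]

def matching_lines(pattern, lines=LINES):
    """Return the lines in which the pattern matches somewhere (hypothetical helper)."""
    return [line for line in lines if re.search(pattern, line)]

any_char = matching_lines(r".")            # any line with at least one character
in_set = matching_lines(r"[v5]")           # contains a 'v' or a '5'
not_in_set = matching_lines(r"[^v]ince")   # 'ince' preceded by anything except 'v'

print(any_char)    # all five lines
print(in_set)      # ['vince1234', '5566']
print(not_in_set)  # ['Vince1234', 'Vince']
```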

Anchors Position (Head/End) ^$

Start with V
$REGEXP = ^V
Result: Vince1234, Vince

End with e
$REGEXP = e$
Result: Vince

i at the end of a word (punctuation marks are not part of the word)
$REGEXP = i\b
Result: Gi!

i not at the end of a word
$REGEXP = i\B
Result: vince1234, Vince1234, Vince
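The anchor examples can be reproduced in Python as a quick check. The sketch below assumes the same five sample lines; the dictionary labels are only for illustration. Raw strings (`r"..."`) matter here, because in a normal Python string `\b` would be read as a backspace character.

```python
import re

LINES = ["vince1234", "Vince1234", "Vince", "Gi!", "5566"]

# Label -> pattern; the labels are illustrative, not part of any API.
patterns = {
    "starts with V": r"^V",
    "ends with e": r"e$",
    "i at the end of a word": r"i\b",
    "i not at the end of a word": r"i\B",
}

results = {
    label: [line for line in LINES if re.search(pattern, line)]
    for label, pattern in patterns.items()
}

for label, lines in results.items():
    print(f"{label}: {lines}")
```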

Count []{} * +

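Character in the [] repeats 2 times
$REGEXP = [5]{2}
Result: 5566
Note: {1,2} means repeat 1 to 2 times (write it without a space after the comma)

Repeat 0~n times
$REGEXP = [5]*
Result: ALL (zero repetitions also count, so every line matches)

Repeat 1~n times
$REGEXP = [5]+
Result: 5566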
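A short Python sketch of the counting operators, again on the sample lines. The variable names are made up for the example. The last two lines illustrate the spacing note: in Python's `re` module a quantifier written as `{1, 2}` is not recognized as a quantifier and is matched as literal text.

```python
import re

LINES = ["vince1234", "Vince1234", "Vince", "Gi!", "5566"]

def matching_lines(pattern):
    """Lines in which the pattern matches somewhere (hypothetical helper)."""
    return [line for line in LINES if re.search(pattern, line)]

exactly_two = matching_lines(r"[5]{2}")   # needs "55"
zero_or_more = matching_lines(r"[5]*")    # the empty match counts, so every line qualifies
one_or_more = matching_lines(r"[5]+")     # needs at least one "5"

# {1,2} takes one or two repetitions, preferring two (greedy).
chunks = re.findall(r"5{1,2}", "55555")

# With a space, Python treats the braces as ordinary characters.
with_space = re.search(r"5{1, 2}", "55")

print(exactly_two)   # ['5566']
print(one_or_more)   # ['5566']
print(chunks)        # ['55', '55', '5']
print(with_space)    # None
```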

Numbers or letters

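\d = [0-9]
\D = [^0-9]
\w = [a-zA-Z0-9_]
\W = [^a-zA-Z0-9_]
\s = [ \t\n]
\S = [^ \t\n]

Any character that isn't a number
$REGEXP = \D
Result: ALL except 5566

String without number
$REGEXP = ^\D+$    or   ^[^0-9]+$
Result: Vince, Gi!

Number only
$REGEXP = ^\d+$    or   ^[0-9]+$
Result: 5566
Note: ^[^a-zA-Z]+$ also returns 5566 here, but it only excludes letters, so it would accept a line such as "12-34" too.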
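The difference between "digits only" and "no letters" is easy to see in Python. The sketch below uses `re.fullmatch`, which requires the whole string to match and so plays the role of `^...$`. The extra test strings `"12-34"` and `"!!"` are made up for the illustration.

```python
import re

LINES = ["vince1234", "Vince1234", "Vince", "Gi!", "5566"]

has_non_digit = [s for s in LINES if re.search(r"\D", s)]    # at least one non-digit
no_digits = [s for s in LINES if re.fullmatch(r"\D+", s)]    # same idea as ^\D+$
digits_only = [s for s in LINES if re.fullmatch(r"\d+", s)]  # same idea as ^\d+$

# "No letters" is a weaker condition than "digits only".
extra = ["5566", "12-34", "!!"]
no_letters = [s for s in extra if re.fullmatch(r"[^a-zA-Z]+", s)]
strict_digits = [s for s in extra if re.fullmatch(r"[0-9]+", s)]

print(has_non_digit)  # ['vince1234', 'Vince1234', 'Vince', 'Gi!']
print(no_digits)      # ['Vince', 'Gi!']
print(digits_only)    # ['5566']
print(no_letters)     # ['5566', '12-34', '!!']
print(strict_digits)  # ['5566']
```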

Match text

with grep

grep -E REGEXP -r ./
grep -E REGEXP filename
Note: GNU grep -E does not understand \d or \D; use [0-9] and [^0-9] there, or switch to grep -P.

with perl

perl -ne 'print if m/REGEXP/' < filename
    -n: assume "while (<>) { ... }" loop around program
    -e: one line of program

with vim

/REGEXP

with python

re.findall: return all non-overlapping matches of pattern in string

>>> import re
>>> re.findall(r'^1.*4$', '1454654564')
['1454654564']
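When the input is a whole file rather than a single line, `^` and `$` behave differently from grep unless `re.MULTILINE` is set. A minimal sketch, assuming the sample lines are held in one newline-separated string (the variable `text` is a stand-in for a file's contents):

```python
import re

# Hypothetical stand-in for a file's contents.
text = "vince1234\nVince1234\nVince\nGi!\n5566\n"

# Without re.MULTILINE, ^ only anchors to the start of the whole string.
whole_string = re.findall(r"^V\w*$", text)

# With re.MULTILINE, ^ and $ also anchor at every line boundary.
per_line = re.findall(r"^V\w*$", text, flags=re.MULTILINE)

print(whole_string)  # []
print(per_line)      # ['Vince1234', 'Vince']
```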

Replace

with sed

sed -E 's/before/after/'

with vim

:1,$s/before/after/gic
    g: replace every occurrence in a line
    i: ignore case
    c: confirm each replacement
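For comparison, the same kinds of replacement can be sketched in Python with `re.sub`. The input string is made up for the example. `count=1` mimics sed's default of replacing only the first occurrence in a line, and `re.IGNORECASE` plays the role of the `i` flag.

```python
import re

line = "before and Before and before"

# Like sed 's/before/after/': only the first occurrence is replaced.
first_only = re.sub(r"before", "after", line, count=1)

# Like the g and i flags together: every occurrence, ignoring case.
all_ignore_case = re.sub(r"before", "after", line, flags=re.IGNORECASE)

print(first_only)       # after and Before and before
print(all_ignore_case)  # after and after and after
```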

Group & Extract Substring

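()  captures a substring; the substrings are then read back from the group results
.   stands for any character
+   the preceding character must appear one or more times
?   on its own: the immediately preceding character matches whether it appears once or not at all
+?  a ? placed after + or * switches the search from the default greedy mode to non-greedy mode

Greedy means that, among all possible matches, the one with the most characters is taken.
Non-greedy takes the one with the fewest characters.

Target string: aaa123bbbaaa456bbb

$REGEXP = aaa(.+)bbb
Match 1
Full match: aaa123bbbaaa456bbb
Group 1: 123bbbaaa456

$REGEXP = aaa(.+?)bbb
Match 1
Full match: aaa123bbb
Group 1: 123
Match 2
Full match: aaa456bbb
Group 1: 456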
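The greedy and non-greedy results above can be reproduced with `re.finditer`, which gives access to both the full match and the captured group. A small sketch; the variable names are illustrative only.

```python
import re

target = "aaa123bbbaaa456bbb"

# Each entry is (full match, group 1).
greedy = [(m.group(0), m.group(1)) for m in re.finditer(r"aaa(.+)bbb", target)]
non_greedy = [(m.group(0), m.group(1)) for m in re.finditer(r"aaa(.+?)bbb", target)]

print(greedy)      # [('aaa123bbbaaa456bbb', '123bbbaaa456')]
print(non_greedy)  # [('aaa123bbb', '123'), ('aaa456bbb', '456')]
```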

with python

>>> import re

greedy
>>> re.findall(r'cl/([0-9]*)', 'cl/5566')
['5566']
>>> re.findall(r'cl/([^ ]*)', 'cl/5566cl/123')
['5566cl/123']

non-greedy (?)
>>> re.findall(r'cl/([0-9]*?)', 'cl/5566cl/123')
['', '']
>>> re.findall(r'cl/([0-9]+?)', 'cl/5566cl/123')
['5', '1']
>>> re.findall(r'cl/([0-9]{3}?)', 'cl/5566cl/123')
['556', '123']

with sed

$ echo '123 cl/5566 789' | sed 's/.*cl\/\([^ ]*\).*/\1/'
5566

$ echo 'vince123' | sed 's/\([a-z]*\).*/\1/'
vince

$ echo 'xxxx cl/123abc xxx' \
| sed 's/.*cl\/\([0-9]*\)\([a-z]*\).*/\2 \1/'
abc 123

\1 stands for the first captured group, \2 for the second, and so on.
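The same group references work in Python's `re.sub`, with the difference that capture groups are written with plain parentheses rather than the escaped `\(` `\)` that sed uses by default. A minimal sketch reusing the strings from the sed examples:

```python
import re

# \1 and \2 in the replacement refer to the capture groups, as in sed.
swapped = re.sub(r".*cl/([0-9]*)([a-z]*).*", r"\2 \1", "xxxx cl/123abc xxx")

# In Python there is no need to strip the surrounding text with .*:
# search for the pattern and read the group directly.
match = re.search(r"cl/([^ ]*)", "123 cl/5566 789")
extracted = match.group(1) if match else None

print(swapped)    # abc 123
print(extracted)  # 5566
```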

with grep

grep -o
Print only the matched (non-empty) parts of a matching line

$ echo 'vince 123' | grep -o '^[^ ]*'
vince

$ echo '123,vince,5566a456|123' | grep -o '[0-9]*'
123
5566
456
123

$ echo '123 cl/5566,789' | grep -o 'cl/[0-9]*' | sed 's/cl\///'
5566
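A rough Python counterpart of `grep -o` is sketched below. `only_matching` is a hypothetical helper, not a library function. It drops empty matches explicitly, since a pattern like `[0-9]*` also matches the empty string between digits runs, and `grep -o` does not print those. The last line shows that a capture group can replace the `grep -o ... | sed` pipeline.

```python
import re

def only_matching(pattern, line):
    """Return the non-empty matches of pattern in line, similar to grep -o."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

first_word = only_matching(r"^[^ ]*", "vince 123")
numbers = only_matching(r"[0-9]*", "123,vince,5566a456|123")

# A capture group returns just the digits after "cl/".
cl_numbers = re.findall(r"cl/([0-9]*)", "123 cl/5566,789")

print(first_word)  # ['vince']
print(numbers)     # ['123', '5566', '456', '123']
print(cl_numbers)  # ['5566']
```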

with awk

example.txt
vince 1234
david 5566
frank 7788

$ cat example.txt | awk '{print $1}'
$ awk '{print $1}' example.txt
vince
david
frank

$ awk '$1 ~ /^v.*/ {print $0}' example.txt
vince 1234

$ awk '$1 !~ /^v.*/ {print $0}' example.txt
david 5566
frank 7788
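The awk one-liners can be approximated in Python by splitting each line on whitespace and testing the first field. This is only a sketch: `EXAMPLE_TXT` stands in for the contents of example.txt so that the snippet runs without a file.

```python
import re

# Stand-in for the contents of example.txt.
EXAMPLE_TXT = "vince 1234\ndavid 5566\nfrank 7788\n"
lines = EXAMPLE_TXT.splitlines()

# awk '{print $1}': the first whitespace-separated field of each line.
first_fields = [line.split()[0] for line in lines]

# awk '$1 ~ /^v.*/ {print $0}': whole lines whose first field starts with v.
starts_with_v = [line for line in lines if re.search(r"^v", line.split()[0])]

# awk '$1 !~ /^v.*/ {print $0}': the remaining lines.
not_v = [line for line in lines if not re.search(r"^v", line.split()[0])]

print(first_fields)   # ['vince', 'david', 'frank']
print(starts_with_v)  # ['vince 1234']
print(not_v)          # ['david 5566', 'frank 7788']
```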

Written by Vince