What is Regular Expression (4)

Remember ‘^…$’?

\b vs. \B

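There are two more special patterns you need to remember: \b and \B. Again, \B is the negated version of \b. \b matches a word boundary, which is a position rather than a character. It is the spot between a word character and a non-word character (such as whitespace or punctuation), or the start/end of a string when the string begins or ends with a word character. By word characters, I mean a-z, A-Z, 0-9 and the underscore (_), which is \w in regex. \B matches every position that is not a word boundary. There are three places where a word boundary can occur.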
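Before going through those three places, here is a quick side-by-side of \b and its negation \B. This is a minimal sketch using Python's built-in re module (Python is my choice here; the article does not tie regex to one language), and the sample string "or sword" is made up for illustration.

```python
import re

# \b matches a boundary position; \B matches any position that is NOT a boundary.
text = "or sword"

# 'or' as a whole word: a boundary is required on both sides
whole_word = [m.start() for m in re.finditer(r"\bor\b", text)]

# 'or' with word characters on both sides: no boundary at either edge
inside_word = [m.start() for m in re.finditer(r"\Bor\B", text)]

print(whole_word)   # [0]  -> the standalone 'or'
print(inside_word)  # [5]  -> the 'or' inside 'sword'
```

The same two letters are found in different places depending only on whether the surrounding positions are boundaries.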

Before the first character:

  • If the first character of the string is a word character, there is a boundary right before it.
  • This is why something like \bword\b helps you find whole words. For example: \bword\b finds ‘word’ in ‘a word’ and in ‘word’, but not in ‘swordfish’.
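To check the whole-word behaviour of \bword\b, here is a small sketch with Python's re module; the sample strings are made up for illustration.

```python
import re

pattern = r"\bword\b"
samples = ["a word", "word", "swordfish", "words"]

# True only where 'word' stands on its own, with boundaries on both sides
results = {s: re.search(pattern, s) is not None for s in samples}

for s, found in results.items():
    print(f"{s!r}: {found}")
# 'a word': True
# 'word': True        (boundary before the first character of the string)
# 'swordfish': False  ('s' and 'w' are both word characters)
# 'words': False      ('d' and 's' are both word characters)
```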

Between two characters:

  • When one is a word character and the other is not.
  • For example: \b1\b can find the ‘1’ in ‘Regex-1’.
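The "between two characters" case can be checked with a short Python sketch; ‘Regex-1’ comes from the text above, and ‘Regex1’ is an extra input added for contrast.

```python
import re

pattern = r"\b1\b"

# '-' is a non-word character and '1' is a word character, so there is a boundary between them
with_dash = re.search(pattern, "Regex-1")
print(with_dash.start())  # 6

# 'x' and '1' are both word characters, so there is no boundary before '1'
no_dash = re.search(pattern, "Regex1")
print(no_dash)  # None
```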

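After the last character:

  • If the last character of the string is a word character, there is a boundary right after it.
  • For example: Welcome\b can find ‘Welcome’ at the end of ‘You are Welcome’. It also matches in ‘Welcome!’, but there the boundary sits between ‘e’ and ‘!’, which is the previous case.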
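A sketch of Welcome\b against a few made-up inputs in Python, one for each situation: the end of the string, a following punctuation mark, and a following word character.

```python
import re

pattern = r"Welcome\b"
samples = {
    "You are Welcome": "boundary at the end of the string",
    "Welcome!": "boundary between 'e' and '!'",
    "Welcomed": "no boundary: 'e' and 'd' are both word characters",
}

matched = {s: re.search(pattern, s) is not None for s in samples}

for s, why in samples.items():
    print(f"{s!r}: {matched[s]} ({why})")
```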

You may wonder what the difference is between \b and \s. Both seem to find whitespace, so why do we need two? Well, \b is like ‘^’ and ‘$’: it matches a position, not a character, so the match is zero-length. On its own it consumes nothing, which is why it is used together with other characters, for example to find word characters that sit at word boundaries. \s, meanwhile, matches an actual whitespace character (such as a space, tab or newline), so it is the one to use when you want to find the whitespace itself.
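The zero-length nature of \b is easiest to see by substituting a marker for each match. This Python sketch uses a made-up string; replacing \b only inserts characters, while replacing \s removes the space it matched.

```python
import re

text = "hi there"

# \b matches positions: start, between 'i' and ' ', between ' ' and 't', end
boundary_positions = [m.start() for m in re.finditer(r"\b", text)]
print(boundary_positions)  # [0, 2, 3, 8]

# Every \b match is an empty string
empty_matches = re.findall(r"\b", text)
print(empty_matches)  # ['', '', '', '']

# Marking positions vs. replacing actual characters
marked = re.sub(r"\b", "|", text)
replaced = re.sub(r"\s", "|", text)
print(marked)    # |hi| |there|
print(replaced)  # hi|there
```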

[…] & [^…] & [a-z]

\b might be a little bit tricky, so let’s do something simpler. All the patterns above are for general use. Are there any patterns for specific characters? Say you want to search for a song whose title starts with a vowel: \w or the dot would match any letter, not just vowels. To match one out of several characters, you can place them inside square brackets []. For example, [aeiou] is a vowel can find ‘a is a vowel’, ‘e is a vowel’ and so on. Just remember, [] matches only one character out of all the characters within it, not all of them.
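Here is a Python sketch of the bracket examples. re.fullmatch is used so that the whole string must fit the pattern, which shows that [aeiou] stands for exactly one character. The song titles are hypothetical and kept lowercase on purpose, since the class only lists lowercase vowels.

```python
import re

pattern = r"[aeiou] is a vowel"
sentences = ["a is a vowel", "e is a vowel", "b is a vowel", "ae is a vowel"]

# [aeiou] consumes exactly one character, so 'ae is a vowel' does not fit as a whole
fullmatch_results = {s: re.fullmatch(pattern, s) is not None for s in sentences}
print(fullmatch_results)
# {'a is a vowel': True, 'e is a vowel': True, 'b is a vowel': False, 'ae is a vowel': False}

# Hypothetical song titles; re.match only tries at the start of the string
songs = ["imagine", "yesterday", "echoes", "hey jude"]
starts_with_vowel = [t for t in songs if re.match(r"[aeiou]", t)]
print(starts_with_vowel)  # ['imagine', 'echoes']
```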

What if you want to find song titles that do not start with a vowel? Well, you can put a ‘^’ right after the opening square bracket, like [^aeiou]. Inside square brackets, ‘^’ no longer means ‘start’; it means ‘not’ (negated). [^aeiou] matches any single character other than ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’, including digits, spaces and capital letters.
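The negated class can be tried on the same hypothetical song titles in Python. The second line shows that [^aeiou] is not limited to letters: digits, spaces and the capital ‘E’ all count as "not a lowercase vowel".

```python
import re

# Hypothetical song titles
songs = ["imagine", "yesterday", "echoes", "hey jude"]

# Titles whose first character is anything except a lowercase vowel
no_vowel_start = [t for t in songs if re.match(r"[^aeiou]", t)]
print(no_vowel_start)  # ['yesterday', 'hey jude']

# Every character that is not a lowercase vowel
non_vowels = re.findall(r"[^aeiou]", "a1 E")
print(non_vowels)  # ['1', ' ', 'E']
```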

[] is very useful, but listing every letter or digit one by one can be exhausting. You can use a dash (-) to speed this up: [a-z] means a, b, c … z, [7-9] means 7, 8, 9, and [A-D] means A, B, C, D. Easy? One thing to keep in mind: regex is case sensitive by default, which means a is different from A. In the example above, if you want to search for vowels and do not care whether they are capital letters or not, you can use [aeiouAEIOU].
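A Python sketch of the ranges and the case-sensitivity point, with made-up inputs. The last line uses Python's re.IGNORECASE flag, a Python-specific alternative to spelling out both cases; other regex engines have their own equivalents.

```python
import re

digits = re.findall(r"[7-9]", "5 6 7 8 9")
print(digits)  # ['7', '8', '9']

upper = re.findall(r"[A-D]", "ABCDEF")
print(upper)  # ['A', 'B', 'C', 'D']

word = "Orange"
lower_only = re.findall(r"[aeiou]", word)
print(lower_only)  # ['a', 'e']  (the capital 'O' is missed)

both_cases = re.findall(r"[aeiouAEIOU]", word)
print(both_cases)  # ['O', 'a', 'e']

# Python-specific: let the flag handle case instead of listing both cases
with_flag = re.findall(r"[aeiou]", word, flags=re.IGNORECASE)
print(with_flag)  # ['O', 'a', 'e']
```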
