Regex For Dummies. Part 2: Flavors, Flags, and Assertions

NALSengineering
Sep 27, 2023

Credit: Nguyễn Thành Minh

We learned about the basic metacharacters in regex in the previous part.

In this part, we will cover the following regex concepts:

  • Flavors
  • Flags
  • Input boundary assertion (^ and $)

1. Regex Flavors

On regex101.com, you can see ‘Regex Flavors’ on the left side.

Regex flavors refer to the specific implementations or variations of regular expressions found in different programming languages or libraries. Each programming language or tool that supports regular expressions may have its own unique flavor, characterized by distinct features, syntax, and behavior. Put simply, a regular expression written in PHP may differ from one in Dart or JavaScript. In other words, a regex that works correctly for your problem in one programming language may not work correctly in another.

Understanding the regex flavor you are working with is important because it dictates how you construct and use regular expressions in that particular environment. In this series, I will use the ECMAScript flavor.
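As a small illustration of how flavors diverge, the sketch below runs two patterns in JavaScript (ECMAScript flavor) that mean something else in flavors such as PCRE. The sample strings and variable names are made up for this example, and the patterns are assumed to be used without the 'u' or 'v' flag.

```javascript
// \A is a start-of-string anchor in flavors such as PCRE or Python.
// In ECMAScript (without the 'u' or 'v' flag) it is not an anchor:
// the backslash is ignored and the pattern expects a literal "A".
const anchorLike = /\Aabc/;
const matchesPlain = anchorLike.test("abc");     // false
const matchesLiteralA = anchorLike.test("Aabc"); // true

// Possessive quantifiers such as a++ exist in PCRE and Java,
// but ECMAScript rejects them when the pattern is compiled.
let possessiveError = null;
try {
  new RegExp("a++");
} catch (err) {
  possessiveError = err; // a SyntaxError
}

console.log(matchesPlain, matchesLiteralA, possessiveError instanceof SyntaxError);
```

This is why it is worth selecting the right flavor on regex101.com before testing a pattern.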

2. Regex flags

Regex flags are optional parameters that you can add to the regex pattern to modify how the pattern matching is performed. These flags control various aspects of matching, such as case sensitivity, multiline mode, and more. Flags are typically single letters, such as i, g, m, and s.
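In JavaScript, flags are written after the closing slash of a regex literal or passed as the second argument to the RegExp constructor. A minimal sketch, with an arbitrary pattern and variable names chosen for illustration:

```javascript
// Flags follow the closing slash of a regex literal...
const literal = /abc/gi;

// ...or are passed as the second argument of the RegExp constructor.
const constructed = new RegExp("abc", "gi");

// The `flags` property reports which flags are active on a regex.
console.log(literal.flags);     // "gi"
console.log(constructed.flags); // "gi"

// Each flag also has its own boolean property.
console.log(literal.ignoreCase, literal.global, literal.multiline); // true true false
```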

Here are some common flags used in regex:

  • i (case-insensitive): When the “i” flag is used, the pattern ignores letter case. For example, with the ‘i’ flag, the regex abc matches “abc”, “AbC”, “aBc”, and so on; without the ‘i’ flag, it matches only “abc”.
  • g (global): The “g” flag makes the regex find all matches in a given string, rather than stopping after the first match. For example, if the input contains the word ‘hello’ three times, a regex for ‘hello’ matches all three occurrences with the ‘g’ flag, but only the first one without it.
  • m (multiline): The “m” flag changes the behavior of the ^ and $ anchors. Without the “m” flag, they match the start and end of the entire string. With the “m” flag, they match the start and end of each line within the string. We will learn more about ^ and $, as well as this flag, later.
  • s (dotall or single line): The “s” flag allows the dot . metacharacter to match any character, including newline characters. Without this flag, the dot matches any character except line terminators such as \n.
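The i, g, and s flags described above can be tried out with a few lines of JavaScript. This is only a sketch with made-up sample strings; the m flag is left out because it is covered together with ^ and $ in the next section.

```javascript
// i: ignore letter case.
const withoutI = /abc/.test("AbC"); // false
const withI = /abc/i.test("AbC");   // true

// g: find every match instead of stopping after the first one.
const text = "hello hello hello";
const firstOnly = text.match(/hello/);   // a single match, found at index 0
const allMatches = text.match(/hello/g); // ["hello", "hello", "hello"]

// s: let the dot match newline characters too.
const dotWithoutS = /a.b/.test("a\nb"); // false
const dotWithS = /a.b/s.test("a\nb");   // true

console.log(withoutI, withI);
console.log(firstOnly.index, allMatches.length);
console.log(dotWithoutS, dotWithS);
```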

3. Input boundary assertion: ^, $

An input boundary assertion checks if the current position in the string is an input boundary. An input boundary is the start or end of the string; or, if the m flag is set, the start or end of a line.

^ asserts that the current position is the start of input. $ asserts that the current position is the end of input. Both are assertions, so they don't consume any characters.
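To see that ^ and $ match positions rather than characters, the following sketch inspects the matches they produce on a short sample string (the string and variable names are arbitrary):

```javascript
const input = "abc";

// ^ and $ match positions, not characters, so the matched text is empty.
const startMatch = /^/.exec(input);
const endMatch = /$/.exec(input);
console.log(startMatch[0].length, startMatch.index); // 0 0
console.log(endMatch[0].length, endMatch.index);     // 0 3

// Replacing a zero-length match inserts text without removing anything.
const wrapped = input.replace(/^/, "[").replace(/$/, "]");
console.log(wrapped); // "[abc]"
```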

Example: ^M.*h$ matches strings that begin with ‘M’ and end with ‘h’. When the ‘m’ flag is set, ^ and $ will assert the start and end of each line. Therefore, for the two-line string ‘Minh\nMih’, we get two matches, one per line (assuming the ‘g’ flag is also set so that all matches are reported).

With the ‘m’ flag
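The multiline case can be reproduced in JavaScript as follows. This sketch assumes the test string is ‘Minh\nMih’ and adds the ‘g’ flag so that every match is returned, not just the first:

```javascript
const lines = "Minh\nMih";

// With m, ^ and $ also match at the start and end of each line.
// With g, match() returns every match instead of only the first.
const perLine = lines.match(/^M.*h$/gm);

console.log(perLine); // ["Minh", "Mih"]
```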

When the ‘m’ flag is not set, ^ and $ assert only the start and end of the whole string. We might therefore expect exactly one match, the entire string ‘Minh\nMih’. However, in this case, the result is ‘no match’. That’s weird!

Without the ‘m’ flag

Not weird! The dot character . does not match the newline character \n, and the string ‘Minh\nMih’ contains one, so .* cannot get from the ‘M’ to the final ‘h’. To fix this, we add the ‘s’ (single line) flag, which allows the dot character . to match the newline character \n as well.

With the ‘s’ flag
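The same experiment without the ‘m’ flag can be sketched in JavaScript like this (variable names are arbitrary):

```javascript
const twoLines = "Minh\nMih";

// No flags: ^ and $ only match at the ends of the whole string,
// and the dot cannot cross the newline, so nothing matches.
const noFlags = /^M.*h$/.exec(twoLines);

// With s: the dot also matches "\n", so the whole string matches.
const dotAll = /^M.*h$/s.exec(twoLines);

console.log(noFlags);                // null
console.log(dotAll[0] === twoLines); // true
```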

4. Practice

To master Regex, we shouldn’t just rely on examples that use individual metacharacters. We need to practice applying the metacharacters we’ve learned to solve problems. Below are some exercises that use the metacharacters covered in the first two parts.

4.1. Hashtag Validation

Examples of valid matches:

  • #vacation
  • #relaxation

Examples of invalid matches:

  • #
  • vacation

=> Regex: ^#.+$, in which:

  • ^#: ^ asserts the start of the input, and # matches a literal hash symbol '#'. Together they require the string to start with '#'.
  • .+: This part of the pattern matches one or more characters after the '#' symbol, which is why a lone '#' is rejected.
  • $: This part of the pattern asserts the end of the input.

So, the entire regex ^#.+$ is used to match a string that starts with a '#' symbol and is followed by one or more characters.
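A minimal way to try this exercise in JavaScript is shown below. The function name isHashtag is a hypothetical helper introduced only for this sketch:

```javascript
// The pattern from this exercise, used without any flags.
const HASHTAG = /^#.+$/;

// Hypothetical helper: true when the whole input looks like a hashtag.
function isHashtag(input) {
  return HASHTAG.test(input);
}

console.log(isHashtag("#vacation"));   // true
console.log(isHashtag("#relaxation")); // true
console.log(isHashtag("#"));           // false
console.log(isHashtag("vacation"));    // false
```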

4.2. Email Validation

Examples of valid matches:

  • user@example.com
  • john_doe@gmail.org
  • info@website.info

Examples of invalid matches:

  • user
  • user@
  • user@example
  • website.info
  • @website.info

=> Regex: ^.+@.+\..+$, in which:

  • ^: This part of the pattern asserts the start of the input.
  • .+: This part of the pattern matches one or more characters before the '@' symbol.
  • @: This part of the pattern matches a literal '@' symbol.
  • .+: Similar to the first .+, this part matches one or more characters after the '@' symbol.
  • \.: This part matches a literal dot '.' character. The backslash is needed because an unescaped dot matches any character.
  • .+: This last part matches one or more characters after the dot.
  • $: This part of the pattern asserts the end of the input.

In summary, the entire regex ^.+@.+\..+$ is used to match strings that resemble email addresses. It is a simple check of the general shape of an address, not a strict validator.
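The examples from this exercise can be checked with a short JavaScript sketch. The helper name looksLikeEmail is hypothetical and exists only for this illustration:

```javascript
// The pattern from this exercise, used without any flags.
const EMAIL = /^.+@.+\..+$/;

// Hypothetical helper: true when the whole input has the rough
// shape "something@something.something".
function looksLikeEmail(input) {
  return EMAIL.test(input);
}

const validEmails = ["user@example.com", "john_doe@gmail.org", "info@website.info"];
const invalidEmails = ["user", "user@", "user@example", "website.info", "@website.info"];

console.log(validEmails.map(looksLikeEmail));   // all true
console.log(invalidEmails.map(looksLikeEmail)); // all false
```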

4.3. URL Validation

Examples of valid matches:

  • http://example.com
  • https://example.com

Examples of invalid matches:

  • example.com
  • example
  • https:example.com
  • https:/example.com
  • https//example.com

=> Regex: ^https?:\/\/.+\..+$, in which:

  • ^: This part of the pattern asserts the start of the input.
  • http: This part of the pattern matches the literal characters "http".
  • s?: The s character is optional due to the ? quantifier. This means the pattern will match both "http" and "https".
  • :\/\/: These literal characters match the colon and two forward slashes that are part of the URL scheme “://”. Each slash is escaped with a backslash because an unescaped / would end a regex literal.
  • .+: This part of the pattern matches one or more characters. It is used to match any sequence of characters after the "//" in the URL.
  • \.: This part matches a literal dot '.' character.
  • .+: This last part matches one or more characters after the dot.
  • $: This part of the pattern asserts the end of the input.

In summary, the regex ^https?:\/\/.+\..+$ is used to match strings that resemble URLs with either the "http" or the "https" scheme. It's a basic way to check whether a string looks like a URL.
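As with the previous exercises, the pattern can be tried in JavaScript. The helper name looksLikeUrl and the two URLs in the valid list are assumptions made for this sketch; the invalid list comes from the examples above.

```javascript
// The pattern from this exercise, used without any flags.
// The slashes are escaped so they do not end the regex literal.
const URL_PATTERN = /^https?:\/\/.+\..+$/;

// Hypothetical helper: true when the whole input looks like an http(s) URL.
function looksLikeUrl(input) {
  return URL_PATTERN.test(input);
}

// Assumed sample inputs that satisfy the pattern.
const validUrls = ["http://example.com", "https://example.com"];
const invalidUrls = [
  "example.com",
  "example",
  "https:example.com",
  "https:/example.com",
  "https//example.com",
];

console.log(validUrls.map(looksLikeUrl));   // all true
console.log(invalidUrls.map(looksLikeUrl)); // all false
```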

Conclusion

In practice, the regexes used to solve these problems properly are much more complex than the ones above. In the next part, we will solve these exercises again using more advanced methods.
