Regular Expressions for Testers: Regex4T (using JavaScript)

Richard Forjoe
9 min read · Nov 6, 2021

REGEX — image by me

Regex — some people love it, while others hate it. However, regardless of your feelings towards it, regex is an incredibly useful tool for matching, searching, and replacing text. In this post, I’ll provide a working example of how to use regex for these purposes.

During one of his talks, Kevin Skoglund shared an interesting nugget about regex: the name “grep” comes from the ed editor command “g/re/p” (global / regular expression / print), where the global mode is written first. In later regex syntax, such as JavaScript’s, the global flag is written after the pattern instead, giving the modern form “/re/g”.

Image 001: Regex mind map with its parameters and examples — Link

Regex can be useful in various scenarios, such as:

  1. Extracting specific text values from a larger string. For example, you can use regex to retrieve all email addresses from a document.
  2. Validating whether a string of text matches a particular pattern. For instance, you can use regex to check if a phone number is in a specific format.
  3. Converting a string of text into a different format. Regex can help you achieve this by searching for specific patterns and replacing them with a desired format.
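As a rough illustration of these three uses, here is a minimal sketch. The sample strings and patterns are made up for this example, and the email pattern is deliberately simplistic rather than a complete validator.

```javascript
// 1. Extract: pull every email-like token out of a larger string.
const note = "Contact anna@example.com or bob@example.org for help";
const emails = note.match(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g);
console.log(emails); // ["anna@example.com", "bob@example.org"]

// 2. Validate: check that a phone number follows the (assumed) format 555-010-1234.
const phonePattern = /^\d{3}-\d{3}-\d{4}$/;
const validPhone = phonePattern.test("555-010-1234");   // true
const invalidPhone = phonePattern.test("5550101234");   // false
console.log(validPhone, invalidPhone);

// 3. Convert: rewrite a YYYY-MM-DD date as DD/MM/YYYY using capture groups.
const converted = "2021-11-06".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
console.log(converted); // "06/11/2021"
```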

When working with regex, it’s helpful to have access to tools that can assist you in building and testing your expressions. Some popular options include:

  1. regexr.com: This website provides a user-friendly interface for building and testing regex patterns. It includes real-time feedback and helpful resources to guide you in creating effective expressions.
  2. regex101.com: This tool offers a more advanced interface for building and testing regex patterns. It includes features like syntax highlighting, explanation of the expression structure, and the ability to create a custom test string.

Both of these resources can be invaluable in helping you create and fine-tune your regex expressions.

NEW: ChatGPT (https://chat.openai.com/) now makes generating regex much easier, but I feel it is still important to understand regex so that you can review and challenge the expressions it generates.

The Fundamentals:

  1. Expression flags/Modes

When working with regex, it’s essential to understand expression flags or modes, which are modifiers that affect how the expression works. Here are some of the fundamental flags/modes to be aware of:

A. Global (g): This mode searches for all occurrences of the pattern in the input string instead of stopping after the first match.

B. Case-insensitive (i): This mode makes the pattern match case-insensitive, allowing it to match text regardless of whether it is uppercase or lowercase.

C. Multi-line (m): This mode allows the ^ and $ characters to match the start and end of each line in a multi-line input string, instead of just the start and end of the entire string.

D. Unicode (u): This mode treats the pattern as a sequence of Unicode code points rather than 16-bit code units, and enables escapes such as \u{1F600} and \p{...}.

E. Sticky (y): This mode only matches at the exact position stored in the regex’s lastIndex property, rather than searching forward from it. It is useful for stepping through a string one token at a time.

By understanding and utilizing these flags/modes, you can make your regex expressions more versatile and powerful.
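The flags above can be tried out with a short sketch like the one below. The sample text is an assumption chosen so that each flag changes the result.

```javascript
const text = "Cat\ncat\nCAT";

// No flags: match() returns only the first case-sensitive match.
const first = text.match(/cat/)[0];          // "cat" (the one on line 2)

// g: every case-sensitive occurrence.
const allLower = text.match(/cat/g);         // ["cat"]

// g + i: every occurrence regardless of case.
const anyCase = text.match(/cat/gi);         // ["Cat", "cat", "CAT"]

// m: ^ matches at the start of each line, not just the start of the string.
const lineStarts = text.match(/^c/gim);      // ["C", "c", "C"]
const stringStart = text.match(/^c/gi);      // ["C"]

// u: code point escapes such as \u{1F600} become available in the pattern.
const emoji = /\u{1F600}/u.test("\u{1F600}"); // true

// y: the match must begin exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;
const atFour = sticky.test(text);            // true: "cat" starts at index 4
sticky.lastIndex = 5;
const atFive = sticky.test(text);            // false: no "cat" starts at index 5

console.log(first, allLower, anyCase, lineStarts, stringStart, emoji, atFour, atFive);
```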

2. Matching literal characters:

Matching literal characters is a basic concept in regex. For example, the expression /abc/ matches the text string “abc”. It’s important to note that regex is case-sensitive by default, so /abc/ matches “abc” but not “abC” unless you use the case-insensitive flag, written as /abc/i.

It’s also worth noting that spaces count as characters in regex, so a string like “a b c” would not match the literal character expression /abc/. However, you can use additional regex syntax to account for spaces, such as the whitespace character class (\s) or specific whitespace characters themselves.
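A quick sketch of literal matching, using test() to check whether each (made-up) string contains the pattern:

```javascript
const exact = /abc/.test("abc");            // true
const wrongCase = /abc/.test("abC");        // false: case-sensitive by default
const ignoreCase = /abc/i.test("abC");      // true: the i flag ignores case
const spaced = /abc/.test("a b c");         // false: spaces are characters too
const literalSpaces = /a b c/.test("a b c");   // true: literal spaces in the pattern
const classSpaces = /a\sb\sc/.test("a b c");   // true: \s matches a whitespace character

console.log(exact, wrongCase, ignoreCase, spaced, literalSpaces, classSpaces);
```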

Working example

To demonstrate the practical use of regex, let’s consider an example of extracting query parameters from a URL using regex.

When testing Adobe Analytics reporting data, the data is often sent as parameters in the URL. By using regex, we can extract these parameters and use them to make assertions about the data. For instance, we can check whether the expected parameters are present in the URL and verify their values.

Here’s an example URL with query parameters:

Task: Extracting all the parameters from a given URL and creating a function to assert against each one.

First, we need to clean up the URL since it contains encoded characters. URLs can only be sent over the internet using the ASCII character set, and characters that are unsafe or reserved in a URL are percent-encoded: each one is replaced with a “%” followed by two hexadecimal digits. For example, a space becomes “%20”. We can use JavaScript methods, such as replace() or replaceAll(), to replace specific values in the URL.

One option for cleaning up the URL is to use the replaceAll() method to replace each percent-encoded sequence with the character it stands for.

Option A: Using .replaceAll() to clean up values
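A minimal sketch of Option A. The URL below is a hypothetical stand-in, not the real request from the article; only the parameter names ATV, nra and v25 come from the text, and c1 is invented to show a second encoded character.

```javascript
// Hypothetical encoded URL (assumed for illustration).
const encodedUrl =
  "https://metrics.example.com/b/ss?ATV=1&nra=1&v25=Chrome%7C320000%7CQS111.111.11&c1=home%3Alanding";

// Replace each encoded sequence you know about with the character it represents.
const cleanedUrl = encodedUrl
  .replaceAll("%7C", "|")   // %7C is the vertical bar
  .replaceAll("%3A", ":");  // %3A is the colon

console.log(cleanedUrl);
// https://metrics.example.com/b/ss?ATV=1&nra=1&v25=Chrome|320000|QS111.111.11&c1=home:landing

// The same replacement with replace() and a global regex.
const viaReplace = encodedUrl.replace(/%7C/g, "|").replace(/%3A/g, ":");
```

The drawback is that every encoded sequence has to be listed by hand, so any sequence you did not anticipate stays encoded.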

Option B: Using decodeURIComponent() to decode the URL values. This decodes every percent-encoded sequence in one call, so the individual replacements in Option A are not needed.
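A minimal sketch of Option B, again using a hypothetical URL rather than the real request:

```javascript
// Hypothetical encoded URL (assumed for illustration).
const encodedUrl =
  "https://metrics.example.com/b/ss?ATV=1&nra=1&v25=Chrome%7C320000%7CQS111.111.11&c1=home%3Alanding";

// decodeURIComponent() decodes every %XX sequence in a single call.
const decodedUrl = decodeURIComponent(encodedUrl);

console.log(decodedUrl);
// https://metrics.example.com/b/ss?ATV=1&nra=1&v25=Chrome|320000|QS111.111.11&c1=home:landing
```

Two things to keep in mind: decodeURIComponent() throws a URIError on a malformed sequence such as a lone “%”, and it also decodes an encoded “&” or “=” inside a value, which can make the decoded string ambiguous to split afterwards.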

Secondly, once the URL has been cleaned up, the next step is to identify patterns in the parameter string that we want to match. For example, query parameters tend to follow the pattern “?parameter=value”, that is, something on each side of an equal sign: **=**.

In this case, every parameter has an equal sign between its name and its value, such as “ATV=1”, “nra=1”, etc. Therefore, we can use the equal sign as a delimiter to extract the parameter name and its corresponding value.

To do this, we can use regex to match the values that need to be extracted. Since the parameters follow a consistent pattern, we can use a simple regex pattern to extract the parameter name and value.

For example, we can start by matching any word or number characters that appear before the equal sign, followed by any word or number characters that appear after the equal sign:

Left pattern: to the left of the equal sign are letters and numbers, so the word-character class \w is a natural fit.

The value on the right side of the equal sign is more complicated, since it can include different characters, spaces, words, and numbers, so it is handled separately afterwards.

To build the left pattern, we can start with the equal sign as a delimiter and work outwards step by step:

Step 1. Using = returns the first instance of the equal sign in the string: /=/

Step 2. Adding the g (global) flag returns all instances of the equal sign: /=/g

Step 3. For every match there are letters or numbers to the left of the equal sign, and \w covers both (as well as the underscore). To match a single letter or number before the equal sign, use: /\w=/g

Step 4. Since you need full words, not just single characters, add a repetition metacharacter [MC007]. These are +, * and ?, e.g. /\w+=/g

  • /\w*=/g: zero or more. Matches a bare = as well as = preceded by any run of letters or numbers
  • /\w+=/g: one or more. Matches only when = has at least one letter or number before it
  • /\w?=/g: zero or one. Like * but includes at most one letter or number before the =

At this point, /\w+=/g matches one or more word characters followed by the equal sign, i.e. each parameter name together with its “=”. The value after the equal sign still needs to be matched.
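The four steps can be reproduced with the sketch below. The query string is an assumption; the bare “=orphan” entry is included only to show how *, + and ? differ.

```javascript
// Hypothetical query string (assumed for illustration).
const query = "ATV=1&nra=1&=orphan&v25=Chrome";

// Step 1: the first "=" only.
const step1 = query.match(/=/)[0];      // "="

// Step 2: every "=".
const step2 = query.match(/=/g);        // ["=", "=", "=", "="]

// Step 3: exactly one word character before each "=".
const step3 = query.match(/\w=/g);      // ["V=", "a=", "5="]

// Step 4: repetition metacharacters.
const plus = query.match(/\w+=/g);      // ["ATV=", "nra=", "v25="]
const star = query.match(/\w*=/g);      // ["ATV=", "nra=", "=", "v25="]
const optional = query.match(/\w?=/g);  // ["V=", "a=", "=", "5="]

console.log(step1, step2, step3, plus, star, optional);
```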

Right pattern: The right of the equal sign is a bit more complicated: there are different characters, spaces, words and numbers.

Since the parameter value can include various characters, we need to use a combination of character sets [MC004] and ranges [MC005] to match all possibilities. We can use the following regex pattern to match an equal sign followed by a single letter or number: /=[A-Za-z0-9]/

Adding the repetition metacharacter [MC007] “+” to the character set, as in “[A-Za-z0-9]+”, allows us to match a run of one or more letters or numbers in the parameter value.

However, this pattern doesn’t cover special characters.

If the parameter value can include special characters beyond letters and numbers, we need to adjust our regex pattern accordingly.

For example, if the parameter value includes vertical bars (“|”) or periods (“.”), we can add these characters to the character set in the pattern: [A-Za-z0-9|.]+. Inside a character set, both are treated as literal characters, so they do not need to be escaped.

By adjusting the character set in the pattern, we can match the desired set of characters in the parameter value and ensure that our regex expression accurately captures the information we need. In the example provided, the updated pattern would match the value in v25=Chrome|320000|QS111.111.11, which contains both | and .
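A short sketch comparing the two character sets. The query string is assumed for illustration and reuses the v25 value quoted above.

```javascript
// Hypothetical decoded query string (assumed for illustration).
const query = "ATV=1&nra=1&v25=Chrome|320000|QS111.111.11";

// Letters and digits only: the v25 value is cut off at the first "|".
const lettersAndDigits = query.match(/=[A-Za-z0-9]+/g);
console.log(lettersAndDigits); // ["=1", "=1", "=Chrome"]

// With "|" and "." added to the set, the whole value is matched.
const withSpecials = query.match(/=[A-Za-z0-9|.]+/g);
console.log(withSpecials); // ["=1", "=1", "=Chrome|320000|QS111.111.11"]
```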

Next: Apply the pattern using match() to extract the values into an array
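One way this step could look, as a sketch: the left pattern (\w+=) and the right pattern ([A-Za-z0-9|.]+) are combined into one expression and passed to match(). The URL and the variable names are assumptions, not the original request or code.

```javascript
// Hypothetical decoded URL (assumed for illustration).
const decodedUrl =
  "https://metrics.example.com/b/ss?ATV=1&nra=1&v25=Chrome|320000|QS111.111.11";

// Left pattern + right pattern, with the g flag to collect every pair.
const paramPattern = /\w+=[A-Za-z0-9|.]+/g;

// With the g flag, match() returns null when nothing matches, so fall back to [].
const pairs = decodedUrl.match(paramPattern) ?? [];
console.log(pairs); // ["ATV=1", "nra=1", "v25=Chrome|320000|QS111.111.11"]

// Optionally split each "name=value" pair at the first "=" into an object.
const params = Object.fromEntries(
  pairs.map((pair) => {
    const i = pair.indexOf("=");
    return [pair.slice(0, i), pair.slice(i + 1)];
  })
);
console.log(params.v25); // "Chrome|320000|QS111.111.11"
```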

Then: Strip out unnecessary data. This step is not needed in this case and is included for demonstration purposes only. Since no assertions are made against the host part of the URL, you can use regex to strip it out of the data.

You can start by identifying the pattern that can be used to match the URL. Start with the word-character class \w, then add the repetition metacharacter [MC007] +. Since the URL contains characters that \w does not cover, you can use character sets [MC004] and ranges [MC005] and populate them with the remaining characters in the URL.
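A sketch of that idea, assuming a hypothetical URL whose host and path contain only word characters, “:”, “/” and “.”. A real URL may need more characters in the set, such as “-”.

```javascript
// Hypothetical decoded URL (assumed for illustration).
const decodedUrl = "https://metrics.example.com/b/ss?ATV=1&nra=1";

// From the start of the string: one or more word characters, ":", "/" or ".",
// followed by the literal "?" that begins the query string.
const hostPattern = /^[\w:\/.]+\?/;

// Replace the matched prefix with nothing, leaving only the parameters.
const queryOnly = decodedUrl.replace(hostPattern, "");
console.log(queryOnly); // "ATV=1&nra=1"
```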

Summary:

When using regex,

  • identify the pattern needed: **=**
  • use literal characters[LC001] + metacharacters [MC00?] + flag modes to create your regex
  • test, test, test that the pattern covers all scenarios
  • the best way to get used to regex is to paste text into https://regexr.com/, pick certain patterns, and try to create regex for them
  • NB: a space is also a character. When you don’t want a metacharacter to be applied as a metacharacter, use the escape character, e.g. . is a wildcard, not a full stop, while \. is a full stop
  • The wildcard metacharacter can be a source of confusion. The dot matches exactly one character of any kind (letter, digit, whitespace, or symbol) other than a line break. For example, the pattern “c.” matches any two-character sequence that starts with “c”, such as “cA”, “c0”, “c!”, “c ”, etc.
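The difference between the wildcard dot and an escaped dot can be checked with a few made-up strings:

```javascript
// The dot matches exactly one character, but not a line break by default.
const letter = /c./.test("cA");        // true
const space = /c./.test("c ");         // true
const nothing = /c./.test("c");        // false: there is no second character
const newline = /c./.test("c\n");      // false: "." skips line breaks
const dotAll = /c./s.test("c\n");      // true: the s flag lets "." match line breaks

// Unescaped, the dot in /1.5/ accepts any character between 1 and 5.
const loose = /1.5/.test("1x5");       // true
const strict = /1\.5/.test("1x5");     // false: \. only matches a full stop
const fullStop = /1\.5/.test("1.5");   // true

console.log(letter, space, nothing, newline, dotAll, loose, strict, fullStop);
```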

By using regex effectively, you can extract, match, and manipulate text strings in a powerful and flexible way, making it a valuable tool for data processing, text mining, and web development.

Sample code snippet I’ve used in Cypress to test values in the URL data.
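As a framework-agnostic sketch of the task set out earlier (extract every parameter, then assert against each one), something like the following could be used. The helper names extractParams and findMismatches, the URL, and the expected values are all hypothetical, and no Cypress API is used here.

```javascript
// Hypothetical helper: collect "name=value" pairs from a decoded URL into an object.
function extractParams(decodedUrl) {
  const pairs = decodedUrl.match(/\w+=[A-Za-z0-9|.]+/g) ?? [];
  return Object.fromEntries(
    pairs.map((pair) => {
      const i = pair.indexOf("=");
      return [pair.slice(0, i), pair.slice(i + 1)];
    })
  );
}

// Hypothetical helper: compare expected values with the URL and describe each mismatch.
function findMismatches(decodedUrl, expected) {
  const actual = extractParams(decodedUrl);
  return Object.entries(expected)
    .filter(([name, value]) => actual[name] !== value)
    .map(([name, value]) => `${name}: expected "${value}", got "${actual[name]}"`);
}

// Hypothetical decoded URL and expectations (assumed for illustration).
const url = "https://metrics.example.com/b/ss?ATV=1&nra=1&v25=Chrome|320000|QS111.111.11";
const passing = findMismatches(url, { ATV: "1", v25: "Chrome|320000|QS111.111.11" });
const failing = findMismatches(url, { ATV: "1", nra: "2" });

console.log(passing); // []
console.log(failing); // ['nra: expected "2", got "1"']
```

Inside a test runner, the returned array can then be asserted to be empty, so a failing test lists every parameter that did not match.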

Richard Forjoe

I am a passion fuelled tester, who wants to see the profession flourish! Hobby: Street photography & Portraits — Insta: forjoe_streets & forjoe_ports