Regular Expressions: Dot-Star and the Caret/Dollar Characters

A series of tutorials on Regular Expressions using Python

Zohaib Shahzad
4 min read · Jul 2, 2020

If you’ve stumbled across this article and are new to this series of tutorials on regular expressions, feel free to take a look at the rest of the series (in order):

  1. Regular Expressions: Basics
  2. Regular Expressions: Grouping & the Pipe Character
  3. Regular Expressions: Repetition & Greedy/Non-Greedy Matching
  4. Regular Expressions: Character Classes & findall() Method
  5. Regular Expressions: Dot-Star and the Caret/Dollar Characters
  6. Regular Expressions: sub() Method and Verbose Mode

The Caret Symbol (^)

You can use the caret symbol (^) at the start of a regular expression to indicate that a match must occur at the beginning of the searched text.

For example, the regular expression ^a applied to the input string abc matches a, because a is the first character. The regular expression ^b applied to the same string matches nothing, because b is not the first character of abc. Now consider another regular expression, ^(T|t)he, which means: the input string starts with an uppercase T or a lowercase t, followed by a lowercase h, followed by a lowercase e.

^(T|t)he => The car is parked in the garage
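To try these patterns in Python, here is a minimal sketch using re.search(). The variable names are only for illustration and are not part of the original examples.

```python
import re

text = "The car is parked in the garage"

# ^a matches only when 'a' is the very first character of the string.
starts_with_a = re.search(r"^a", "abc")
starts_with_b = re.search(r"^b", "abc")
print(starts_with_a.group())  # a
print(starts_with_b)          # None, because 'b' is not the first character

# ^(T|t)he matches 'The' or 'the', but only at the start of the string.
the_match = re.search(r"^(T|t)he", text)
print(the_match.group())  # The
print(the_match.span())   # (0, 3): the later 'the' in 'the garage' is not matched
```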


Example 1: Implementing Caret Symbol (^)

The regex ^Hello matches strings that begin with “Hello”.
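As a rough sketch of how ^Hello behaves (the test strings and the variable name below are assumptions made up for this illustration):

```python
import re

begins_with_hello = re.compile(r"^Hello")

# The string starts with 'Hello', so search() returns a match object.
print(begins_with_hello.search("Hello world!") is not None)    # True

# 'Hello' appears, but not at the beginning, so there is no match.
print(begins_with_hello.search("He said Hello.") is not None)  # False
```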

The Dollar Sign ($)

Likewise, you can put a dollar sign ($) at the end of the regex to indicate the string must end with that regex pattern.

Example 2: Implementing Dollar Symbol ($)
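One possible illustration of the dollar sign is shown below. It assumes the \d digit class from the earlier character classes tutorial, and the test strings are invented for this sketch.

```python
import re

# \d$ matches a single digit, but only at the very end of the string.
ends_with_digit = re.compile(r"\d$")

last_digit = ends_with_digit.search("Your number is 42")
print(last_digit.group())  # 2

# The string ends with a letter, so the pattern finds nothing.
no_digit = ends_with_digit.search("Your number is forty two")
print(no_digit)  # None
```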

Example 3: Implementing Both Caret (^) & Dollar Symbol ($)

Moreover, you can use the ^ and $ together to indicate that the entire string must match the regex — that is, it’s not enough for a match to be made on some subset of the string.
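A small sketch of this idea, again assuming the \d digit class and using made-up test strings:

```python
import re

# ^\d+$ requires the whole string, from start to end, to be digits.
whole_string_digits = re.compile(r"^\d+$")

all_digits = whole_string_digits.search("1234567890")
print(all_digits.group())  # 1234567890

# A single non-digit anywhere in the string prevents a match.
letters_inside = whole_string_digits.search("12345xyz67890")
space_inside = whole_string_digits.search("12 34567890")
print(letters_inside)  # None
print(space_inside)    # None
```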

The Wildcard Character

The period . (or dot) character in regex is called a wildcard because it matches any character except for a newline.

Example 4: Implementing Wildcard

import re

atRegex = re.compile(r'.at')  # any single character (except a newline) followed by 'at'
atRegex.findall('The cat in the hat sat on the flat mat')
# returns: ['cat', 'hat', 'sat', 'lat', 'mat']

NOTE:

The dot character matches exactly one character, which is why the match for the text “flat” in this example was only “lat”. To match an actual dot, escape it with a backslash: \.
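To see the difference between the wildcard dot and an escaped dot, here is a short sketch. The file-name-like string is an assumption chosen only for this illustration.

```python
import re

text = "notes.txt and atxt"

# Unescaped dot: any character followed by 'txt'.
wildcard_matches = re.findall(r".txt", text)
print(wildcard_matches)  # ['.txt', 'atxt']

# Escaped dot: only a literal '.' followed by 'txt'.
literal_matches = re.findall(r"\.txt", text)
print(literal_matches)   # ['.txt']
```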

Example 5: Wildcard with Repetition

atRegex = re.compile(r'.{1,2}at')

Here, .{1,2} means any one or two characters (including whitespace), followed by “at”. Note that the quantifier must be written without a space, as {1,2}.
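Running this pattern against the sentence from Example 4 gives a feel for what changes. This is a sketch, with the variable names chosen here for illustration.

```python
import re

at_regex = re.compile(r".{1,2}at")
matches = at_regex.findall("The cat in the hat sat on the flat mat")

# The quantifier is greedy, so it takes two characters where it can.
# That is why a leading space is included and 'flat' now matches in full.
print(matches)  # [' cat', ' hat', ' sat', 'flat', ' mat']
```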

Matching Everything with Dot-Star

Example 6: Implementing Dot-Star
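As a hedged sketch of dot-star in action: the pattern below captures whatever text follows two fixed labels. The labels, the sample string, and the variable names are all assumptions made up for this illustration.

```python
import re

# (.*) captures "anything" that sits between or after the fixed labels.
name_regex = re.compile(r"First Name: (.*) Last Name: (.*)")
name_match = name_regex.search("First Name: Ada Last Name: Lovelace")

print(name_match.group(1))  # Ada
print(name_match.group(2))  # Lovelace
```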

The dot-star (.*) is greedy: it always tries to match as much text as possible. To match any and all text in a non-greedy fashion, use the dot, star, and question mark (.*?). As with braces, the trailing question mark tells Python to match as little text as possible.

Example 7: Implementing a Non-Greedy Search (.*?)

serve = '<To serve humans> for dinner.>'
nongreedy = re.compile(r'<(.*?)>')
nongreedy.findall(serve)
# returns: ['To serve humans']

The greedy pattern <(.*)> and the non-greedy pattern <(.*?)> both roughly translate to “Match an opening angle bracket, followed by anything, followed by a closing angle bracket.” But the string “<To serve humans> for dinner.>” has two possible matches for the closing angle bracket. In the non-greedy version of the regex, Python matches the shortest possible string: “<To serve humans>”. In the greedy version, Python matches the longest possible string: “<To serve humans> for dinner.>”. Because the pattern contains a group, findall() returns only the text captured inside the angle brackets.
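For comparison, here is a sketch that runs a greedy and a non-greedy pattern side by side on the same string. The variable names are chosen for this illustration.

```python
import re

serve = "<To serve humans> for dinner.>"

greedy = re.compile(r"<(.*)>")
nongreedy = re.compile(r"<(.*?)>")

# Greedy: the group stretches to the LAST closing angle bracket.
greedy_result = greedy.findall(serve)
print(greedy_result)     # ['To serve humans> for dinner.']

# Non-greedy: the group stops at the FIRST closing angle bracket.
nongreedy_result = nongreedy.findall(serve)
print(nongreedy_result)  # ['To serve humans']
```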

Matching Newlines with the Dot Character

Recall that the dot-star (.*) matches everything except a newline (\n). However, if you pass re.DOTALL as the second argument to re.compile(), the dot character matches ALL characters, including the newline character (\n).

Example 8: Dot-Star with re.DOTALL

prime = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'
noNewlineRegex = re.compile(r'.*')
noNewlineRegex.search(prime).group()
# returns: 'Serve the public trust.'

Just to reiterate, the above example does not pass re.DOTALL as a second argument. Therefore, it only matches up to “Serve the public trust.”

newlineRegex = re.compile(r'.*', re.DOTALL)
newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
# returns: 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

The regex noNewlineRegex, which did not have re.DOTALL passed to the re.compile() call that created it, will match everything only up to the first newline character, whereas newlineRegex, which did have re.DOTALL passed to re.compile(), matches everything. This is why the newlineRegex.search() call matches the full string, including its newline characters.

Another Second Argument: re.IGNORECASE

Another flag you can pass as the second argument to re.compile() is re.IGNORECASE.

re.IGNORECASE makes the regex ignore letter case when matching, so uppercase and lowercase letters are treated as the same.

Example 9: Using re.IGNORECASE for case insensitive matching

vowelRegex = re.compile(r'[aeiou]')
vowelRegex.findall('Al, why do you talk about Regex so much?')
# returns: ['o', 'o', 'u', 'a', 'a', 'o', 'u', 'e', 'e', 'o', 'u']

Notice how the regex matched only the lowercase vowels and ignored the uppercase A from “Al”.

Now let’s implement the re.IGNORECASE argument.

vowelRegex = re.compile(r'[aeiou]', re.IGNORECASE)
vowelRegex.findall('Al, why do you talk about Regex so much?')
# returns: ['A', 'o', 'o', 'u', 'a', 'a', 'o', 'u', 'e', 'e', 'o', 'u']

Now that we’ve passed the re.IGNORECASE argument, the regex matches all vowels regardless of letter case, including the uppercase A from “Al”. You can also use re.I as a shorter alias for re.IGNORECASE.
