Capturing the contents of an HTML tag using PHP and a regular expression

Dec 19, 2023


Using regular expressions for parsing HTML might not be the first thing that comes to mind, but regex can be a concise and effective way to get at the content in which you are interested.

Here is a PHP example:

The PHP statement: $pattern = "/<title>(.*?)<\/title>/si";

$pattern: This variable is being assigned a regular expression pattern.

"/<title>(.*?)<\/title>/si": This is the regular expression itself, written as a double-quoted PHP string. The first and last forward slashes are the pattern delimiters.

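/<title>: The leading / is the opening delimiter, and <title> matches the literal string <title>. Because the match is literal, an opening tag that carries attributes, such as <title id="x">, will not match.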

(.*?): This is a non-greedy capture group that matches any character (.) zero or more times (*), but as few times as possible due to the non-greedy qualifier ?. This is used to capture the content between the <title> and </title> tags.
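To see why the non-greedy qualifier matters, here is a small sketch comparing (.*?) with the greedy (.*). The input string and the variable names $html, $lazy and $greedy are made up for illustration; a real page would normally have only one title element.

```php
<?php
// Hypothetical input with two <title> elements, used only to show the difference.
$html = '<title>First</title><p>text</p><title>Second</title>';

// Non-greedy: the capture stops at the first closing tag.
preg_match('/<title>(.*?)<\/title>/si', $html, $lazy);

// Greedy: the capture runs on to the last closing tag.
preg_match('/<title>(.*)<\/title>/si', $html, $greedy);

echo $lazy[1], PHP_EOL;   // First
echo $greedy[1], PHP_EOL; // First</title><p>text</p><title>Second
```

The greedy version swallows everything up to the final </title>, which is rarely what you want when pulling out a single tag.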

<\/title>: This part of the pattern matches the literal string </title>. The backslash escapes the forward slash so that it is not read as the closing delimiter.

/si: The / is the closing delimiter, and the letters after it are modifiers for the regular expression:

s — makes the dot (.) in the pattern match all characters, including newlines.

i — makes the pattern case-insensitive.
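The effect of each modifier can be checked by dropping one at a time. This is a rough sketch; the input string and the variable names are assumptions chosen to trigger both cases (upper-case tags and a title that spans several lines).

```php
<?php
// Hypothetical input: upper-case tags and a title broken across lines.
$html = "<TITLE>\nMy Page\n</TITLE>";

// With both modifiers the pattern matches.
$withBoth = preg_match('/<title>(.*?)<\/title>/si', $html, $m);

// Without s, the dot stops at the first newline, so there is no match.
$withoutS = preg_match('/<title>(.*?)<\/title>/i', $html);

// Without i, <title> does not match <TITLE>, so there is no match.
$withoutI = preg_match('/<title>(.*?)<\/title>/s', $html);

var_dump($withBoth, $withoutS, $withoutI); // int(1) int(0) int(0)
echo trim($m[1]), PHP_EOL;                 // My Page
```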

So, in summary, this regular expression is designed to capture the content between <title> and </title> tags in an HTML document, and it’s case-insensitive while also matching across multiple lines.
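Putting it together, the pattern would typically be passed to preg_match(), which fills an array with the matches. The following is a minimal sketch rather than a complete solution: the sample HTML and the variable names $html, $matches and $title are assumptions, and in practice the markup might come from something like file_get_contents().

```php
<?php
// Hypothetical HTML input used for illustration only.
$html = <<<HTML
<html>
<head>
<Title>
  Example Page
</Title>
</head>
<body><p>Hello</p></body>
</html>
HTML;

$pattern = "/<title>(.*?)<\/title>/si";

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $html, $matches) === 1) {
    // $matches[0] holds the whole match; $matches[1] holds the capture group.
    $title = trim($matches[1]);
} else {
    $title = null;
}

echo $title ?? '(no title found)', PHP_EOL; // Example Page
```

Trimming the capture is a design choice here: with the s modifier the group can include the line breaks and indentation around the title text.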

Written by Sonny Smith