How to strip HTML tags from a string, in Python

1 min read · Feb 21, 2016

Earlier this week I needed to remove some HTML tags from a text. The target string was already saved with HTML tags in the database, and one of the requirements specified that on one specific page we need to render it as raw text.

I knew from the beginning that regular expressions could work for this challenge, but since I am not an expert with regular expressions, I looked for some advice on Stack Overflow and found what I actually needed.

Below is the function I have defined:

import re

def remove_html_tags(text):
    """Remove HTML tags from a string."""
    # '<.*?>' matches a '<', then as few characters as possible, up to the next '>'
    clean = re.compile(r'<.*?>')
    return clean.sub('', text)

So the idea is to build a regular expression that matches each tag: a “<”, followed by as few characters as possible, up to the next “>”. Then, using the sub function, we replace every match (the angle brackets and everything between them) with an empty string.
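To see why the “?” in the pattern matters, here is a small sketch comparing the non-greedy pattern with a greedy one. The `sample` string and the variable names are made up for illustration; they are not part of the original function.

```python
import re

# Made-up input for illustration only
sample = "<p>Hello <b>world</b></p>"

# Non-greedy: each match stops at the first ">" after a "<",
# so every tag is matched separately.
tags = re.findall(r"<.*?>", sample)
non_greedy = re.sub(r"<.*?>", "", sample)

# Greedy: ".*" runs on to the last ">" in the string,
# so a single match swallows the text between the tags as well.
greedy = re.sub(r"<.*>", "", sample)

print(tags)              # ['<p>', '<b>', '</b>', '</p>']
print(repr(non_greedy))  # 'Hello world'
print(repr(greedy))      # ''
```

With the greedy version the whole string counts as one "tag", which is why the function relies on the non-greedy form.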

Let's see this in the shell:
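A minimal sketch of such a session, written as a small script rather than typed at the interactive prompt. The `html` string is a made-up example, and compiling the pattern once under the name `TAG_RE` is my own assumption here, not something the function above requires.

```python
import re

# Hypothetical module-level name; compiled once and reused on every call
TAG_RE = re.compile(r"<.*?>")

def remove_html_tags(text):
    """Remove HTML tags from a string."""
    return TAG_RE.sub("", text)

# Made-up sample input for illustration
html = '<h1>Title</h1><p>Some <a href="#">linked</a> text.</p>'

print(remove_html_tags(html))            # TitleSome linked text.
print(remove_html_tags("no tags here"))  # no tags here
```

Note that the tags are replaced with an empty string, so text from neighbouring elements ends up joined together ("TitleSome" above).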

Hope this can help you!

Written by Jorge Galvis