How to strip HTML tags from a string, in Python

Jorge Galvis
1 min read · Feb 21, 2016

Earlier this week I needed to remove HTML tags from some text. The target string was already stored in the database with its HTML tags, and one of the requirements specified that on one particular page it must be rendered as plain text.

I knew from the beginning that regular expressions could solve this problem, but since I am not an expert with them I looked for advice on Stack Overflow and found exactly what I needed.

Below is the function I have defined:

import re

def remove_html_tags(text):
    """Remove HTML tags from a string."""
    clean = re.compile(r'<.*?>')
    return re.sub(clean, '', text)

The idea is to build a regular expression that matches a “<”, then as few characters as possible, up to the first “>” that follows. Each such match is one tag. Then, using the sub function, we replace every match (the angle brackets and everything between them) with an empty string.
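To see why the “?” in the pattern matters, here is a small sketch comparing the non-greedy pattern from the function with a greedy variant. The sample string and the variable names are made up for illustration and are not part of the original function.

```python
import re

# Hypothetical sample input, chosen only to illustrate the difference.
sample = "<p>Hello <b>world</b></p>"

# Non-greedy: each match stops at the first ">" it meets,
# so every tag is matched (and removed) separately.
non_greedy = re.sub(r"<.*?>", "", sample)

# Greedy: a single match runs from the first "<" to the last ">",
# swallowing the text between the tags as well.
greedy = re.sub(r"<.*>", "", sample)

print(repr(non_greedy))  # 'Hello world'
print(repr(greedy))      # ''
```

With the greedy version the whole string counts as one big "tag", which is why the function uses “.*?” instead of “.*”.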

Let's see this in the shell:

[Screenshot: remove_tags]
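A minimal sketch of what such a session might look like, written as a small script. The input strings are assumptions made up for this example, not the ones from the original post.

```python
import re

def remove_html_tags(text):
    """Remove HTML tags from a string (same function as defined above)."""
    clean = re.compile(r"<.*?>")
    return re.sub(clean, "", text)

# Hypothetical sample input, not taken from the original post.
html = '<p>Read the <a href="https://example.com">docs</a> first.</p>'
plain = remove_html_tags(html)
print(plain)  # Read the docs first.

# Text without any tags passes through unchanged.
untouched = remove_html_tags("no tags here")
print(untouched)  # no tags here
```

Note that attributes inside a tag, such as the href above, disappear together with the tag, since everything between “<” and “>” is part of the match.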

Hope this can help you!
