Searching a full DataFrame for a Regular Expression

Originally published at http://www.ds4n6.io.

If you come from a UNIX/Linux background, I’m sure one of your favorite commands is grep. It is so easy to use and so powerful!

grep allows you to do regular expression searches in text data (typically files).

The good news is that pandas includes a string method, str.contains(), that lets you run regex searches directly on a Series (the plain Python equivalent is the “re” module, which you need to import).

The bad news though is that the pandas world does not have something as easy and intuitive as grep. In fact, doing a search for a regular expression in a Series (or DataFrame column) is quite ugly:

df[df['column'].str.contains('myregex')]
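To make the one-liner above concrete, here is a minimal sketch on a made-up DataFrame. The column names and values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "user": ["root", "alice", "bob"],
    "command": ["sshd -D", "vim notes.txt", "ssh bob@host"],
})

# str.contains() returns a boolean Series (one value per row);
# indexing df with it keeps only the rows where the regex matched.
matches = df[df["command"].str.contains(r"^ssh")]
print(matches)
```

Both "sshd -D" and "ssh bob@host" start with "ssh", so those two rows are kept.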

And it gets even uglier if you want to search in every column of your dataframe:

df[df.apply(lambda row: row.astype(str).str.contains(regex).any(), axis=1)]
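Broken into steps, the whole-DataFrame search looks roughly like this. It is a sketch on invented data (the columns and the regex are assumptions), meant to show why every cell is cast to a string first: str.contains() only works on strings, and a DataFrame usually mixes numbers and text.

```python
import pandas as pd

# Hypothetical sample data mixing integer and string columns.
df = pd.DataFrame({
    "pid": [1, 22, 333],
    "name": ["init", "sshd", "bash"],
    "path": ["/sbin/init", "/usr/sbin/sshd", "/bin/bash"],
})
regex = r"ssh|33"

# For each row (axis=1): cast every cell to str, test each cell against
# the regex, and flag the row if at least one cell matches.
row_matches = df.apply(
    lambda row: row.astype(str).str.contains(regex).any(), axis=1
)

# Use the resulting boolean Series to filter the DataFrame.
result = df[row_matches]
print(result)
```

Here the second row matches on "sshd" and the third on the pid 333, which only matches because the number was converted to the string "333".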

There are a few options for str.contains (case, regex, etc.) that you may find useful, so take a look at the official pandas documentation page for str.contains.
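As a quick illustration of those options, the sketch below uses a made-up Series. case, regex and na are real str.contains parameters; the data is hypothetical. Note that na=False is handy in practice, since missing values otherwise produce NaN instead of a boolean.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
s = pd.Series(["Error: disk full", "error: timeout", "a.b", "axb", None])

# case=False: case-insensitive match (like grep -i); na=False: treat missing values as non-matches.
ignore_case = s.str.contains("error", case=False, na=False)

# regex=False: treat the pattern as a literal string (like grep -F), so "." is just a dot.
literal = s.str.contains("a.b", regex=False, na=False)

# Default regex=True: "." matches any character, so "axb" matches too.
as_regex = s.str.contains("a.b", na=False)

print(ignore_case.tolist())
print(literal.tolist())
print(as_regex.tolist())
```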

In order to make things easier, we will be introducing a helper function in the next version of the ds4n6.py library: search_regex_df.

It is pretty simple, so you can define it in your notebook and start using it right now.

The usage is simple: search_regex_df(mydf, "myregex"[, reverse=True])

As you can see, if you want to reverse the results (i.e. get the lines that do not match the regex, the equivalent of our beloved “grep -v”), you can set the optional parameter “reverse” to True (reverse=True).

The best part is that, since the result is also a DataFrame, you can continue to “pipe” actions/functions/filters on the results!

search_regex_df:

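def search_regex_df(df, regex, reverse=False):
    # Cast every cell to str and keep the rows where any cell matches the regex.
    if not reverse:
        results = df[df.apply(lambda row: row.astype(str).str.contains(regex).any(), axis=1)]
    else:
        # reverse=True: keep the rows that do NOT match (like "grep -v").
        results = df[~df.apply(lambda row: row.astype(str).str.contains(regex).any(), axis=1)]

    return results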
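A short usage sketch, on invented data, of the normal search, the reversed search and the "piping" mentioned earlier. The block restates the helper in a compact but equivalent form so that it runs on its own. The sample DataFrame is hypothetical, and chaining through DataFrame.pipe is just one possible way to do it.

```python
import pandas as pd

def search_regex_df(df, regex, reverse=False):
    # Compact equivalent of the helper above: build the row mask once,
    # then invert it when reverse=True.
    mask = df.apply(lambda row: row.astype(str).str.contains(regex).any(), axis=1)
    return df[~mask] if reverse else df[mask]

# Hypothetical sample data.
df = pd.DataFrame({
    "pid": [1, 22, 333, 4444],
    "name": ["init", "sshd", "bash", "ssh"],
    "user": ["root", "root", "alice", "alice"],
})

hits = search_regex_df(df, r"ssh")                   # like: grep ssh
misses = search_regex_df(df, r"ssh", reverse=True)   # like: grep -v ssh

# The result is a DataFrame, so searches can be chained,
# roughly like: grep ssh | grep -v root
piped = (
    df.pipe(search_regex_df, r"ssh")
      .pipe(search_regex_df, r"root", reverse=True)
)
print(piped)
```

Keep in mind that apply() with axis=1 runs a Python function once per row, so this approach can be slow on very large DataFrames.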

Data Science Forensics Tips and Tricks

DS4N6

Community focused on bringing Data Science & Artificial Intelligence to the fingertips of the average Forensicator and promoting advances in the field.