Pandas Query: the easiest way to filter data
Learn 9 code snippets that will enhance your productivity.
Pandas Query has been around for quite some time, but it was only earlier this year that I started hearing more about this function.
Put simply, df.query() takes a boolean expression (one that evaluates to True or False), checks it against each row of your dataset, and returns only the rows where it is True.
Pandas Query evaluates an expression and returns the dataframe filtered with the True values.
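To make that row-by-row matching concrete, here is a minimal sketch on a small made-up dataframe. The name `toy` and its values are assumptions for illustration only, not the dataset used in the rest of this post. It compares `query()` with the equivalent boolean mask.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({"Name": ["Jack", "John", "Paul"], "A": [1, 5, 9]})

# query() keeps the rows where the expression evaluates to True
by_query = toy.query("A > 3")

# The same filter written as a classic boolean mask
by_mask = toy[toy["A"] > 3]

print(by_query)
print(by_query.equals(by_mask))
```

Both approaches should return the same rows; the query version simply reads closer to plain English.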
The Syntax
The syntax is simple, which helps a lot with code readability and lowers the barrier for beginners. Remember to always write the expression to be evaluated within quotes: 'expression'.
df.query('expression')
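As a quick sketch of what "within quotes" means in practice, the example below uses a hypothetical `toy` dataframe (made up here, not part of the original post). The whole expression is passed as one string, and any string value inside it needs the other kind of quote.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({"Name": ["Jack", "John", "Paul"], "A": [1, 5, 9]})

# The whole expression is a single string; column names are written unquoted
high = toy.query('A >= 5')

# A string value inside the expression uses the other quote style
jacks = toy.query('Name == "Jack"')

print(high)
print(jacks)
```

Swapping the quote styles (double outside, single inside) should work just as well; what matters is that the inner and outer quotes differ.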
Before jumping to the code snippets, let’s create a dataframe.
import numpy as np
import pandas as pd

# Dataframe
df = pd.DataFrame(np.random.randint(1, 10, size=(5, 5)), columns=list('ABCDE'))

# Insert a column "Name"
df.insert(0, 'Name', ['Jack', 'John', 'Paul', 'Jack', 'Paul'])
df