Pandas tricks I wish I knew as a beginner
Python is an easy-to-learn language, mainly because of its readable code and simple syntax. Still, it takes time and practice to master any skill, no matter how easy it looks. One of those skills is the pandas Python package. Almost everyone in data science, be it a beginner or a professional, gets stuck somewhere in the data manipulation step; no wonder so much of a data scientist's time is spent on data cleaning and formatting.
In this article we are going to look at some really helpful and efficient pandas tricks that will save you big time! These suggestions are based on my personal experience, and a link to the pandas documentation is provided. Let's get on with it.
1. The dot notation
The most common way to slice a column in pandas is using:
sliced_column = df['column1'] #Returns a pandas series
Having to type square brackets and quotes makes this simple task a little tedious. Thankfully, there is an alternative that is much simpler:
sliced_column = df.column1 #Returns the same pandas series as above
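A simple change from ['col'] to .col saves time and makes the code cleaner. This trick cannot be used if the column name has spaces or any special character other than an underscore (_), or if it clashes with an existing DataFrame attribute or method such as count. It also only works for reading a column, not for creating a new one. But you can always rename your columns :)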
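Here is a small sketch of where the dot notation works and where it does not. The dataframe and its column names (column1, unit price, count) are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one plain name, one name with a space,
# and one name that clashes with a DataFrame method.
df = pd.DataFrame({
    "column1": [1, 2, 3],
    "unit price": [9.5, 3.0, 4.25],
    "count": [10, 20, 30],
})

# For a plain name, both notations return the same Series.
same = df.column1.equals(df["column1"])

# A name with a space can only be reached with brackets.
prices = df["unit price"]

# df.count is the DataFrame.count method, not the column.
is_method = callable(df.count)
counts = df["count"]

# After renaming, the dot notation works again.
renamed = df.rename(columns={"unit price": "unit_price"})
prices_again = renamed.unit_price
```

If in doubt, the bracket notation is always safe.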
2. Replacing made easy
There are many situations where you need to replace a value in a dataframe with another value based on some condition. For example, replace the value with 0 if it is below a certain threshold, or with 1 if it is above the threshold.
The traditional way to do it:
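As a minimal sketch, one conventional approach is boolean indexing with .loc. The score column, the new flag column and the 0.5 threshold are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data and threshold.
df = pd.DataFrame({"score": [0.2, 0.7, 0.5, 0.9]})
threshold = 0.5

# Start with 0 everywhere, then overwrite the rows above the threshold.
df["flag"] = 0
df.loc[df["score"] > threshold, "flag"] = 1
```

It works, but it takes two statements and the intent is spread across them.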
A more simplistic approach:
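A sketch of the same replacement with np.where(), again assuming a made-up score column and a threshold of 0.5:

```python
import numpy as np
import pandas as pd

# Hypothetical data and threshold.
df = pd.DataFrame({"score": [0.2, 0.7, 0.5, 0.9]})
threshold = 0.5

# 1 where the condition holds, 0 everywhere else, in a single expression.
df["flag"] = np.where(df["score"] > threshold, 1, 0)
```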
np.where() returns a NumPy array. The notation np.where(cond, a, b) is quite readable in itself: it tells the reader to use a where cond is True and b otherwise. The catch is that the else part cannot be left out, so to preserve the existing values where the condition is False you have to pass the original column as b.
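A minimal sketch of keeping the existing values where the condition is False, by passing the original column as the third argument. The score column and the threshold are assumptions; Series.mask() is shown as an alternative that does the same thing without NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical data and threshold.
df = pd.DataFrame({"score": [0.2, 0.7, 0.5, 0.9]})
threshold = 0.5

below = df["score"] < threshold

# Replace values below the threshold with 0, keep the rest unchanged.
floored = np.where(below, 0, df["score"])

# Series.mask replaces values where the condition is True and keeps the others.
floored_series = df["score"].mask(below, 0)
```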
3. applymap() or apply()
Let's first talk about applymap(). The applymap() function performs an element-wise transformation and is not designed to be used with complex functions. By element-wise transformation we mean that applymap() applies a basic transformation to each and every element in the dataframe. Note that since pandas 2.1 the same method is available as DataFrame.map(), and applymap() is deprecated.
Using applymap() reduces the amount of code (compared to apply()) when the entire dataframe is to be transformed.
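A small sketch of an element-wise transformation on a made-up dataframe. Since recent pandas versions expose this operation as DataFrame.map(), the snippet picks whichever of the two methods is available.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# The function is called once for every single element.
squared = elementwise(lambda x: x ** 2)
```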
Now let's look at apply(). The apply() function can perform more complex operations on a dataframe by passing the data row by row or column by column to a function, which then processes the data and returns the result.
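To make the row-by-row versus column-by-column distinction concrete, here is a sketch on a made-up dataframe. The axis argument decides what the function receives.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series.
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_total = df.apply(lambda row: row["a"] + row["b"], axis=1)
```

For something this simple, df["a"] + df["b"] would be faster; apply() earns its place when the logic per row or column is more involved.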
A more detailed comparison including map():
4. Combining dataframes
Pandas provides 4 different methods to combine dataframes, namely append(), concat(), join() and merge().
And how many of these are actually needed? Just 2 of them: concat and merge! As a beginner I was always confused about which technique to use when combining two dataframes; here's the solution.
Use concat() when another dataframe is to be appended along the row or column axis, so we can say concat() substitutes append() (which was removed from pandas in version 2.0). For joining two dataframes on a key, use merge() instead of join(). merge() acts like an SQL join and can therefore be inner, outer, left or right.
Here’s concat:
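A minimal sketch with two made-up dataframes that share the same columns:

```python
import pandas as pd

# Hypothetical data.
df1 = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["Cid", "Dee"]})

# Stack along the row axis; ignore_index renumbers the rows 0..n-1.
stacked = pd.concat([df1, df2], ignore_index=True)

# Place side by side along the column axis.
side_by_side = pd.concat([df1, df2], axis=1)
```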
Merge:
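A minimal sketch with two made-up dataframes that share a key column:

```python
import pandas as pd

# Hypothetical data.
left = pd.DataFrame({"key": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]})
right = pd.DataFrame({"key": [2, 3, 4], "city": ["Oslo", "Rome", "Lima"]})

# Keep only the keys present in both dataframes.
inner = pd.merge(left, right, how="inner", on="key")

# Keep every key from either dataframe; missing values become NaN.
outer = pd.merge(left, right, how="outer", on="key")
```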
how: specifies the type of join to perform (inner | outer | left | right); the default is inner
on: specifies the key column(s) to merge on
merge() has many more useful arguments, such as left_on and right_on for when the key names differ between the two dataframes, or suffixes, which renames overlapping column names so that you can differentiate which dataframe each column came from.
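Here is a sketch of these three arguments together. The orders and customers dataframes and their column names are invented for illustration: the key is called customer_id in one and id in the other, and both have a date column.

```python
import pandas as pd

# Hypothetical data.
orders = pd.DataFrame({
    "customer_id": [1, 2],
    "amount": [50, 75],
    "date": ["2020-01-05", "2020-01-06"],
})
customers = pd.DataFrame({
    "id": [1, 2],
    "name": ["Ann", "Bob"],
    "date": ["2019-03-01", "2019-07-15"],
})

merged = pd.merge(
    orders,
    customers,
    how="left",
    left_on="customer_id",           # key name in the left dataframe
    right_on="id",                   # key name in the right dataframe
    suffixes=("_order", "_signup"),  # applied to the overlapping "date" column
)
```

Without suffixes, pandas would fall back to its defaults, _x and _y, which say nothing about where a column came from.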
5. Generating date ranges
Pandas date_range() takes simplification to another level. Let's first look at how to generate a list of dates between a start and an end date using the datetime package:
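A sketch of the datetime approach, assuming an arbitrary start and end date and a daily step:

```python
from datetime import date, timedelta

# Hypothetical start and end dates.
start = date(2020, 1, 1)
end = date(2020, 1, 5)

# Step forward one day at a time until the end date is passed.
dates = []
current = start
while current <= end:
    dates.append(current)
    current += timedelta(days=1)
```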
Now that's a lot of work just to get a range of dates. Let's see pandas' take on this problem:
Clearly pandas wins here. I find this technique useful when creating data visualizations where I need dates on the X axis.
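A sketch of the pandas version with arbitrary example dates. The second call shows that the freq argument can produce steps other than daily.

```python
import pandas as pd

# Daily dates between start and end, both inclusive (daily is the default).
dates = pd.date_range(start="2020-01-01", end="2020-01-05")

# Three dates at month-start frequency ("MS"), beginning at the start date.
month_starts = pd.date_range(start="2020-01-01", periods=3, freq="MS")
```

The result is a DatetimeIndex, which can be used directly as a dataframe index or as axis values in a plot.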
Before you go:
So those were some of the most helpful tools in pandas that I wish I had known as a beginner; they have saved me quite some time. I strongly encourage you to go through the documentation of each function to get a complete picture of it. Thank you for coming by and reading till the end. Have a nice day :)