Using Pandas pipe function to improve code readability

An intuitive tutorial on best practices with Pandas pipe()

Robin Chan
Towards Data Science

In data processing, it is often necessary to write a function that performs an operation (such as a statistical calculation, splitting, or substituting values) on a row or column to obtain new data.

Instead of writing

# f(), g(), and h() are user-defined functions
# df is a Pandas DataFrame
f(g(h(df), arg1=a), arg2=b, arg3=c)

We can write

(
    df.pipe(h)
      .pipe(g, arg1=a)
      .pipe(f, arg2=b, arg3=c)
)
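To make the comparison concrete, here is a minimal runnable sketch. The bodies of h(), g(), and f(), the sample data, and the argument values are all hypothetical stand-ins chosen for illustration; only the calling pattern comes from the snippets above.

```python
import pandas as pd


# Hypothetical user-defined steps standing in for h(), g(), and f().
def h(df):
    """Drop rows that contain missing values."""
    return df.dropna()


def g(df, arg1):
    """Keep rows whose 'score' is at least arg1."""
    return df[df["score"] >= arg1]


def f(df, arg2, arg3):
    """Add a column named arg2 holding 'score' multiplied by arg3."""
    return df.assign(**{arg2: df["score"] * arg3})


# Made-up sample data for the illustration.
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "score": [1.0, None, 3.0, 4.0]}
)

# Nested calls: read from the inside out.
nested = f(g(h(df), arg1=2), arg2="scaled", arg3=10)

# pipe(): read from top to bottom, in execution order.
piped = (
    df.pipe(h)
      .pipe(g, arg1=2)
      .pipe(f, arg2="scaled", arg3=10)
)

# Both spellings produce the same DataFrame.
assert nested.equals(piped)
```

In both versions the DataFrame is passed as the first positional argument of each function, and any extra keyword arguments given to pipe() are forwarded to that function.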

Pandas introduced pipe() in version 0.16.2. pipe() lets user-defined functions take part in method chains alongside the built-in DataFrame methods.

Method chaining is a programming style in which multiple method calls are invoked sequentially, with each call performing an action on an object and returning it so that the next call can continue from the result.
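As a rough sketch of what this looks like in Pandas, the example below writes the same steps twice: once with a named variable per step and once as a single chain. The add_total() helper, the column names, and the data are hypothetical. Note that most Pandas methods return a new object rather than modifying the original, which is what makes the chain possible.

```python
import pandas as pd


def add_total(df, price_col, qty_col):
    """Hypothetical helper: add a 'total' column (price * quantity)."""
    return df.assign(total=df[price_col] * df[qty_col])


# Made-up sample data for the illustration.
orders = pd.DataFrame(
    {
        "item": ["pen", "book", "pen", "bag"],
        "price": [2.0, 10.0, 2.0, 25.0],
        "qty": [3, 1, 5, 1],
    }
)

# Step by step, naming every intermediate result.
with_total = add_total(orders, "price", "qty")
grouped = with_total.groupby("item", as_index=False)["total"].sum()
stepwise = grouped.sort_values("total", ascending=False).reset_index(drop=True)

# The same steps as one chain; pipe() slots the user-defined
# function in between the built-in methods.
chained = (
    orders
    .pipe(add_total, price_col="price", qty_col="qty")
    .groupby("item", as_index=False)["total"].sum()
    .sort_values("total", ascending=False)
    .reset_index(drop=True)
)

# Both approaches give the same result.
assert stepwise.equals(chained)
```

The chained version needs no names such as with_total or grouped, and the steps read in the order they run.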

It eliminates the cognitive burden of naming variables at each intermediate step. Fluent Interface, a method of creating object-oriented APIs, relies on method cascading (aka…
