Let’s Revisit Case-When in Different Libraries Including the New Player: Pandas
How to create conditional columns with different tools.
Whether you’re working on data analysis, data cleaning, or feature engineering, creating a new column based on the values in other columns is a frequent operation.
All the tools I’ve used for data cleaning and manipulation have functions for this task (e.g. SQL, R data.table, PySpark). We now have a new player in the game: Pandas.
To be fair, it was already possible to create conditional columns with Pandas, but it did not have a dedicated case-when function.
Pandas 2.2.0 introduced the case_when function, a Series method that creates a Series object based on one or more conditions.
Let’s revisit how this super helpful operation is done with the commonly used data analysis and manipulation tools.
To keep things consistent and make the differences among tools easier to spot, we’ll use a small dataset.
SQL
The following is a small SQL table called “mytable”.
+-------------+----------+---------+
| a | b | c |
+-------------+----------+---------+
| 0 | 5 | 1 |
| 1 | -1 |…