Advanced Pandas: 21 Powerful Tips for Efficient Data Manipulation

Maximize Your Pandas Skills: Essential Tips and Tricks for Mastering Data Manipulation

4 min readJul 6, 2024

Pandas, a powerful open-source data manipulation library for Python, is an indispensable tool in any data scientist’s toolkit. Known for its fast performance and intuitive syntax, Pandas excels at handling structured data, making it crucial for tasks ranging from data cleaning to complex analysis. In this article, we’ll explore 21 essential Pandas tips that will boost your productivity, streamline your code, and unlock new capabilities in your data workflows.

1: Print DataFrame in Markdown-friendly format

|    |   a |   b |
|---:|----:|----:|
|  0 |   1 |   5 |
|  1 |   2 |   6 |
|  2 |   3 |   7 |
|  3 |   4 |   8 |

+----+-----+-----+
|    |   a |   b |
+====+=====+=====+
|  0 |   1 |   5 |
+----+-----+-----+
|  1 |   2 |   6 |
+----+-----+-----+
|  2 |   3 |   7 |
+----+-----+-----+
|  3 |   4 |   8 |
+----+-----+-----+

2: Group rows into a list

col1    col3  col2              
a      1.5  [d, e]
b      3.5  [f, g]
c      3.0     [h]

3: DataFrame.explode()

        a   b
0  [1, 2]  11
1  [4, 5]  13

4: DataFrame.copy()

    col1  col2
0     7     4
1     8     5
2     9     6

    col1  col2
0     1     4
1     2     5
2     3     6

5: Groupby().count vs Groupby( ).size

      col1  col2
a     S       1
b     M       1
      S       1
c     L       2
d     L       1
dtype: int64

6: Correlation

a    0.94388
b    0.68313
dtype: float64

7: Cross-Tabulation

person2  Ben  Jone  Patrick  Smith  Warren
person1                                   
Ben        0     0        1      1       0
Jone       0     0        0      0       1
Patrick    1     0        0      1       0
Smith      1     0        1      0       1
Warren     0     1        0      1       0

8: DataFrame.query()

    fruit  price
2  grape      6
3  grape      7

    fruit  price
2  grape      6
3  grape      7

9: Unpivot DataFrame

    fruit  Aldi  Walmart  Costco
0   apple     4        6       1
1  orange     5        7       2

    fruit    store  value
0   apple     Aldi      4
1  orange     Aldi      5
2   apple  Walmart      6
3  orange  Walmart      7
4   apple   Costco      1
5  orange   Costco      2

10: Rename aggregated column

size   price        
L     19.000
M     10.000
S     36.995

size  mean_price          
L         19.000
M         10.000
S         36.995

11: Normalized Value Counts

S     4
M     2
L     1
XL    1
dtype: int64

S     0.500
M     0.250
L     0.125
XL    0.125
dtype: float64

12: df.transform() instead of df.count()

  type  value
0    A      5
1    A      3
2    O      2
4    O      7
5    A      3

13: Fill in Null Values

    orange  apple
0    31.0    4.0
1     5.0   71.0
2     9.0   12.0

14: Value Counts Missing Values

15: Filter Columns in DataFrame

    Temp  Degree
0   Hot      35
1  Cold       3
2  Warm      15
3  Cold       2

    Degree  Temp_Cold  Temp_Hot  Temp_Warm
0      35          0         1          0
1       3          1         0          0
2      15          0         0          1
3       2          1         0          0

    Temp_Cold  Temp_Hot  Temp_Warm
0          0         1          0
1          1         0          0
2          0         0          1
3          1         0          0

16: Convert Data Types Automatically

17: Assign new columns to a DataFrame

                                        text  text_len  word_count  weekend
0        saturday: weekend (not working day)        35           5     True
1          sunday: weekend (not working day)        33           5     True
2      monday: doctor appointment at 2:45pm.        37           5    False
3  tuesday: dentist appointment at 11:30 am.        41           6    False
4       wednesday: basketball game at 7:00pm        36           5    False
5           thursday: back home by 11:15 pm.        32           6    False
6        friday: take the train at 08:10 am.        35           7    False

18: Read HTML Tables

    Year     Office    GOP    DFL Others
0  2020  President  45.3%  52.4%   2.3%
1  2020    Senator  43.5%  48.8%   7.7%
2  2018   Governor  42.4%  53.9%   3.7%
3  2018    Senator  36.2%  60.3%   3.4%
4  2018    Senator  42.4%  53.0%   4.6%

      Location July (°F) July (°C) January (°F) January (°C)
0  Minneapolis     83/64     28/18         23/7       −4/−13
1   Saint Paul     83/63     28/17         23/6       −5/−14
2    Rochester     82/63     28/17         23/3       −5/−16
3       Duluth     76/55     24/13         19/1       −7/−17
4    St. Cloud     81/58     27/14        18/−1       −7/−18

19: ‘nlargest’ and ‘nsmallest ‘

Data Link: IMDB Rating

20: Create a Rank Column

  Students  Marks
0     John     80
1    Smith     56
2  Patrick     95
3      Bob     75
4     Jose     45

  Students  Marks  Rank
0     John     80   2.0
1    Smith     56   4.0
2  Patrick     95   1.0
3      Bob     75   3.0
4     Jose     45   5.0

21: Color Values in DataFrame