Advanced Pandas: 21 Powerful Tips for Efficient Data Manipulation
Maximize Your Pandas Skills: Essential Tips and Tricks for Mastering Data Manipulation
4 min readJul 6, 2024
Pandas, a powerful open-source data manipulation library for Python, is an indispensable tool in any data scientist’s toolkit. Known for its fast performance and intuitive syntax, Pandas excels at handling structured data, making it crucial for tasks ranging from data cleaning to complex analysis. In this article, we’ll explore 21 essential Pandas tips that will boost your productivity, streamline your code, and unlock new capabilities in your data workflows.
1: Print DataFrame in Markdown-friendly format
| | a | b |
|---:|----:|----:|
| 0 | 1 | 5 |
| 1 | 2 | 6 |
| 2 | 3 | 7 |
| 3 | 4 | 8 |
+----+-----+-----+
| | a | b |
+====+=====+=====+
| 0 | 1 | 5 |
+----+-----+-----+
| 1 | 2 | 6 |
+----+-----+-----+
| 2 | 3 | 7 |
+----+-----+-----+
| 3 | 4 | 8 |
+----+-----+-----+
2: Group rows into a list
col1 col3 col2
a 1.5 [d, e]
b 3.5 [f, g]
c 3.0 [h]
3: DataFrame.explode()
a b
0 [1, 2] 11
1 [4, 5] 13
a b
0 1 11
0 2 11
1 4 13
1 5 13
4: DataFrame.copy()
col1 col2
0 7 4
1 8 5
2 9 6
col1 col2
0 1 4
1 2 5
2 3 6
5: Groupby().count vs Groupby( ).size
col2 col1
a 1
b 2
c 2
d 1
col1 col2
a S 1
b M 1
S 1
c L 2
d L 1
dtype: int64
6: Correlation
a 0.94388
b 0.68313
dtype: float64
7: Cross-Tabulation
person2 Ben Jone Patrick Smith Warren
person1
Ben 0 0 1 1 0
Jone 0 0 0 0 1
Patrick 1 0 0 1 0
Smith 1 0 1 0 1
Warren 0 1 0 1 0
8: DataFrame.query()
fruit price
2 grape 6
3 grape 7
fruit price
2 grape 6
3 grape 7
9: Unpivot DataFrame
fruit Aldi Walmart Costco
0 apple 4 6 1
1 orange 5 7 2
fruit store value
0 apple Aldi 4
1 orange Aldi 5
2 apple Walmart 6
3 orange Walmart 7
4 apple Costco 1
5 orange Costco 2
10: Rename aggregated column
size price
L 19.000
M 10.000
S 36.995
size mean_price
L 19.000
M 10.000
S 36.995
11: Normalized Value Counts
S 4
M 2
L 1
XL 1
dtype: int64
S 0.500
M 0.250
L 0.125
XL 0.125
dtype: float64
12: df.transform() instead of df.count()
type value
0 A 5
1 A 3
2 O 2
4 O 7
5 A 3
13: Fill in Null Values
orange apple
0 31.0 4.0
1 5.0 71.0
2 9.0 12.0
14: Value Counts Missing Values
15: Filter Columns in DataFrame
Temp Degree
0 Hot 35
1 Cold 3
2 Warm 15
3 Cold 2
Degree Temp_Cold Temp_Hot Temp_Warm
0 35 0 1 0
1 3 1 0 0
2 15 0 0 1
3 2 1 0 0
Temp_Cold Temp_Hot Temp_Warm
0 0 1 0
1 1 0 0
2 0 0 1
3 1 0 0
16: Convert Data Types Automatically
17: Assign new columns to a DataFrame
text text_len word_count weekend
0 saturday: weekend (not working day) 35 5 True
1 sunday: weekend (not working day) 33 5 True
2 monday: doctor appointment at 2:45pm. 37 5 False
3 tuesday: dentist appointment at 11:30 am. 41 6 False
4 wednesday: basketball game at 7:00pm 36 5 False
5 thursday: back home by 11:15 pm. 32 6 False
6 friday: take the train at 08:10 am. 35 7 False
18: Read HTML Tables
Year Office GOP DFL Others
0 2020 President 45.3% 52.4% 2.3%
1 2020 Senator 43.5% 48.8% 7.7%
2 2018 Governor 42.4% 53.9% 3.7%
3 2018 Senator 36.2% 60.3% 3.4%
4 2018 Senator 42.4% 53.0% 4.6%
Location July (°F) July (°C) January (°F) January (°C)
0 Minneapolis 83/64 28/18 23/7 −4/−13
1 Saint Paul 83/63 28/17 23/6 −5/−14
2 Rochester 82/63 28/17 23/3 −5/−16
3 Duluth 76/55 24/13 19/1 −7/−17
4 St. Cloud 81/58 27/14 18/−1 −7/−18
19: ‘nlargest’ and ‘nsmallest ‘
Data Link: IMDB Rating
20: Create a Rank Column
Students Marks
0 John 80
1 Smith 56
2 Patrick 95
3 Bob 75
4 Jose 45
Students Marks Rank
0 John 80 2.0
1 Smith 56 4.0
2 Patrick 95 1.0
3 Bob 75 3.0
4 Jose 45 5.0