40 Useful Pandas Snippets

Pandas snippets that come in handy in data analysis work

Pandas is a versatile and powerful library for data science. It’s like a Swiss Army knife: it provides many useful functions for the different tasks involved in working with data.

To be effective with this tool, you need to know some tricks of the trade. In this article, I detail 40 useful pandas snippets that I use regularly.

For those with an understanding of the Pandas library, the following snippets might be useful.

For those who are unfamiliar with Pandas, the following might help you better understand the library by working through some examples.

The dataset used throughout this article is available on Kaggle.

Code for this article → Deepnote

Reading data

read_csv can do much more than just read in your data.

Here’s a taste of it. (More in the docs)

1. Filter columns

Only need a couple of columns from the dataset? Use usecols.
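A minimal sketch of usecols, assuming a tiny made-up CSV in memory instead of the Kaggle dataset (the column names are hypothetical):

```python
import io
import pandas as pd

# Hypothetical CSV standing in for the article's dataset.
csv_text = "name,age,city\nAnn,31,Oslo\nBob,45,Rome\n"

# Only the listed columns are parsed; "city" is never loaded.
df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age"])
print(df.columns.tolist())
```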

2. Parse dates on read

No need to call pd.to_datetime afterwards. Parse the dates on read!
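Something like the following should work with the parse_dates argument of read_csv. The data and column names are invented for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV with an ISO-formatted date column.
csv_text = "date,price\n2021-01-01,100\n2021-01-02,105\n"

# parse_dates lists the columns to convert to datetimes while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df.dtypes)
```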

3. Specify Data Types

Setting category data types at read can save a ton of memory for data frames!
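As a rough sketch, the dtype argument accepts a mapping from column name to type. The "city" column here is hypothetical; category pays off most when a column has few distinct values repeated many times:

```python
import io
import pandas as pd

# Hypothetical CSV with a low-cardinality text column.
csv_text = "city,sales\nOslo,10\nRome,12\nOslo,9\nRome,14\n"

# Read "city" directly as a categorical instead of plain strings.
df = pd.read_csv(io.StringIO(csv_text), dtype={"city": "category"})
print(df["city"].dtype)
```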

4. Set index

Setting an index is especially useful for time series data.
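One way to do this, assuming a made-up price series with a "date" column, is to combine index_col with parse_dates=True so the index comes back as datetimes:

```python
import io
import pandas as pd

# Hypothetical time series data.
csv_text = "date,price\n2021-01-01,100\n2021-01-02,105\n"

# Use "date" as the index and ask pandas to parse the index as dates.
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)
print(df.loc["2021-01-02", "price"])
```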

5. No. of rows to read

Don’t want to read in a dataset with millions of rows before having a peek at it? Use nrows!
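A small sketch with an in-memory CSV (the data is invented; on a real file you would pass the path instead):

```python
import io
import pandas as pd

# Hypothetical CSV with five data rows.
csv_text = "id\n1\n2\n3\n4\n5\n"

# Stop after the first two data rows.
df = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(len(df))
```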

6. Skip rows

Does your dataset have rows with faulty data? Skip them!
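Here is a sketch using the skiprows argument, assuming you already know which line is broken. Line numbers are 0-indexed and count the header as line 0; the data is made up:

```python
import io
import pandas as pd

# Hypothetical CSV where file line 2 ("2,bad") is faulty.
csv_text = "id,value\n1,10\n2,bad\n3,30\n"

# Skip file line 2; the header is line 0.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[2])
print(df)
```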

7. Specify NA values

If your data uses placeholder values that should be treated as NA, such as ?, declare them at read time so you won’t have to convert them later.
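A minimal sketch with na_values, using an invented "score" column where ? marks a missing entry:

```python
import io
import pandas as pd

# Hypothetical CSV that uses "?" as a missing-value marker.
csv_text = "id,score\n1,7\n2,?\n3,9\n"

# Treat "?" as NA so the column is read as numeric.
df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])
print(df["score"])
```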

8. Setting boolean values

Have a boolean column that’s in the form of Yes and No? Tell pandas about it!
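With read_csv this can be sketched using true_values and false_values. The "subscribed" column is hypothetical, and the conversion only applies when every value in the column matches one of the listed strings:

```python
import io
import pandas as pd

# Hypothetical CSV with a Yes/No column.
csv_text = "id,subscribed\n1,Yes\n2,No\n3,Yes\n"

# Map "Yes" to True and "No" to False while reading.
df = pd.read_csv(io.StringIO(csv_text), true_values=["Yes"], false_values=["No"])
print(df["subscribed"].tolist())
```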

9. Read from multiple files

Is your data in multiple files? Read them all in with glob!
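A self-contained sketch: it first writes two small made-up files to a temporary directory, then reads them back with glob and pd.concat. The file name pattern part_*.csv is an assumption for this example:

```python
import glob
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Write two hypothetical files so there is something to read.
    pd.DataFrame({"id": [1, 2]}).to_csv(os.path.join(tmp, "part_1.csv"), index=False)
    pd.DataFrame({"id": [3, 4]}).to_csv(os.path.join(tmp, "part_2.csv"), index=False)

    # Collect matching paths (sorted for a stable order) and stack the frames.
    paths = sorted(glob.glob(os.path.join(tmp, "part_*.csv")))
    df = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

print(df)
```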

10. Copy and Paste into Data Frames

Looking at some data in Excel but don’t want to download it? Copy it! Pandas can read from your clipboard.

11. Read tables from PDF files

Need to read in tables from PDF files? tabula-py has your back!

Exploratory Data Analysis (EDA)

12. EDA cheat

Want to visualize your dataset but don’t want to write code for plots? With pandas-profiling (since renamed ydata-profiling), you can do it with just one line of code.

Data Types (dtypes)

Here’s a list of dtypes for pandas

13. Filter columns by dtype
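One way to filter by dtype is select_dtypes. This sketch uses a made-up frame and splits it into numeric and non-numeric columns:

```python
import pandas as pd

# Hypothetical frame with mixed column types.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45], "height": [1.7, 1.8]})

numeric = df.select_dtypes(include="number")      # ints and floats
non_numeric = df.select_dtypes(exclude="number")  # everything else
print(numeric.columns.tolist(), non_numeric.columns.tolist())
```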

14. Infer dtype

Are your numeric columns read in as objects? Let pandas do the work in converting them!
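A hedged sketch, assuming the snippet meant infer_objects. Note that infer_objects only helps when an object column already holds Python numbers; numeric strings such as "1" need pd.to_numeric instead:

```python
import pandas as pd

# Hypothetical frame whose integer values were stored with object dtype.
df = pd.DataFrame({"a": [1, 2, 3]}, dtype=object)

# Let pandas pick a better dtype for object columns.
inferred = df.infer_objects()
print(df["a"].dtype, "->", inferred["a"].dtype)
```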

15. Downcasting

Pandas’ to_numeric has a nifty feature to downcast the type, allowing you to reduce the data frame’s size.
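For example, with a small made-up integer Series, the downcast argument picks the smallest integer type that can hold the values:

```python
import pandas as pd

# Hypothetical column stored as 64-bit integers.
s = pd.Series([1, 2, 3], dtype="int64")

# Downcast to the smallest signed integer type that fits the data.
small = pd.to_numeric(s, downcast="integer")
print(s.dtype, "->", small.dtype)
```

Passing downcast="float" does the same for floating point columns, going no smaller than float32.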

16. Manual conversion

If a column contains values that cannot be parsed as numbers, errors="coerce" turns them into NaN instead of raising an error. You can then fill those NaN values with reasonable values using .fillna
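A minimal sketch with an invented Series. Filling with 0 is only a placeholder; the right fill value depends on your data:

```python
import pandas as pd

# Hypothetical column with one value that is not a number.
raw = pd.Series(["1", "2", "abc"])

# Unparseable entries become NaN rather than raising an error.
numbers = pd.to_numeric(raw, errors="coerce")

# Replace the resulting NaN with a chosen value.
filled = numbers.fillna(0)
print(filled.tolist())
```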

17. Convert all at once
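One plausible reading of this snippet is applying pd.to_numeric to every column in a single call. The frame below is made up, and whether the original used this exact approach is an assumption:

```python
import pandas as pd

# Hypothetical frame where every column was read in as text.
df = pd.DataFrame({"a": ["1", "2", "x"], "b": ["4.5", "oops", "6"]})

# apply() forwards errors="coerce" to pd.to_numeric for each column.
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)
```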

Column operations

18. Renaming columns
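A short sketch with rename, using hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Map old names to new names; unlisted columns keep their names.
renamed = df.rename(columns={"a": "alpha"})
print(renamed.columns.tolist())
```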

19. Add suffix and prefix
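Roughly, add_prefix and add_suffix relabel every column at once. The labels here are invented:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

prefixed = df.add_prefix("raw_")    # raw_a, raw_b
suffixed = df.add_suffix("_2021")   # a_2021, b_2021
print(prefixed.columns.tolist(), suffixed.columns.tolist())
```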

20. Create new columns (Mutate in dplyr terms)
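The closest pandas equivalent to mutate is probably assign, which returns a new frame with the extra column. The "price" and "qty" columns are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"price": [2.0, 3.5], "qty": [3, 2]})

# The lambda receives the frame, so assign() can be chained safely.
with_total = df.assign(total=lambda d: d["price"] * d["qty"])
print(with_total)
```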

21. Insert columns at specific positions
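A minimal sketch with DataFrame.insert, which modifies the frame in place. The columns are made up:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# Insert "b" at position 1 (0-indexed), between "a" and "c".
df.insert(1, "b", [3, 4])
print(df.columns.tolist())
```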

22. if-then-else
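A common way to express if-then-else on a column is np.where; whether the original snippet used it is an assumption. The "age" column and the threshold are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [15, 30, 70]})

# np.where(condition, value_if_true, value_if_false), applied element-wise.
df["group"] = np.where(df["age"] >= 18, "adult", "minor")
print(df)
```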

23. Dropping columns
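For instance, with a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# drop() returns a new frame; the original is left untouched.
trimmed = df.drop(columns=["b", "c"])
print(trimmed.columns.tolist())
```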

String operations

24. Column names
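Column labels support the same .str methods as a Series, so messy headers can be cleaned in one chain. A sketch with invented headers:

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Ann"], " Age ": [31]})

# Trim whitespace, lowercase, and replace spaces with underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
print(df.columns.tolist())
```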

24. Contains
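A small sketch of str.contains used as a row filter. The names are made up; na=False treats missing values as non-matches:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Anna", "Brian", "Carl", None]})

# Case-insensitive substring match; missing names count as no match.
mask = df["name"].str.contains("an", case=False, na=False)
matches = df[mask]
print(matches["name"].tolist())
```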

25. findall
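str.findall returns, for each row, a list of all regex matches. Here is a sketch that pulls digit runs out of invented text:

```python
import pandas as pd

s = pd.Series(["order 12 and 3", "none"])

# Each element becomes a list of every substring matching the pattern.
numbers = s.str.findall(r"\d+")
print(numbers.tolist())
```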

Missing values

26. Checking
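A quick way to check is to count missing values per column with isna().sum(). The frame is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, "z"]})

missing_counts = df.isna().sum()   # missing values per column
any_missing = df.isna().any().any()  # True if anything is missing at all
print(missing_counts)
```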

27. Dealing with missing values
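Two common options, sketched on an invented frame, are dropping incomplete rows or filling the gaps per column. The fill values are arbitrary placeholders:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill each column with its own replacement value.
filled = df.fillna({"a": 0, "b": "unknown"})
print(filled)
```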

More in the docs

Date operations

28. Get X hours/days/weeks from today / ago
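A sketch using pd.Timedelta arithmetic on the current timestamp. The variable names are just for illustration:

```python
import pandas as pd

now = pd.Timestamp.now()

in_three_hours = now + pd.Timedelta(hours=3)
two_days_ago = now - pd.Timedelta(days=2)
a_week_ago = now - pd.Timedelta(weeks=1)
print(in_three_hours, two_days_ago, a_week_ago)
```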

29. Filter between two dates
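One option is Series.between, which includes both endpoints by default. The dates and values below are invented:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "value": range(10),
})

start, end = pd.Timestamp("2021-01-03"), pd.Timestamp("2021-01-05")

# Keep rows whose date falls within [start, end].
in_range = df[df["date"].between(start, end)]
print(in_range)
```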

30. Filter by day/month/year
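The .dt accessor exposes the parts of a datetime column, which makes this kind of filter short. A sketch on made-up dates:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2020-12-31", "2021-01-15", "2021-02-01"])})

in_2021 = df[df["date"].dt.year == 2021]
jan_2021 = df[(df["date"].dt.year == 2021) & (df["date"].dt.month == 1)]
print(len(in_2021), len(jan_2021))
```

The same pattern works with .dt.day for the day of the month.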

Styling data frames

31. Number format

32. Let there be colors

More styling options in the docs

Misc

33. Get the id of max and min in a column
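idxmax and idxmin return the index label of the largest and smallest value. A sketch with invented scores:

```python
import pandas as pd

df = pd.DataFrame({"score": [70, 95, 60]}, index=["a", "b", "c"])

best = df["score"].idxmax()   # label of the highest score
worst = df["score"].idxmin()  # label of the lowest score
print(best, worst)
```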

34. Apply function to data frame
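apply runs a function per column by default, or per row with axis=1. A small sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Per column: range of each column.
col_range = df.apply(lambda col: col.max() - col.min())

# Per row: combine values across columns.
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)
print(col_range.tolist(), row_sum.tolist())
```

For simple arithmetic like the row sum, the vectorized df["a"] + df["b"] is usually faster than apply.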

35. Randomly shuffle data
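Sampling the whole frame with frac=1 is a common shuffle idiom. The random_state here is an arbitrary choice to make the result repeatable:

```python
import pandas as pd

df = pd.DataFrame({"x": range(5)})

# Sample 100% of rows in random order, then renumber the index.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
print(shuffled["x"].tolist())
```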

36. Percent change

Useful for time series data

Example: the price of BTC over 3 days, [30000, 33000, 31000], gives [NaN, 0.1, -0.0606] (rounded).
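The same numbers can be reproduced with pct_change:

```python
import pandas as pd

prices = pd.Series([30000, 33000, 31000])

# Fractional change relative to the previous row; the first row has no predecessor.
change = prices.pct_change()
print(change.round(4).tolist())
```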

37. Assign rank
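A sketch with rank on invented scores. method="dense" is one of several tie-breaking options and is chosen here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 80, 90, 70]})

# Highest score gets rank 1; ties share a rank with no gaps afterwards.
df["rank"] = df["score"].rank(ascending=False, method="dense")
print(df)
```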

38. Check memory usage of data frame
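memory_usage reports bytes per column. Passing deep=True also counts the contents of string columns. A sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

per_column = df.memory_usage(deep=True)  # includes an "Index" entry
total_bytes = per_column.sum()
print(per_column)
```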

39. Explode list values to multiple rows
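A minimal sketch with explode, using a hypothetical "tags" column that holds lists:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

# Each list element gets its own row; the other columns are repeated.
exploded = df.explode("tags")
print(exploded)
```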

40. Convert smaller categories to “Others”
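One way to sketch this is to compute relative frequencies and mask anything below a cutoff. The 20% threshold and the data are assumptions for this example:

```python
import pandas as pd

s = pd.Series(["A"] * 6 + ["B"] * 3 + ["C"])

# Share of each category, then the labels that fall below the cutoff.
freq = s.value_counts(normalize=True)
rare = freq[freq < 0.2].index

# Replace rare categories with a single "Others" label.
lumped = s.mask(s.isin(rare), "Others")
print(lumped.value_counts())
```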

Hope you found these code snippets useful in your own data work!

If you want more, check out these resources below

Thanks for reading!

We’re democratizing AI with our online competition platform — bitgrit.net. On our publication, we publish only high-quality data science-related topics. Become a writer by emailing us at: info@bitgrit.net

Benedict Neo