TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Top 10 Categories of Pandas Functions That I Use Most

Yong Cui
5 min read · Jul 19, 2022

Photo by Firmbee.com on Unsplash

People love Python because it has a rich ecosystem of third-party libraries for all kinds of work. For data science, one of the most popular libraries for data processing is pandas. Because it is open source, many developers have contributed to the project over the years, making pandas powerful enough for almost any data processing job.

I haven’t counted, but pandas seems to offer hundreds of functions. Although I use maybe twenty or thirty of them frequently, it’s unrealistic to cover them all. Instead, I’ll focus on the 10 most useful categories of functions in this post. Once you are comfortable with them, they can probably address over 70% of your data processing needs.

1. Reading data

We usually read data from external sources. Depending on the format of the source data, we can use the corresponding read_* functions.

  • read_csv: use it when your source data is in the CSV format. Some notable arguments include header (whether and which row is the header), sep (the delimiter), and usecols (a subset of columns to use).
  • read_excel: use it when your source data is in Excel…
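As a minimal sketch of the read_csv arguments mentioned above, the snippet below reads a small semicolon-delimited table. The CSV text and its column names (id, name, score) are made up for illustration, and an in-memory string stands in for a real file path so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content, invented for this illustration only.
csv_text = "id;name;score\n1;Ann;90\n2;Bob;85\n"

df = pd.read_csv(
    io.StringIO(csv_text),    # a file path or URL would normally go here
    sep=";",                  # the delimiter used in the source data
    header=0,                 # the first row holds the column names
    usecols=["id", "score"],  # load only a subset of the columns
)

print(df)
```

With a real file, you would pass its path as the first argument instead of the StringIO object. The name column is skipped at read time, which can save memory on wide files.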

Written by Yong Cui

Work at the nexus of biomedicine, data science & mobile dev. Author of Python How-to by Manning (https://www.manning.com/books/python-how-to).
