From Head to Tail: Mastering the 8 Most-Used EDA Functions in Pandas

Published in Women in Technology · 4 min read · Nov 2, 2023

Pandas, one of Python's best-known libraries, has become an indispensable weapon for anyone working with data; it is a must-have on every data person's belt. Python itself has topped programming-language popularity rankings in recent years and is used for a wide range of purposes, such as Artificial Intelligence, Data Analytics, Data Visualization, Application Programming, Web Development, Game Development, Language Development, and SEO.

With that much reach, it's almost impossible for it not to be popular and in demand.

Since most of my expertise lies in the realm of data, today we will explore the most important functions/methods we MUST execute when we begin our Python data project.

Let’s not waste any time. We should start gathering our weapons ⚔️

We all know that when dealing with datasets, the chances of receiving a fully cleaned and ready-to-use set of data are minimal. That’s why we need to master the most useful weapons when it comes to data cleaning in Python (using the Pandas library).

But before that, we need a dataset to work on. The first step is importing the dataset into our environment (I like using Jupyter Notebook). To do this, we simply call the pandas read_csv() function, assuming your dataset is in CSV format.
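Here is a minimal sketch of that import step. The CSV content and column names below are made up for illustration (they are not the actual Netflix file), and an in-memory string stands in for a file on disk so the snippet runs anywhere.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """show_id,type,title,release_year
s1,Movie,Example Movie A,2020
s2,TV Show,Example Show B,2021
s3,Movie,Example Movie C,2019
"""

# read_csv accepts a file path or any file-like object.
# With a real file you would pass its path instead, e.g. pd.read_csv("your_file.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```

In a Jupyter Notebook you can also just type df in the last line of a cell to get a nicely rendered table.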

With the dataset loaded, we can focus on mastering the necessary skills.

1. head() — this method shows you the first 5 rows of your dataset by default.

You can change the number of rows by passing the desired value within the parentheses.
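A quick sketch of both calls, using a small made-up DataFrame (the id and score columns are just placeholders):

```python
import pandas as pd

# Made-up data: 10 rows so the difference is visible.
df = pd.DataFrame({"id": range(1, 11), "score": range(10, 110, 10)})

first_five = df.head()    # no argument: first 5 rows
first_three = df.head(3)  # first 3 rows

print(first_five)
print(first_three)
```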

2. shape — this attribute helps you determine the size of our enemy. For example, the Netflix dataset I used for an exploratory analysis (discussed in this article) had 8807 rows and 12 columns.
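As a small illustration on made-up data (not the Netflix dataset), shape returns a (rows, columns) tuple that you can unpack directly:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"title": ["A", "B", "C"], "year": [2019, 2020, 2021]})

# shape is an attribute, so there are no parentheses after it.
n_rows, n_cols = df.shape

print(f"{n_rows} rows, {n_cols} columns")  # 3 rows, 2 columns
```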

3. info() — we also want to know the ‘enemy’s’ allies; this can be done by running the info() method. It provides details about all the columns, including their names, the specific data types, and the count of non-null values in each column.
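A minimal sketch on a made-up DataFrame with one missing value, so the non-null counts differ between columns:

```python
import pandas as pd

# Made-up data; the None in "title" shows up as a lower non-null count.
df = pd.DataFrame({"title": ["A", "B", None], "year": [2019, 2020, 2021]})

# info() prints its summary directly and returns None,
# so there is no need to wrap it in print().
df.info()
```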

If you only want to see the column names, you can access the columns attribute (no parentheses, since it is not a method).

Tip: To see the names in a more readable format, use tolist().
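For instance, on a made-up DataFrame (the column names are placeholders):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"title": ["A"], "director": ["X"], "release_year": [2020]})

print(df.columns)           # a pandas Index object
print(df.columns.tolist())  # a plain Python list of names

column_names = df.columns.tolist()
```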

But if you only want to see the columns’ data types, you can access the dtypes attribute.
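A short sketch on made-up data with text, integer, and float columns. The exact dtype labels printed for text columns can vary between pandas versions, so no output is shown here.

```python
import pandas as pd

# Made-up data mixing text, integers, and floats.
df = pd.DataFrame(
    {
        "title": ["A", "B"],
        "release_year": [2019, 2020],
        "rating": [7.5, 8.0],
    }
)

# dtypes is an attribute: one data type per column.
print(df.dtypes)
```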

4. describe() — used for generating descriptive statistics, including mean, standard deviation, and more.
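Here is a small sketch with a made-up duration_min column. Note that when the DataFrame has numeric columns, describe() summarizes only those by default, so the text column is left out.

```python
import pandas as pd

# Made-up data: one numeric column and one text column.
df = pd.DataFrame(
    {
        "duration_min": [90, 100, 110, 120],
        "title": ["A", "B", "C", "D"],
    }
)

# Returns count, mean, std, min, quartiles, and max for numeric columns.
stats = df.describe()

print(stats)
print(stats.loc["mean", "duration_min"])  # 105.0
```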

5. duplicated() — with this, we can identify how many of the ‘enemy’s’ allies share the same nature more than once, or simply put, whether the same row appears multiple times.

In my example, we don’t have any duplicates 🥳
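To see what a hit looks like, here is a sketch on made-up data that contains one repeated row on purpose:

```python
import pandas as pd

# Made-up data: the third row repeats the first one.
df = pd.DataFrame({"title": ["A", "B", "A"], "year": [2020, 2021, 2020]})

# duplicated() marks a row as True when it repeats an earlier row.
dup_mask = df.duplicated()

print(dup_mask)
print(dup_mask.sum())  # 1 duplicate row in this made-up data
```

Chaining sum() works because True counts as 1 and False as 0.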

6. isna() — curious about how powerful our enemy is? Let’s find out if there are missing allies or empty battle positions in its team.

In my example, I chained sum() after isna() to count the missing values in each column.
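A minimal sketch of that isna() plus sum() combination, on made-up data with a few gaps:

```python
import pandas as pd

# Made-up data with missing values in two of the three columns.
df = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "director": ["X", None, None],
        "year": [2020, None, 2021],
    }
)

# isna() gives True/False per cell; sum() then counts the True values per column.
missing_per_column = df.isna().sum()

print(missing_per_column)
```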

7. nunique() — some of the allies must have unique skills, but how can we find out? Easy-peasy: the nunique() method helps us by counting the distinct values in each column.
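For example, on a made-up DataFrame where some values repeat:

```python
import pandas as pd

# Made-up data: "type" has 2 distinct values, "year" has 3.
df = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie", "Movie"],
        "year": [2019, 2020, 2020, 2021],
    }
)

# One count of distinct values per column (missing values are ignored by default).
unique_counts = df.nunique()

print(unique_counts)
```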

8. tail() — works almost the same way as head(). The primary distinction is that while head() shows you the first n rows, tail() returns the last n rows of the dataset.
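The mirror image of the head() call, sketched on a made-up 10-row DataFrame:

```python
import pandas as pd

# Made-up data: ids 1 through 10.
df = pd.DataFrame({"id": range(1, 11)})

last_five = df.tail()   # no argument: last 5 rows
last_two = df.tail(2)   # last 2 rows

print(last_five)
print(last_two)
```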

Conclusion

By employing these 8 essential functions (aka weapons), we’re able to decipher and comprehensively understand our dataset. Armed with this knowledge, we’re better equipped to navigate the path forward, ensuring a clean and analysis-ready dataset.

You can find the complete project built on the Netflix dataset in my GitHub repository.

Thank you so much for your support; it means a lot to me.

If you found this article interesting and helpful, you have the option to support my work here ☕😊

P.S.: Visit my Medium page and embark on an exciting journey of discovery today. Happy reading!

Written by Luchiana Dumitrescu