Pandas: The Best in the Business

Liston Tellis
Published in Analytics Vidhya
4 min read · Dec 6, 2019
Nope! This isn’t about him!!

Introduction

Kung Fu Panda and the Pandas library of Python have a few things in common. They are both fun to work with, they handle messy situations gracefully, and sooner or later they will make you fall in love with them.

Pandas is one of the most powerful Python libraries for data science. It is built for cleaning up messy data, is used extensively in data wrangling, and is one of the most popular data manipulation tools available right now.

Origin

Wes McKinney

Developer Wes McKinney started working on Pandas in 2008, when he needed a high-performance, flexible tool for quantitative analysis of financial data. It is written in Python, Cython and C. The name has nothing to do with the animal: it is derived from “panel data”, an econometrics term for data sets that track the same subjects over multiple time periods.

Why is it so famous?

  • It is very useful for cleaning, transforming and analysing data.
  • It can read and write data in many formats, such as CSV, Excel, JSON and SQL databases.
  • It makes it easy to compute descriptive statistics such as correlations and the distribution of each column.
  • It has built-in tools for dealing with missing values, outliers and other common data errors.
  • It prepares data for visualisation, which can then be done with Matplotlib, Seaborn, Plotly and other Python libraries.
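As a small illustration of the points above, here is a minimal sketch of cleaning and summarising a table. The sales data, the column names and the choice of filling gaps with the column median are all made up for this example; they are one reasonable option, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing price and one missing quantity.
df = pd.DataFrame(
    {
        "product": ["tea", "coffee", "juice", "water"],
        "price": [2.0, 4.0, np.nan, 1.0],
        "quantity": [10.0, 20.0, 15.0, np.nan],
    }
)

# Count the missing values in each column.
missing_counts = df.isna().sum()

# Fill the gaps in the numeric columns with each column's median.
numeric_cols = ["price", "quantity"]
cleaned = df.copy()
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())

# Descriptive statistics and the correlation between the two numeric columns.
summary = cleaned[numeric_cols].describe()
correlation = cleaned["price"].corr(cleaned["quantity"])

print(missing_counts)
print(cleaned)
print(summary)
print(f"price/quantity correlation: {correlation:.2f}")
```

Whether to fill, drop or flag missing values depends on the data set; `dropna()` is the usual alternative when the incomplete rows can simply be discarded.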

How to install?

I prefer the Anaconda distribution when it comes to data science. Anaconda bundles Python, Jupyter Notebook and other commonly used packages for scientific computing and data science, and manages them with its own package manager, conda. Jupyter Notebook is one of the most powerful open-source web applications for developing and presenting data science projects. Pandas, NumPy, SciPy and the machine learning libraries can all be imported in a Jupyter Notebook. Even though you can code in Wing, Emacs, nano, Vim, PyCharm, IPython and many more, Jupyter Notebook is my favourite when it comes to data science.
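Anaconda normally ships with Pandas already included. If it is missing from your environment, `conda install pandas` (or `pip install pandas` outside Anaconda) should add it; these commands are not from the original article. A quick way to check that everything is in place is to import the libraries in a notebook cell and print their versions:

```python
import numpy as np
import pandas as pd

# If both imports succeed, the libraries are installed in this environment.
print("pandas version:", pd.__version__)
print("NumPy version:", np.__version__)
```

By convention Pandas is imported as `pd` and NumPy as `np`, which is why you will see those short names in almost every tutorial.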

Anaconda Distribution

Basics

Pandas is built on top of the NumPy library. Data held in Pandas can be plotted with Matplotlib, Plotly, Seaborn and other libraries, passed to Scikit-learn to train machine learning models, and analysed statistically with SciPy.

Pandas is organised around two objects: the Series and the DataFrame. A Series is essentially a single labelled column, and a DataFrame is a two-dimensional table made up of Series, with data stored in rows and columns. A DataFrame behaves a little like a Python dictionary whose keys are the column names. It can be created from an existing data source such as a SQL database or a CSV, Excel or JSON file, or directly from Python lists, a dictionary, a list of dictionaries and so on.
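To make the Series and DataFrame ideas concrete, here is a short sketch that builds both from plain Python objects. The names and ages are invented, and the CSV is read from an in-memory string only so that the example does not depend on a file on disk.

```python
import io

import pandas as pd

# A Series is a one-dimensional labelled array, i.e. a single column.
ages = pd.Series([25, 32, 47], name="age")

# DataFrame from a dictionary: each key becomes a column name.
people = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "age": [25, 32, 47]})

# DataFrame from a list of dictionaries: each dictionary becomes a row.
records = [{"name": "Asha", "age": 25}, {"name": "Ben", "age": 32}]
from_records = pd.DataFrame(records)

# DataFrame from CSV text; pd.read_csv also accepts a file path.
csv_text = "name,age\nAsha,25\nBen,32\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# Selecting one column of a DataFrame gives back a Series.
age_column = people["age"]

print(people)
print(type(age_column).__name__)
print(people.shape)
```

The dictionary-like behaviour shows up in `people["age"]`: a column is looked up by its name, much like a value is looked up by its key.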

Other Python Data Science Libraries

NumPy

NumPy, short for Numerical Python, is a high-performance array-processing package. Its core data structure is the multi-dimensional array, which is also used to represent matrices. It is the primary package for scientific computing in Python.
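A minimal sketch of what working with a NumPy array looks like; the numbers are arbitrary and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# A 2 x 3 array (two rows, three columns).
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic applies to every element at once, with no explicit loop.
doubled = matrix * 2

# Aggregate along an axis: axis=0 gives the mean of each column.
column_means = matrix.mean(axis=0)

# Matrix multiplication of the array with its own transpose.
product = matrix @ matrix.T

print(matrix.shape)
print(doubled)
print(column_means)
print(product)
```

Operating on whole arrays like this, rather than looping in Python, is where most of NumPy's speed comes from.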

Matplotlib, Seaborn & Plotly

Matplotlib covers the basics of visualisation, including bar, pie, line and scatter plots. Seaborn is built on top of Matplotlib and adds higher-level statistical plots with nicer default styling. Plotly provides various types of interactive plots.
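Here is a rough sketch of plotting a Pandas column with Matplotlib. The monthly figures and the output file name are made up for the example; in a Jupyter Notebook the chart would normally appear inline without saving it.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales figures.
sales = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "units": [120, 135, 160]})

# Draw one bar per row of the DataFrame.
fig, ax = plt.subplots()
ax.bar(sales["month"], sales["units"])
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Units sold per month")

# Save the chart to an image file (the file name is arbitrary).
fig.savefig("units_per_month.png")
```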

SciPy

SciPy is a Python library for scientific and mathematical computing. It is built on top of NumPy and adds higher-level scientific routines that NumPy itself does not provide. It contains modules for linear algebra, image processing, interpolation and more.
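Two of the modules mentioned above, linear algebra and interpolation, can be sketched in a few lines. The equations and data points are invented for illustration, and the linear spline is just one of several interpolation routines SciPy offers.

```python
import numpy as np
from scipy import interpolate, linalg

# Linear algebra: solve the system 2x + y = 3 and x + 3y = 5.
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
constants = np.array([3.0, 5.0])
solution = linalg.solve(coefficients, constants)

# Interpolation: estimate a value between known points with a
# degree-1 (piecewise linear) spline.
x_known = np.array([0.0, 1.0, 2.0])
y_known = np.array([0.0, 10.0, 20.0])
spline = interpolate.make_interp_spline(x_known, y_known, k=1)
estimate = float(spline(1.5))

print(solution)
print(estimate)
```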

Scikit-learn

Scikit-learn is the most widely used general-purpose machine learning library in Python. It provides a wide range of supervised and unsupervised learning algorithms, and it is usually the first library to reach for when modelling data.
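To show how a Pandas DataFrame feeds into Scikit-learn, here is a minimal sketch that fits a linear regression. The data is synthetic and follows y = 2x + 1 exactly, so it only demonstrates the workflow, not a realistic modelling problem.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 2x + 1 exactly.
data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [3.0, 5.0, 7.0, 9.0]})

# Features are passed as a DataFrame (2-D), the target as a Series (1-D).
model = LinearRegression()
model.fit(data[["x"]], data["y"])

# Predict the target for a new, unseen value of x.
prediction = model.predict(pd.DataFrame({"x": [5.0]}))[0]

print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"prediction for x=5: {prediction:.2f}")
```

Most Scikit-learn estimators follow the same `fit` then `predict` pattern, so swapping in a different algorithm usually means changing only the line that creates the model.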

Conclusion

Python is one of the most popular languages for data science, and Pandas is one of its most widely used packages. By an often-quoted estimate, about 80% of the time in a machine learning project is spent on data wrangling. That makes Pandas extremely important in machine learning, and when it is used along with libraries such as NumPy, Matplotlib, Seaborn and Scikit-learn, it just becomes the best in the business.

Please do share and clap if you like this article! Happy reading!
