ON PANDAS

Become a Pro at Pandas, Python’s data manipulation Library

Published in Hacking Analytics · 7 min read · May 9, 2019

The pandas library is the most popular data manipulation library for Python. It provides an easy way to manipulate data through its data-frame API, inspired by R’s data frames.

Understanding the pandas library

One of the keys to a good understanding of pandas is to recognize that it is mostly a wrapper around a series of other Python libraries, the main ones being NumPy, SQLAlchemy, Matplotlib, and openpyxl.

The core internal model of the data frame is a series of NumPy arrays, and pandas functions such as the now deprecated “as_matrix” (since superseded by “to_numpy” and the “values” attribute) return results in NumPy’s internal representation.
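To make the NumPy relationship concrete, here is a minimal sketch using to_numpy, the current replacement for as_matrix. The frame, its column names, and its values are made up for illustration, and it assumes pandas' default NumPy-backed dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not taken from the article.
df = pd.DataFrame({"price": [1.5, 2.0, 3.25], "quantity": [10, 20, 30]})

# With the default dtypes, each column converts to a plain NumPy array.
price_values = df["price"].to_numpy()
quantity_values = df["quantity"].to_numpy()

# Converting the whole frame yields a single 2-D array; mixed int/float
# columns are upcast to a common dtype (float64 here).
matrix = df.to_numpy()

# Because the values are NumPy arrays, NumPy functions apply directly.
revenue = np.dot(price_values, quantity_values)

print(type(price_values), matrix.shape, matrix.dtype, revenue)
```

Note that the whole-frame conversion has to pick one dtype for every column, which is why the integer quantities come back as floats.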

Pandas leverages other libraries to get data in and out of data frames. SQLAlchemy, for instance, is used through the read_sql and to_sql functions, while openpyxl and XlsxWriter are used for the read_excel and to_excel functions.
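As a rough illustration of the to_sql and read_sql round trip, the sketch below writes a frame to a database and reads part of it back. To stay self-contained it swaps in an in-memory SQLite connection from the standard library, which pandas also accepts, rather than a SQLAlchemy engine. The table name, column names, and values are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical example data, not taken from the article.
orders = pd.DataFrame(
    {"order_id": [1, 2, 3], "amount": [9.99, 24.50, 5.00]}
)

# In-memory SQLite database; a SQLAlchemy engine could be passed instead.
conn = sqlite3.connect(":memory:")

# Write the frame to a table, leaving out the pandas index column.
orders.to_sql("orders", conn, index=False)

# Read a filtered subset back into a new data frame.
large_orders = pd.read_sql(
    "SELECT order_id, amount FROM orders WHERE amount > 6 ORDER BY order_id",
    conn,
)
conn.close()

print(large_orders)
```

With other databases, the usual approach is to build a SQLAlchemy engine from a connection URL and pass it in place of conn.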

Matplotlib, in turn, is used to provide an easy interface for plotting the information available within a data frame, using a command such as df.plot(). Seaborn builds on Matplotlib and accepts data frames directly.
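A small sketch of the df.plot() interface follows. It assumes Matplotlib is installed and selects the non-interactive Agg backend so that it runs without a display. The sales figures and column names are invented for the example.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window is opened

import pandas as pd

# Hypothetical example data, not taken from the article.
sales = pd.DataFrame(
    {"online": [10, 12, 9, 15], "in_store": [7, 8, 11, 6]},
    index=[2016, 2017, 2018, 2019],
)

# plot() draws one line per numeric column and returns a Matplotlib Axes,
# so the usual Matplotlib methods remain available for customization.
ax = sales.plot(title="Sales by channel")
ax.set_xlabel("year")

# Render the figure to an in-memory PNG instead of writing a file.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
```

Because the return value is an ordinary Axes object, anything pandas does not expose can still be adjusted through Matplotlib itself.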

Numpy’s Panda — Efficient pandas

One of the complaints that you often hear is that Python is slow or that it is…

Written by Julien Kervizic