ON PANDAS

Become a Pro at Pandas, Python’s data manipulation Library

Julien Kervizic
Hacking Analytics
7 min readMay 9, 2019

--

The pandas library is the most popular data manipulation library for Python. It provides an easy way to manipulate data through its data-frame API, inspired by R’s data frames.

Photo by Damian Patkowski on Unsplash

Understanding The pandas library

One of the keys to getting a good understanding of pandas, is to understand that pandas are mostly a wrapper around a series of other python libraries. The main ones being Numpy, SQL alchemy, Matplot lib, and openpyxl.

The core internal model of the data frame is a series of NumPy arrays, and pandas functions, such as the now deprecated “as_matrix” function, which return results in NumPy’s internal representation.

Pandas leverages other libraries to get data in and out of data-frames. SQL Alchemy, for instance, is used through the read_sql, and to_sql functions while openpyxl and xlsx writer are used for read_excel and to_excel functions.

Matplotlib and Seaborn, in turn, are used to provide an easy interface, to plot information available within a data frame, using a command such as df.plot()

Numpy’s Panda — Efficient pandas

One of the complaints that you often hear is that Python is slow or that it is…

--

--

Julien Kervizic
Hacking Analytics

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com