Become a Pro at Pandas, Python’s data manipulation Library

Julien Kervizic
Hacking Analytics
May 9, 2019


The pandas library is the most popular data manipulation library for Python. It provides an easy way to manipulate data through its data-frame API, inspired by R’s data frames.

Photo by Damian Patkowski on Unsplash

Understanding the pandas library

One of the keys to understanding pandas is recognizing that it is mostly a wrapper around a series of other Python libraries, the main ones being NumPy, SQLAlchemy, Matplotlib, and openpyxl.

The core internal model of the data frame is a series of NumPy arrays, and pandas functions such as the now-deprecated “as_matrix” (superseded by .values and to_numpy()) return their results in NumPy’s internal representation.
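As a rough illustration of that relationship, the sketch below pulls NumPy arrays back out of a small data frame. The column names and values are made up for the example, and to_numpy() stands in for the deprecated as_matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not taken from the article.
df = pd.DataFrame({"price": [1.5, 2.0, 3.25], "quantity": [10, 20, 30]})

# A single column comes back as a one-dimensional NumPy array.
prices = df["price"].to_numpy()

# The whole frame converts to one two-dimensional array; because the columns
# mix integers and floats, NumPy upcasts everything to a common float dtype.
matrix = df.to_numpy()

print(type(prices), matrix.shape, matrix.dtype)
```

Because the result is a plain NumPy array, any NumPy function can be applied to it directly.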

pandas relies on other libraries to get data in and out of data frames. SQLAlchemy, for instance, is used through the read_sql and to_sql functions, while openpyxl and XlsxWriter are used by the read_excel and to_excel functions.
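A minimal sketch of the SQL round trip might look like the following. To stay dependency-free it assumes an in-memory SQLite database opened with Python's built-in sqlite3 module, which pandas also accepts; for other databases you would typically pass a SQLAlchemy engine instead. The table and column names are invented for the example.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, used here only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical example data.
orders = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 24.50, 5.00]})

# Write the data frame to a table named "orders", without the index column.
orders.to_sql("orders", conn, index=False)

# Read a filtered subset back into a new data frame.
large_orders = pd.read_sql(
    "SELECT order_id, amount FROM orders WHERE amount > 6 ORDER BY order_id",
    conn,
)

conn.close()
```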

Matplotlib, in turn, is used to provide an easy interface for plotting the information in a data frame, using a command such as df.plot(). Seaborn, which also builds on Matplotlib, accepts data frames directly.
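The plotting interface can be sketched as follows, assuming Matplotlib is installed. The data is made up, and the non-interactive "Agg" backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch

import pandas as pd

# Hypothetical example data.
sales = pd.DataFrame({"month": [1, 2, 3, 4], "revenue": [100, 120, 90, 150]})

# df.plot() returns a Matplotlib Axes object, which can be customized further
# with the usual Matplotlib calls.
ax = sales.plot(x="month", y="revenue", kind="line", title="Revenue by month")
ax.set_ylabel("revenue")
```

The returned object is a regular Matplotlib Axes, so anything pandas does not expose can still be adjusted through Matplotlib itself.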

NumPy’s Panda: Efficient pandas

One of the complaints that you often hear is that Python is slow or that it is…



