ON PANDAS
Become a Pro at Pandas, Python’s Data Manipulation Library
The pandas library is the most popular data manipulation library for Python. It provides an easy way to manipulate data through its data-frame API, inspired by R’s data frames.
Understanding the pandas library
One of the keys to understanding pandas is to recognize that it is mostly a wrapper around a series of other Python libraries, the main ones being NumPy, SQLAlchemy, Matplotlib, and openpyxl.
The core internal model of the data frame is a series of NumPy arrays, and pandas functions such as the now-deprecated “as_matrix” return their results in NumPy’s internal representation.
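To make the NumPy connection concrete, here is a minimal sketch, assuming a recent pandas version where to_numpy() is the replacement for as_matrix. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with one integer and one float column
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# The whole frame as a single 2-D NumPy array; mixed int/float
# columns are upcast to a common dtype
arr = df.to_numpy()

# A single column as a 1-D NumPy array
col = df["a"].to_numpy()

print(type(arr), arr.dtype, arr.shape)
print(type(col), col.dtype, col.shape)
```

Because both results are ordinary NumPy arrays, any NumPy function can be applied to them directly.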
Pandas leverages other libraries to get data in and out of data frames. SQLAlchemy, for instance, is used by the read_sql and to_sql functions, while openpyxl and XlsxWriter are used by the read_excel and to_excel functions.
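As a rough illustration of the SQL round trip, the sketch below writes a data frame to a database table and reads part of it back. To stay self-contained it uses an in-memory SQLite connection from the standard library, which pandas accepts directly; with other databases you would typically pass a SQLAlchemy engine instead. The table name "people" and its contents are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, used here only to keep the example self-contained
conn = sqlite3.connect(":memory:")

# Hypothetical data to store
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Write the data frame to a table, leaving out the pandas index
df.to_sql("people", conn, index=False)

# Read a filtered subset back into a new data frame
result = pd.read_sql("SELECT id, name FROM people WHERE id > 1", conn)

conn.close()
print(result)
```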
Matplotlib, in turn, is used to provide an easy interface for plotting the information held in a data frame, using a command such as df.plot(). Seaborn, which builds on Matplotlib, accepts data frames directly as input.
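The plotting interface can be sketched as follows, assuming Matplotlib is installed and is the active plotting backend (the pandas default). The "year" and "sales" columns are invented for the example, and the non-interactive Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import pandas as pd
from matplotlib.axes import Axes

# Hypothetical data to plot
df = pd.DataFrame({"year": [2018, 2019, 2020], "sales": [10, 15, 12]})

# df.plot() hands the work to Matplotlib and returns a Matplotlib Axes
ax = df.plot(x="year", y="sales", kind="line")

# Since the result is a regular Axes, the usual Matplotlib calls apply
ax.set_ylabel("sales")

print(isinstance(ax, Axes))
```

Getting a Matplotlib object back means a plot started from a data frame can be customized with the usual Matplotlib calls.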
NumPy’s Panda: Efficient pandas
One of the complaints that you often hear is that Python is slow or that it is…