Python Packages for Data Science

Jose Dominguez
Digital Studio Stream
2 min read · Dec 15, 2020
Photo by Hitesh Choudhary on Unsplash

Scientific Computing Libraries

Pandas

Pandas is a fast, powerful, flexible, and easy-to-use open-source library that provides high-performance data structures and data analysis tools. It is built on top of the NumPy package and is centered around the DataFrame object, a two-dimensional labeled table.
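To make the DataFrame idea concrete, here is a minimal sketch. The city names and temperatures are made-up sample values, not data from any real source.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Newark", "Newark", "Trenton", "Trenton"],
        "temp_c": [21.0, 23.0, 19.0, 20.0],
    }
)

# Group the rows by city and compute the mean temperature per group.
mean_temp = df.groupby("city")["temp_c"].mean()
print(mean_temp)
```

The result is a Series indexed by city, which is typical of how Pandas operations return new labeled objects rather than modifying the original table.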

NumPy

NumPy is an open-source library for numerical computing in Python. It provides n-dimensional array objects and fast, vectorized mathematical operations on them.
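A short sketch of what an n-dimensional array and an element-wise operation look like in practice. The variable names are chosen only for this example.

```python
import numpy as np

# Build a 2 x 3 array holding the integers 0 through 5.
a = np.arange(6).reshape(2, 3)

# Arithmetic applies element-wise, with no explicit Python loop.
doubled = a * 2

# Reduce along axis 0 to get one sum per column.
col_sums = a.sum(axis=0)

print(a.shape)
print(doubled)
print(col_sums)
```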

SciPy

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. It is built on NumPy and provides modules for numerical tasks such as optimization, integration, interpolation, linear algebra, and statistics.
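As a rough illustration of two of those modules, the sketch below integrates a function numerically and finds the minimum of a simple quadratic. The functions being integrated and minimized are arbitrary choices for the example.

```python
import math

from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_err = integrate.quad(math.sin, 0.0, math.pi)

# Find the x that minimizes (x - 3)^2; the exact minimizer is x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

print(f"area = {area:.6f}")
print(f"minimizer = {result.x:.6f}")
```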

Visualization Libraries

Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It makes it straightforward to build charts such as histograms, line plots, bar charts, and scatter plots.
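Here is a minimal sketch that draws a histogram and a scatter plot side by side. The values are made up for the example, and the "Agg" backend is selected only so the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs
import matplotlib.pyplot as plt

# Hypothetical sample values, invented for illustration only.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# One figure with two side-by-side axes.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Left: histogram of the values in 5 bins.
counts, bin_edges, _ = ax_hist.hist(values, bins=5)
ax_hist.set_title("Histogram")

# Right: scatter plot of each value against its position in the list.
ax_scatter.scatter(range(len(values)), values)
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
```

To write the figure to disk, call fig.savefig with a file name of your choice. In an interactive session you can drop the backend line and call plt.show() instead.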

Seaborn

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics, such as heat maps.
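A heat map of a correlation matrix is one common use. The sketch below assumes randomly generated data with hypothetical column names "a", "b", and "c", and uses the "Agg" backend so it can run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs
import numpy as np
import pandas as pd
import seaborn as sns

# Random sample data with a fixed seed, invented for illustration only.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])

# Pairwise correlations between the three columns.
corr = data.corr()

# Draw the correlation matrix as an annotated heat map.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Correlation heat map")
```

Because Seaborn draws onto Matplotlib axes, the returned object can be customized further with ordinary Matplotlib calls.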

Algorithmic Libraries

Scikit-learn

Scikit-learn is an open-source Python library for machine learning. It features various classification, regression, and clustering algorithms, and it provides simple and efficient tools for predictive data analysis.
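A typical classification workflow might look like the sketch below. The choice of the bundled iris dataset, logistic regression, and a 25% test split are assumptions made for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to evaluate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier on the training split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Mean accuracy on the held-out samples.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The same fit and score pattern applies to most scikit-learn estimators, so swapping in a different algorithm usually means changing only the line that constructs the model.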

Jose Dominguez
Digital Studio Stream

Currently a student at Rutgers University — Newark, studying Applied Physics & Computer Science with a concentration in Mathematics. | Website: www.josedom.net