Published in Nerd For Tech

Essential Python libraries for Data Science

Photo by Myriam Jessier on Unsplash

1. Pandas

Pandas is a Python package used mainly for data analysis and manipulation. It provides data structures and operations for working with numerical tables and time series. Its two main data structures are:

  1. Series (1D): The series is a one-dimensional data structure. It can be considered as a 1D labeled array that is capable of holding data of any type. For example, a column of a table.
  2. DataFrame (2D): The DataFrame is a two-dimensional data structure. It can be considered as a 2D labeled array. For example, a table with both rows and columns.

  1. Pandas documentation: Click here
  2. Pandas cheat sheet: Click here
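
As a rough illustration of the two structures described above, the sketch below builds a Series and a DataFrame. The fruit names, prices and column names are made up for the example.

```python
import pandas as pd

# A Series: a 1D labeled array, like a single column of a table.
prices = pd.Series(
    [1.5, 2.0, 0.75],
    index=["apple", "banana", "cherry"],
    name="price",
)

# A DataFrame: a 2D labeled table. Each column is itself a Series.
fruit = pd.DataFrame(
    {"price": [1.5, 2.0, 0.75], "quantity": [10, 4, 20]},
    index=["apple", "banana", "cherry"],
)

# Column arithmetic is applied row by row, aligned on the index labels.
fruit["total"] = fruit["price"] * fruit["quantity"]

print(prices["banana"])
print(fruit)
```

Selecting one column of the DataFrame, for example `fruit["price"]`, gives back a Series, which is why a Series is often described as a column of a table.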

2. NumPy

NumPy stands for Numerical Python. It is one of the fundamental Python packages for scientific computing. It provides a multi-dimensional array object and various derived objects, such as masked arrays and matrices.

  1. Numpy documentation: Click here
  2. Numpy cheat sheet: Click here
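
A minimal sketch of the multi-dimensional array and one of the derived objects mentioned above, the masked array. The values and the masking condition are arbitrary choices for the example.

```python
import numpy as np

# A 2D array (2 rows, 3 columns) holding the values 0..5.
matrix = np.arange(6).reshape(2, 3)

# Operations are vectorized: no explicit Python loop is needed.
col_sums = matrix.sum(axis=0)   # sum down each column
scaled = matrix * 10            # multiply every element by 10

# A masked array hides selected entries from later computations.
# Here every value below 2 is masked out.
masked = np.ma.masked_where(matrix < 2, matrix)
masked_mean = masked.mean()     # mean of the unmasked values 2, 3, 4, 5

print(col_sums)
print(masked_mean)
```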

3. SciPy

The SciPy library is one of the core packages that together form the SciPy stack. The two are not the same thing: the SciPy stack is a collection of tools such as NumPy, Pandas, SciPy, Matplotlib, IPython and SymPy, whereas the SciPy library is a collection of modules for linear algebra, statistics, optimization, integration and interpolation. The library is built on NumPy arrays and therefore makes heavy use of NumPy.

  1. SciPy documentation: Click here
  2. SciPy cheat sheet: Click here
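
To give a feel for two of the modules listed above, here is a small sketch that uses `scipy.integrate` and `scipy.optimize`. The functions being integrated and minimized are simple examples picked for illustration.

```python
from scipy import integrate, optimize

# Integration: area under x^2 between 0 and 3 (the exact answer is 9).
area, abs_error = integrate.quad(lambda x: x ** 2, 0, 3)

# Optimization: find the x that minimizes (x - 2)^2 (the minimum is at x = 2).
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

print(area)
print(result.x)
```

`quad` returns both the estimate and an estimate of its absolute error, and `minimize_scalar` returns a result object whose `x` attribute holds the location of the minimum.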

4. Matplotlib

Matplotlib is a data visualization library for Python and is also part of the SciPy stack. It provides static, animated and interactive visualizations, along with an object-oriented API for embedding plots into applications. Some of the plot types it supports are:

  1. Line plots
  2. Bar charts and histograms
  3. Scatter plots
  4. Area plots
  5. Pie charts
  6. Contour plots
  7. Stem plots
  8. Quiver plots
  9. Spectrograms
  10. Stream plots

  1. Matplotlib documentation: Click here
  2. Matplotlib cheat sheet: Click here
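
The sketch below uses the object-oriented API to draw two of the plot types from the list above, a line plot and a bar chart, side by side. The data, titles and the choice of the non-interactive `Agg` backend are assumptions made so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# One Figure holding two Axes objects.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_title("Line plot")
ax_line.legend()

ax_bar.bar(["a", "b", "c"], [3, 7, 5])
ax_bar.set_title("Bar chart")

fig.tight_layout()

# Render the figure as a PNG into an in-memory buffer.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In an interactive session you would usually call `plt.show()` instead of saving to a buffer.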

5. Seaborn

Seaborn is also a data visualization library for Python, built on top of Matplotlib. Rather than replacing Matplotlib, it adds a high-level interface for drawing attractive and informative statistical graphics such as heat maps.

  1. Seaborn documentation: Click here
  2. Seaborn cheat sheet: Click here
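
As an example of the heat maps mentioned above, this sketch draws a correlation heat map with `seaborn.heatmap`. The random data, column names and color map are placeholders chosen for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: 100 rows of three random columns.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# Pairwise correlations between the columns.
corr = data.corr()

# Seaborn draws onto a Matplotlib Axes and returns it.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Correlation heat map")
```

Because the return value is an ordinary Matplotlib Axes, the plot can be customized further with regular Matplotlib calls.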

6. Plotly

Plotly is an open-source, browser-based interactive graphing library for Python. It can be used to create many types of charts, including scientific charts, 3D graphs, statistical charts, SVG maps and financial charts. Plotly can also send data directly to cloud servers.

  1. Plotly documentation: Click here
  2. Plotly cheat sheet: Click here

7. Scikit Learn

Scikit Learn is a Machine Learning library for Python that started as a Google Summer of Code project. It contains various tools for Machine Learning and statistical modeling. The benefit of using Scikit Learn is that the algorithms do not have to be written from scratch, which saves time and makes the resulting code more reliable. Its tools cover:

  1. Classification
  2. Regression
  3. Clustering
  4. Dimensionality Reduction
  5. Model Selection
  6. Data Preprocessing

  1. Scikit Learn documentation: Click here
  2. Scikit Learn cheat sheet: Click here
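
The sketch below combines two items from the list above, data preprocessing and classification, on the iris dataset that ships with the library (imported as `sklearn`). The choice of logistic regression, the split size and the random seed are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing (scaling) and classification chained into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Swapping `LogisticRegression` for another estimator leaves the rest of the code unchanged, since estimators share the same `fit` and `score` interface.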

8. TensorFlow

TensorFlow is an open-source platform for Machine Learning. It helps in building Deep Learning models: researchers can use it to push the state of the art in ML, and developers can easily build and deploy ML-powered applications. Using TensorFlow, developers can also create large-scale neural networks with numerous layers using data flow graphs.

  1. TensorFlow documentation: Click here
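
As a minimal sketch of the data flow graph idea, the example below wraps a loss function in `tf.function`, which traces it into a graph, and then asks TensorFlow for the gradient. The tiny dataset and the single weight `w` are hypothetical, picked to keep the arithmetic easy to follow.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow graph
def squared_error(w, x, y):
    # Mean squared error of the prediction w * x against the target y.
    return tf.reduce_mean(tf.square(w * x - y))

w = tf.Variable(0.0)                  # the only trainable parameter
x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([2.0, 4.0, 6.0])      # targets follow y = 2 * x

# GradientTape records the operations so they can be differentiated.
with tf.GradientTape() as tape:
    loss = squared_error(w, x, y)
grad = tape.gradient(loss, w)

print(float(loss), float(grad))
```

The gradient is negative here, which tells an optimizer to increase `w` toward the value 2 that fits the data.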

9. Keras

Keras is the Deep Learning API of TensorFlow, written in Python. It provides a Python interface for artificial neural networks. Because it is simple and easy to learn and use, it makes statistical modeling and working with text and images a lot easier. It also increases productivity because it allows more ideas to be tried quickly.

  1. Keras documentation: Click here
  2. Keras cheat sheet: Click here
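
To show how little code a small network needs, here is a sketch of a two-layer classifier defined with the Keras `Sequential` API. The layer sizes, input shape and loss function are arbitrary choices for the illustration, and the model is not trained.

```python
import numpy as np
from tensorflow import keras

# 4 input features -> 8 hidden units -> 3 output classes.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Forward pass on two dummy samples; each row is a probability distribution.
probs = model(np.zeros((2, 4), dtype="float32")).numpy()
print(probs.shape)
```

Training would then be a single call to `model.fit` with the input features and labels.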

10. Statsmodels

Statsmodels is a Python library used for estimating statistical models, conducting statistical tests and exploring statistical data through its classes and functions. It also provides plotting functions for statistical analysis.

  1. Statsmodels documentation: Click here


