7 Python Libraries for ML Dev in 2020

Aryan Kargwal
Published in SRM MIC
5 min read · Sep 15, 2020
source: realpython.com

As a beginner to Machine Learning, you might often feel overwhelmed by the amount of content to study. I write this article in an attempt to help you with your potential dilemma, and share what I have experienced in my last year of learning to be a competent ML Developer.

Machine Learning is one of the fastest growing areas of computer science. Having been around for roughly 70 years, ML has seen many pioneers, each aided by the languages of their time, from LISP in the late 50s to Prolog in the 70s, and later C and R. Lately, with the groundbreaking work of modern programmers, Python stands as the most preferred language for Machine Learning and AI.

source: fiverr.com

Package: a namespace that groups related modules, and with them the functions and classes relevant to a particular task.

Library: a collection of such packages, distributed and installed together.
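To make the idea of a package as a namespace concrete, here is a minimal sketch using only the standard library. It assumes nothing beyond a stock Python 3 install; `json` is picked purely as a convenient example of a package that contains modules.

```python
import json            # "json" is a package shipped with Python
import json.decoder    # "json.decoder" is a module inside that package

# A package carries a __path__ attribute (where its modules live);
# a plain module does not.
print(hasattr(json, "__path__"))          # True
print(hasattr(json.decoder, "__path__"))  # False

# Classes and functions are reached through the package namespace.
decoder = json.decoder.JSONDecoder()
result = decoder.decode('{"library": "NumPy"}')
print(result)                             # {'library': 'NumPy'}
```

The same dotted-name pattern shows up in every library below, for example `matplotlib.pyplot`.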

As of today, Python is backed by roughly 235,000 packages on the Python Package Index (PyPI), which makes it one of the largest package ecosystems of any language. Let us now get into the 7 libraries I think a beginner should look into:

  1. NumPy
  2. Pandas
  3. Matplotlib
  4. Keras
  5. scikit-learn
  6. TensorFlow
  7. PyTorch

NumPy

NumPy, created by Travis Oliphant and introduced in 2005, is one of the most preferred libraries in Python for matrix and array operations. It is equipped with a vast arsenal of high-level mathematical functions that operate on these arrays.

In action:

import numpy as np

# Creating two arrays of rank 2
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Creating two arrays of rank 1
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors
print(np.dot(v, w), "\n")

# Matrix and vector product
print(np.dot(x, v), "\n")

# Matrix and matrix product
print(np.dot(x, y))

Output:

219 

[29 67]

[[19 22]
 [43 50]]

High-level libraries like TensorFlow depend on NumPy and can convert their tensors to and from NumPy arrays.

Resources:

Video Tutorial: https://www.youtube.com/watch?v=QUT1VHiLmmI

Documentation: https://numpy.org/doc/

Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf

Pandas

Pandas, created by Wes McKinney and introduced in 2008, is one of the preferred libraries for data manipulation and analysis. With the rise of Big Data and ever-growing datasets, Pandas plays an important role in pre-processing data for further use, offering data structures and operations to manipulate these datasets.
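As a rough illustration of the kind of manipulation meant here, the sketch below derives a column, filters rows and sorts the result. The city names and numbers are hypothetical toy values made up for this example, not real data.

```python
import pandas as pd

# Hypothetical toy data, chosen only for illustration.
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "area_km2": [100.0, 250.0, 80.0, 400.0],
    "population": [50_000, 75_000, 120_000, 40_000],
})

# Derive a new column from existing ones (vectorised, no explicit loop).
df["density"] = df["population"] / df["area_km2"]

# Keep only the dense cities with a boolean mask, then sort them.
dense = df[df["density"] > 400].sort_values("density", ascending=False)
print(dense[["city", "density"]])
```

Each step returns a new DataFrame or Series, which is why such operations chain naturally in pre-processing code.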

In action:

# importing pandas as pd
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252, 1357, 52.98],
}

data_table = pd.DataFrame(data)
print(data_table)

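Output: a table with the columns country, capital, area and population, with one row per country indexed 0 to 4.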

Resources:

Video Tutorial: https://www.youtube.com/watch?v=vmEHCJofslg

Documentation: https://pandas.pydata.org/docs/

Cheat sheet: http://datacamp-community-prod.s3.amazonaws.com/dbed353d-2757-4617-8206-8767ab379ab3

Matplotlib

Matplotlib, created by John D. Hunter and introduced in 2003, is a very popular Python visualization library. It provides an object-oriented API for plotting a wide variety of 2D and 3D graphs. With its vast array of functions, it can not only plot graphs but also handle images, histograms, contours and paths.

In action:

import matplotlib.pyplot as plt

data = {'apple': 10, 'orange': 15, 'lemon': 5, 'lime': 20}
names = list(data.keys())
values = list(data.values())

fig, axs = plt.subplots(1, 3, figsize=(9, 3), sharey=True)
axs[0].bar(names, values)
axs[1].scatter(names, values)
axs[2].plot(names, values)
fig.suptitle('Categorical Plotting')

# Display the figure when running as a script
plt.show()

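Output: a figure titled "Categorical Plotting" with three side-by-side panels (bar, scatter and line) sharing one y-axis.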

Resources:

Video Tutorial: https://www.youtube.com/watch?v=0P7QnIQDBJY

Documentation: https://matplotlib.org/contents.html

Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf

Keras

Keras, introduced in 2015 and created by Francois Chollet, is a high-level neural network API capable of running on top of TensorFlow. It is designed to enable fast experimentation with deep neural networks, giving us the freedom to change hyper-parameters on the fly. It can run seamlessly on both CPU and GPU.
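To give a feel for how little code an experiment needs, here is a minimal sketch of a small binary classifier. It assumes Keras is used through the `tensorflow.keras` module of a TensorFlow 2.x install; the random data, layer sizes and learning rate are hypothetical choices for illustration only.

```python
import numpy as np
from tensorflow import keras

# Hypothetical toy data: 200 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X.sum(axis=1, keepdims=True) > 0).astype("float32")

# Stack layers in order; each line is one layer of the network.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Hyper-parameters such as the optimizer and learning rate are plain
# arguments, so changing them is a one-line edit.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.01),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
predictions = model.predict(X[:3], verbose=0)
print(predictions.shape)
```

Swapping `Adam` for another optimizer, or changing the number of units in a `Dense` layer, requires no other changes to the script.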

Resources:

Video Tutorial: https://www.youtube.com/watch?v=qFJeN9V1ZsI

Documentation: https://faroit.com/keras-docs/1.2.0/

Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf

scikit-learn

scikit-learn, started in 2007 by David Cournapeau, is one of the most popular Python libraries for classical ML algorithms. Built on top of two pioneering scientific libraries, NumPy and SciPy, scikit-learn houses most of the common supervised and unsupervised learning algorithms. Writing these by hand is a tedious task, especially when you want to change your optimizer or regularization on the fly.
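The sketch below shows what changing regularization on the fly can look like in practice. It is only an illustration under my own assumptions: the bundled Iris dataset, a logistic regression model and the three values of `C` (the inverse regularization strength) are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small dataset bundled with scikit-learn, split into train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Regularization strength (C) and solver are constructor arguments,
# so trying a different setting is a one-line change.
scores = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, solver="lbfgs", max_iter=1000)
    clf.fit(X_train, y_train)
    scores[C] = clf.score(X_test, y_test)
    print(f"C={C}: test accuracy = {scores[C]:.3f}")
```

Every scikit-learn estimator follows the same `fit` / `predict` / `score` pattern, so replacing `LogisticRegression` with another classifier leaves the rest of the loop unchanged.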

Resources:

Video Tutorial: https://www.youtube.com/watch?v=pqNCD_5r0IU

Documentation: https://scikit-learn.org/stable/user_guide.html

Cheat sheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf

TensorFlow

TensorFlow, introduced in 2015 and created by the Google Brain team, is probably the most famous open source library used for Machine Learning in Python. Together with the previously mentioned libraries, TensorFlow can build and run deep neural networks that power many of the AI applications we see around us. Originally designed for Google’s internal use, TensorFlow is now backed by a vast community, which makes troubleshooting an easier task.

In action:

In the following repository I have implemented classification on the MNIST dataset using all the prior libraries with TensorFlow.

Resources:

Video Tutorial: https://www.youtube.com/watch?v=tPYj3fFJGjk

Documentation: https://www.tensorflow.org/api_docs/python/

PyTorch

PyTorch, introduced in 2016 and primarily developed by Facebook’s AI research lab, is an open source Machine Learning library based on the Torch library. It comes with an extensive choice of tools and libraries that support Computer Vision, Natural Language Processing (NLP) and many other ML applications.

PyTorch has been used to build several modern Deep Learning systems, including but not limited to Tesla Autopilot, Uber’s Pyro and PyTorch Lightning.

In action:

In the following repository I have implemented classification on the MNIST dataset using all the prior libraries with PyTorch.

Resources:

Video Tutorial: https://www.youtube.com/watch?v=GIsg-ZUy0MY

Documentation: https://pytorch.org/docs/stable/index.html

Conclusion

I hope this article was helpful and that it guides you through your first steps in Machine Learning.
