Machine Learning in 30 Sessions | Part 1

Shahrokh Shahi
4 min read · Jun 27, 2020

--

Okay…. Day one…

Before starting, I really do hope that I will not give up on what I have started today (fingers crossed)

How the road ahead feels to me… not straight, but not too hilly :)

The first and foremost step…

For the first day, I just want to spend a few hours reviewing the libraries and packages that are necessary on this road. At the beginning of each project, I usually spend a considerable amount of time preparing my setup and re-checking the versions of the libraries (for instance, with TensorFlow it is currently important to know which version is installed on your machine, because there is a huge (in my opinion) difference between versions 1.0 and 2.0).

Therefore, for the first day, let’s quickly review the required libraries. Here is what I need in this (let’s call it) reviewing project:

  • Python3 — There is no doubt or argument that Python is generally used for all tasks in Machine Learning, and there are tons of articles about why it is so popular for this purpose. In this project, I want to use Python 3 (Python 2 is now officially discontinued, so let’s just focus on the future :D)
  • Data Analysis and Visualization Libraries. Let’s keep everything neat and just focus on the following five libraries: NumPy, SciPy, Pandas, Matplotlib, and Seaborn.
    Here is a brief description of these five libraries (a small example of how they fit together follows right after this list):
    (i) NumPy: The fundamental package for scientific computing with Python. Basically, it adds support for large multi-dimensional arrays and matrices, along with a collection of high-level functions to operate on them. That’s why I think of NumPy as a package that tries to provide MATLAB-like functionality in Python. All of this functionality is made possible by its ndarray (short for n-dimensional array) data structure, which serves as a general container for data with arbitrary datatypes.
    (ii) SciPy: Another package for scientific computing, mainly used for optimization, linear algebra, integration, image processing, differential equation solvers, and other related tasks. The key point is that the basic data structure used in SciPy operations is the ndarray provided by NumPy.
    (iii) Pandas: A handy library for data manipulation and analysis, which allows importing data from various file formats such as CSV, JSON, SQL, and Excel. This library can be employed to reshape, merge, and clean data.
    (iv) Matplotlib: A plotting library for Python which, in practice, is used together with NumPy. Technically, it provides an object-oriented API for embedding plots into applications. Its syntax is very similar to MATLAB’s, and it is good to know that it is also part of the broader SciPy ecosystem.
    (v) Seaborn: A library developed exclusively for making statistical graphics in Python. It is built on top of Matplotlib and closely integrated with the Pandas DataFrame data structure. This is very helpful at the beginning of an ML project to visualize the data and get a general sense of the dataset.
  • Machine Learning Libraries. The widely used ML libraries are TensorFlow (and TensorBoard), Scikit-learn, PyTorch, and Keras. Let’s quickly review them here and decide which ones should be used in this project:
    (i) TensorFlow: My favorite! It is a symbolic math library for dataflow and differentiable programming (automatic differentiation). This open-source library is developed by the Google Brain team for use across Google; it was initially released in 2015, with version 1.0.0 following in 2017 and version 2.0 in 2019.
    TensorBoard: In simple words, it is TensorFlow’s visualization toolkit, enabling you to track metrics such as loss and accuracy over time.
    (ii) Scikit-learn: It provides various ML methods such as classification, regression, and clustering algorithms, and it is fully integrated with Matplotlib, NumPy, Pandas, and SciPy.
    (iii) PyTorch: An open-source ML library, mainly used (as far as I have seen so far) for computer vision and NLP applications, primarily developed by Facebook’s AI Research lab. A number of deep learning software packages are built on top of PyTorch, including Tesla Autopilot and Uber’s Pyro.
    (iv) Keras: It runs on top of TensorFlow and the Microsoft Cognitive Toolkit, is mainly focused on deep learning networks, and has support for convolutional and recurrent neural networks. In 2017, the TensorFlow team decided to support Keras in TensorFlow’s core library.
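
To make this a bit more concrete, here is a tiny, self-contained sketch of how these libraries typically work together. The toy dataset, variable names, and the simple linear-regression model are my own made-up illustration (not part of any project code), assuming the library versions installed further below:

# NumPy provides the arrays, Pandas wraps them in a DataFrame, Seaborn/Matplotlib
# visualize them, and scikit-learn fits a simple model on top.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

# NumPy: a toy ndarray of synthetic data (purely illustrative)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)          # ndarray of shape (100,)
y = 2.5 * x + rng.normal(0, 2, size=100)  # noisy linear relation

# Pandas: wrap the arrays in a DataFrame for manipulation and inspection
df = pd.DataFrame({"x": x, "y": y})
print(df.describe())                      # quick statistical summary

# Seaborn (on top of Matplotlib): visualize the relationship
sns.scatterplot(data=df, x="x", y="y")
plt.title("Toy dataset")
plt.show()

# scikit-learn: fit a simple regression model on the same data
model = LinearRegression()
model.fit(df[["x"]], df["y"])
print("estimated slope:", model.coef_[0])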

Okay… I think that is more than enough of an overview of the libraries whose main capabilities and features I want to review in this project.

How to install all the packages…

The last part of the first day’s review is just to agree on the versions of the libraries that we will use from now on. Let’s install these libraries, regardless of whether they are already installed and regardless of which versions are already on the machine:

pip3 install numpy==1.18.4 --ignore-installed
pip3 install scipy==1.4.1 --ignore-installed
pip3 install pandas==0.25.3 --ignore-installed
pip3 install matplotlib==3.0.3 --ignore-installed
pip3 install seaborn==0.10.1 --ignore-installed
pip3 install tensorflow==1.13.1 --ignore-installed
pip3 install tensorflow-estimator==1.13.0 --ignore-installed
pip3 install tensorboard==1.13.1 --ignore-installed
pip3 install torch==1.5.1 --ignore-installed
pip3 install scikit-learn

To import and check the installed packages:
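
Here is a minimal sketch of what that check looks like (my own version; the exact script lives in the repository linked below):

# Import the installed packages and print their versions to confirm the setup.
import numpy as np
import scipy
import pandas as pd
import matplotlib
import seaborn as sns
import tensorflow as tf
import torch
import sklearn

for name, module in [
    ("NumPy", np),
    ("SciPy", scipy),
    ("Pandas", pd),
    ("Matplotlib", matplotlib),
    ("Seaborn", sns),
    ("TensorFlow", tf),
    ("PyTorch", torch),
    ("scikit-learn", sklearn),
]:
    print(name, module.__version__)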

Today’s code on my GitHub repository: LINK

Okay… That was the first day’s review (which was almost nothing… but at least we can call it a start :D)

References

As promised, and since I am fully committed to citing my references, at the end of each day’s story I will try to mention all the references I used. In today’s writing, I mainly used the Wikipedia pages of each library along with their official websites and GitHub repositories.

--

Shahrokh Shahi

PhD in Computer Science — Computational Science and Engineering (CSE) at the Georgia Institute of Technology 🔗 www.sshahi.com