So NumPy and Pandas Are the Foundation of Data Science: AI Saturdays Ibadan

Olanrewaju Ahmed
Aug 31, 2018 · 9 min read

We had fun, and that is no exaggeration. We had a great time at the third and fourth sessions of AI Saturdays, and we are combining the two Saturdays into one report. The feedback has been very encouraging, both within and outside the University: a lot of people are eager to join a meetup where they can build a firm foundation in Data Science, Machine Learning, Artificial Intelligence and Deep Learning.

We had more participants in Session One, which is mainly for beginners. In the spirit of the AI Saturdays (nurture.ai) curriculum, the recommended prerequisite resource is the Udemy course titled Deep Learning Prerequisites: Numpy Stack, which is free and available to all. But we had a better idea: we used another course by the same facilitator, also available on Udemy, titled Python for Data Science and Machine Learning Bootcamp.

A cross-section of participants at AI Saturdays Ibadan (Week 3)

Feedback from Participants

The feedback was awesome, as a number of participants were happy to see the things they could do with NumPy and Pandas. Below is some of the feedback from participants.

Mrs Kayode Aderinsola (Ph.D candidate) did a summary of the lecture on NumPy and Pandas

Mrs Kayode Aderinsola seriously at work

The training at AI Saturdays introduced these packages as among the most commonly used in Machine Learning today.

The two packages have become essential tools for any systematic computation in Machine Learning with Python, and for Artificial Intelligence more broadly. Both are interesting and easy to learn, and both are Python libraries.

NumPy is a Python library for technical (numerical) computing. It can be used to perform different operations in AI. NumPy uses the ndarray, a multidimensional array whose values all share the same data type. These arrays are indexed as sequences starting from 0 (zero). NumPy and SciPy functions can also be used to manipulate data in structured ways. Pandas requires NumPy, not the other way round.
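As a small illustration of the points about the ndarray above (one shared data type, indexing from zero), here is a minimal sketch. The values in the array are made up purely for illustration.

```python
import numpy as np

# A one-dimensional ndarray built from a Python list (illustrative values)
scores = np.array([10, 20, 30, 40])

print(type(scores))   # the ndarray type
print(scores.dtype)   # every element shares one integer data type
print(scores[0])      # indexing starts at 0, so this is the first element
print(scores.ndim)    # number of dimensions of the array
```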

Pandas complements NumPy and SciPy. It is a Python package providing fast, flexible and expressive data structures designed to make working with relational or labeled data both easy and natural. Pandas is a more powerful tool than Excel: it can hold many more rows and columns, and it is built mainly on the NumPy package. It aims to be an essential high-level building block for practical, real-world data analysis in Python. It also provides a more efficient way of working with numerical and tabular data in Python. Pandas speaks of data and index where Excel speaks of columns and rows. Pandas data structures are themselves built on NumPy functions.

The two packages (NumPy and Pandas) will be used through the Anaconda Navigator, which gives access to both packages in a Jupyter notebook.

Israel Odeajo

So far in the class, I have understood that NumPy and Pandas are important in Data Science. I understood that the Anaconda notebook provides tools for working with data in arrays or lists.

Also, NumPy is more about mathematical operations such as the mean, the standard deviation and so on. In NumPy and Pandas, we learnt some signature functions and attributes that are applicable to me, such as index, data, data type and many others.
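To make the mention of mean and standard deviation concrete, here is a minimal sketch using NumPy's built-in methods. The numbers are invented for illustration and are not from the class.

```python
import numpy as np

# Illustrative data, chosen so the results are easy to check by hand
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean_value = values.mean()  # arithmetic mean: 40 / 8 = 5.0
std_value = values.std()    # population standard deviation: sqrt(32 / 8) = 2.0

print(mean_value, std_value)
print(values.dtype)         # the data type shared by every element
```

Note that std() computes the population standard deviation by default; passing ddof=1 gives the sample version.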

Also, I understood more about how we can create new data and an index using NumPy and Pandas. The two work together, and both must be imported before they can be used in a line of code.

From the classes, we learnt how to analyze data, how to drop data, and how to create new data (a column or a row). We also learnt about DataFrames and datasets.
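Creating and dropping data, as described above, might look roughly like this in Pandas. The names, scores and the "bonus" column are hypothetical, made up only for this sketch.

```python
import pandas as pd

# A tiny DataFrame with made-up names and scores
df = pd.DataFrame({"name": ["Ada", "Bola", "Chidi"], "score": [70, 85, 90]})

# Create a new column from an existing one
df["bonus"] = df["score"] + 5

# drop() returns a new DataFrame; df itself is left unchanged
without_bonus = df.drop("bonus", axis=1)  # axis=1 drops a column
without_first = df.drop(0, axis=0)        # axis=0 drops the row labelled 0

print(without_bonus)
print(without_first)
```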

It was a time of high level of learning.

HUZEIN AFEEZ BABATUNDE

Huzein Babatunde having fun with Numpy and Pandas

This write-up is a brief description of the Python packages NumPy and Pandas. Thanks to AI Saturdays Ibadan for giving us this wonderful opportunity, and most especially to our facilitator (Mr Ahmed Olanrewaju) and to our tutors for the last class, @bro Ibrahim and @bro Mubarak. Thanks so much.

To start with, let’s discuss NumPy before going on to Pandas.

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. NumPy is open-source software and has many contributors.

Installation Process

Firstly, we were introduced to how NumPy is installed. It’s highly recommended to install Python using the Anaconda distribution, so that all underlying dependencies stay in sync through conda install. If Anaconda is already installed, NumPy can be installed by going to the terminal and typing:

conda install numpy

Once NumPy has been installed, it can be imported as a library:

import numpy as np

We were also introduced to NumPy arrays, vectors, matrices and built-in methods. Let’s start with NumPy arrays. NumPy arrays essentially come in two flavours: vectors and matrices. Vectors are one-dimensional arrays, while matrices are two-dimensional arrays (note that a matrix can still have only one row or one column).

To make a NumPy array, you can just use the np.array() function. All you need to do is pass a list to it.

Don’t forget that, in order to work with the np.array() function, the numpy library must be present in your environment. The NumPy library also follows an import convention: when you import it, import it as np. By doing this, you’ll make sure that other Pythonistas understand your code more easily.
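Here is a minimal sketch of what passing a list to np.array() looks like, for both a vector and a matrix. The variable names and values are only illustrative.

```python
import numpy as np

# A flat list becomes a one-dimensional array (a vector)
my_list = [1, 2, 3]
vector = np.array(my_list)

# A list of lists becomes a two-dimensional array (a matrix)
nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix = np.array(nested_list)

print(vector.shape)  # (3,)
print(matrix.shape)  # (3, 3)
```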

Built-in methods

NumPy has a lot of built-in methods for generating arrays:

arange: returns evenly spaced numbers within a given interval.

Generate an array of numbers between 0 and 8, but the last number will not be included
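The original screenshot is not reproduced here, so the following is a small sketch of the same idea with np.arange(). The step of 2 in the second call is an extra illustration, not something from the class notes.

```python
import numpy as np

# Numbers from 0 up to, but not including, 8
first_eight = np.arange(0, 8)
print(first_eight)  # [0 1 2 3 4 5 6 7]

# An optional third argument sets the step size
evens = np.arange(0, 8, 2)
print(evens)        # [0 2 4 6]
```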

zeros and ones: these methods generate arrays of zeros and ones.

Generate arrays of zeros and ones
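A minimal sketch of np.zeros() and np.ones() follows; the shapes are chosen arbitrarily for illustration.

```python
import numpy as np

# A single number gives a one-dimensional array of that length
zeros_vector = np.zeros(3)

# A tuple gives the (rows, columns) shape of a two-dimensional array
ones_matrix = np.ones((2, 3))

print(zeros_vector)  # three zeros, stored as floats by default
print(ones_matrix)   # two rows and three columns of ones
```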

eye: this method creates an identity matrix.

Using the np.eye implementation in Python
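Since the screenshot is missing, here is a small sketch of np.eye(). The size 3 is just an example.

```python
import numpy as np

# A 3x3 identity matrix: ones on the diagonal, zeros everywhere else
identity = np.eye(3)
print(identity)
```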

For other methods like np.full(), you also have to specify the shape and the constant value that you want to insert into the array. np.linspace() returns evenly spaced numbers over a specified interval. We can also grab the data type of the objects in an array through its dtype attribute.
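The three ideas above can be sketched as follows. The shape, fill value and interval are arbitrary choices for illustration.

```python
import numpy as np

# np.full(shape, fill_value): a 2x2 array where every entry is 7
sevens = np.full((2, 2), 7)

# np.linspace(start, stop, num): 5 evenly spaced points from 0 to 1,
# with both end points included
points = np.linspace(0, 1, 5)

print(sevens)
print(points)        # 0, 0.25, 0.5, 0.75 and 1
print(points.dtype)  # the data type of the array's elements
```

Unlike np.arange(), np.linspace() includes the end of the interval and takes the number of points rather than a step size.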

PANDAS

Pandas is like a very powerful version of Excel, with more features that can be used for data analysis. The first main data type that we learnt in Pandas is the Series. A Series is similar to a NumPy array; it is even built on top of the NumPy array object. The major difference between a Series and a NumPy array is that a Series can be indexed by a label instead of just a numeric position. It also need not hold only numbers: it can hold any arbitrary Python object.

Pandas can be imported as follows:

import pandas as pd

Creating an example Series with and without an index:

Creating a pandas series
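In place of the missing screenshot, here is a minimal sketch of creating a Series with and without an explicit index. The labels and data are made up for illustration.

```python
import pandas as pd

labels = ["a", "b", "c"]
data = [10, 20, 30]

# Without an index, pandas numbers the entries 0, 1, 2
plain = pd.Series(data)

# With an index, each value can be looked up by its label
labelled = pd.Series(data, index=labels)

print(plain[0])       # 10
print(labelled["b"])  # 20
```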

A DataFrame is a bunch of Series objects put together to share the same index.

The .seed call that comes after the NumPy import statement (np.random.seed) makes the generated random numbers the same every time, which helps when many people are working on the same data.
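A rough sketch of both points, a seeded random DataFrame whose columns are Series, is shown here. The seed value 101 and the row and column labels are arbitrary choices for this example, not values confirmed by the class notes.

```python
import numpy as np
import pandas as pd

# Fix the seed so everyone who runs this gets the same random numbers
np.random.seed(101)

# A 3x2 DataFrame of random numbers with labelled rows and columns
df = pd.DataFrame(np.random.randn(3, 2),
                  index=["A", "B", "C"],
                  columns=["X", "Y"])

print(df)
print(type(df["X"]))  # each column is a Series sharing the same index
```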

Missing data are entries that have no value. For example, people do not fill in every field when you give them a questionnaire.

Missing data can be replaced with another value using df.fillna(value="Adisa"), or removed using df.dropna(thresh=1), which keeps only the rows that have at least one non-missing value.

Working with missing data
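Here is a minimal sketch of filling and dropping missing values. The survey table is invented for illustration, and as a simplification fillna() is applied to a single text column rather than the whole DataFrame.

```python
import numpy as np
import pandas as pd

# A made-up questionnaire where some fields were left blank (NaN)
survey = pd.DataFrame({
    "name": ["Tunde", np.nan, np.nan],
    "age": [25.0, 31.0, np.nan],
})

# Replace the missing names with a chosen value
filled_names = survey["name"].fillna(value="Adisa")

# thresh=1 keeps rows that have at least one non-missing value
at_least_one = survey.dropna(thresh=1)

# With no arguments, dropna() removes every row that has any missing value
complete_only = survey.dropna()

print(filled_names)
print(at_least_one)
print(complete_only)
```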

Data Input and Output: Pandas can read many file types using its pd.read_ family of functions, for example pd.read_csv().
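As a small sketch of reading and writing CSV data, the example below reads from an in-memory string so that it runs without any file on disk. The contents are made up; with a real file you would pass its path to pd.read_csv() instead.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a file on disk
csv_text = "name,score\nAda,70\nBola,85\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# to_csv with no path returns the CSV as a string;
# index=False leaves out the row numbers
csv_out = df.to_csv(index=False)
print(csv_out)
```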

Group by: The groupby method allows you to group rows of data together and call aggregate functions on each group. I think that’s all of my summary.
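The group by idea mentioned above can be sketched like this. The company names and sales figures are invented for illustration.

```python
import pandas as pd

# Made-up sales records, two per company
sales = pd.DataFrame({
    "company": ["GOOG", "GOOG", "MSFT", "MSFT"],
    "sales": [200, 120, 340, 124],
})

# Group the rows by company, then aggregate each group
totals = sales.groupby("company")["sales"].sum()
means = sales.groupby("company")["sales"].mean()

print(totals)
print(means)
```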

Mrs Awokoya (B.Sc, M.Sc Computer Science)

Mrs Awokoya listening attentively during the session

The class was interesting!

Kudos to our tutors.

During the first session, we worked on Pandas, which can be used to explore and manage worksheets in Python, and we finished everything under that sub-topic. We were encouraged to do some further reading in order to boost our understanding of the Python package.

Assignment

In 300–500 words, write a summary of what you learned about the NumPy and Pandas packages and how you can apply them.

To be submitted latest by Monday evening to a yet unspecified email.

The second session gave us insight into Intel’s interest in data science, the tools available to increase efficiency when working with lots of data, and the benefits awaiting students in the field. We were encouraged to become Intel Student Ambassadors, which also has lots of benefits (ranging from opportunities to attend conferences fully funded by Intel to access to Intel products and services that are rarely accessible to the public). The document containing this information and more will be shared on this platform as promised.

Intel has consistently been making giant strides in developing easy-to-use resources for students to understand and get involved in Deep Learning projects. Its free courses let software developers, data scientists, and students learn AI theory and follow hands-on exercises. These lessons cover AI topics and explore tools and optimized libraries that take advantage of Intel® processors in personal computers and server workstations.

Machine Learning 501

Get an overview of the fundamentals of machine learning on modern Intel® architecture. (12 weeks)

Deep Learning 501

Learn the basic techniques and foundations of deep learning on modern Intel® architecture. (12 weeks)

TensorFlow* 501

Master the basics of using TensorFlow* with Intel® architecture. (8 weeks)

Intel AI Academy

Google Colab

We were encouraged to use platforms such as Google Colab more often for our machine learning tasks, so as to reduce space consumption on our systems.

https://colab.research.google.com/

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud


We were reminded that we need to start thinking about how to write environment-oriented, well-structured and ‘attention-calling’ research. We brainstormed different topics we can work on and how data can be collected for the projects. We also deliberated on how groups can be formed for the research work. We were told that it is better to look into research areas that are interdisciplinary. As concluded, discussions on the research continue on this platform.

Session Two

Session Two is really dragging its feet, and this is largely due to two main factors. The first is that Andrew Ng’s material seems to be quite technical for most participants.

We continued the lesson material from Andrew Ng’s deeplearning.ai, and it promises to get better by the day.

  • Binary Classification

We got our hands dirty by trying our first neural network from scratch.

Implementing a Neural Network from Scratch in Python — An Introduction

Can you classify these two ladies as twins or not twins?

What you need to go through before coming for the hands-on session:

http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/

The code is available here, please visit https://github.com/dennybritz/nn-from-scratch

Meanwhile, there are quite a lot of requirements to install, and installing them needs a good internet connection.

Pictures from Previous Meet Up

Group Picture on 1st Saturday of AISaturdays Ibadan
Olanrewaju Ahmed AI Saturdays Ambassador
