Demystifying the Mystical: My Foray into the World of AI

Week 4: From Numpy to Pandas... embedded within is Matplotlib

Mubbysani
ai6-ilorin
7 min read · Jan 26, 2020

It has been four weeks since the AI Saturdays classes started. AI6 has given folks in Ilorin a tremendous opportunity to become AI geeks, and the journey so far has been amazing. With ten weeks to go, the world of AI is bound to be demystified. And so we move ahead.

The previous week covered functions, classes, and other essentials of Python, but it seems there is more in store in the Python world: with the keyword import, one can do more... more than one could ever imagine.

When a Python interpreter encounters an import statement, it imports the module given in the statement. But what is a module?

Python has a way of putting definitions in a file and using them in a script or in an interactive instance of the interpreter. Such a file is called a module. A module can define variables, functions, and classes, and can also contain executable code. The file name is the module name with the suffix .py appended, and the module can be referenced at any point in the course of writing a program. Also, definitions from a module can be imported into other modules.

Python comes with a library of standard modules, but many of the most useful modules are third-party packages that are installed separately. One such package, and the one we are to work with, is Numpy. Numpy was created by Travis Oliphant and integrates closely with Python.

Numpy? Who cares?

The Python programming language attracted the attention of the scientific and engineering community early on, and there was a need for it to support numerical operations, hence Numpy. Numpy, which stands for Numerical Python, adds support for large, multi-dimensional arrays and matrices, along with a large collection of mathematical functions to operate on these arrays. Numpy gives Python functionality comparable to MATLAB and allows users to write fast programs as long as most operations work on arrays or matrices instead of scalars. Numpy does not ship with Python itself and usually has to be installed separately, but the Anaconda distribution comes with Numpy and several other packages. The Numpy package is imported using the syntax below.

importing Numpy
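A minimal sketch of the import, assuming Numpy is already installed. The alias np is only a widely used convention, not a requirement.

```python
# Import Numpy under its conventional short alias.
import numpy as np

# Printing the version is a quick way to confirm the import worked.
print(np.__version__)
```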

After running the code above, an array can be constructed in any dimension the program requires. Numpy arrays can be initialized from Python lists (nested lists for more than one dimension) or from other sequences such as tuples, and their elements are accessed using square brackets.

An example of one-dimensional array
arrays of multiple dimensions
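As a rough illustration of array creation and indexing, the sketch below builds arrays from a list, a tuple, and a nested list. The names x, t, and m and their values are made up for the example.

```python
import numpy as np

x = np.array([1, 2, 3, 4])             # one-dimensional array from a list
t = np.array((1.5, 2.5, 3.5))          # one-dimensional array from a tuple
m = np.array([[1, 2, 3], [4, 5, 6]])   # two-dimensional array from nested lists

print(x[0])            # first element of x -> 1
print(m[1, 2])         # row 1, column 2 of m -> 6
print(x.ndim, m.ndim)  # number of dimensions -> 1 2
```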

After one or more arrays have been created, basic mathematical operations can be performed on a single array or on a combination of arrays.

sum of array x
multiplication of two arrays x and y
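A small sketch of both operations, with assumed values for x and y. Note that the * operator multiplies arrays element by element; the dot product is shown only for contrast.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

total = x.sum()      # sum of all elements of x
product = x * y      # elementwise multiplication
dot = np.dot(x, y)   # dot product, shown for contrast

print(total)     # 6
print(product)   # [ 4 10 18]
print(dot)       # 32
```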

Shape matters everywhere, and Numpy is no exception. The shape attribute gives the current shape of an array, i.e. the size of each dimension.

shape attribute
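For instance, with two made-up arrays, shape could be read like this. It is an attribute, so there are no parentheses after it.

```python
import numpy as np

x = np.array([1, 2, 3])
m = np.array([[1, 2, 3], [4, 5, 6]])

print(x.shape)   # (3,)   -> one dimension of size 3
print(m.shape)   # (2, 3) -> 2 rows, 3 columns
```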

And there is also the reshape method, which gives a new shape to an array without changing its data.
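A short sketch of reshape, using an assumed array of six numbers. The new shape must hold the same number of elements as the old one.

```python
import numpy as np

a = np.arange(6)         # [0 1 2 3 4 5], shape (6,)
b = a.reshape(2, 3)      # same data, arranged as 2 rows by 3 columns

print(a.shape)   # (6,)
print(b.shape)   # (2, 3)
print(b)
```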

Is there more to Numpy? It seems there is more to it than I envisioned when I first heard the word. Numpy even went ahead to borrow a word from the BBC (British Broadcasting Corporation). So, what is Broadcasting?

Broadcasting describes how Numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting usually leads to efficient algorithm implementations. Broadcasting two arrays follows these rules:

  1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
  2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
  3. The arrays can be broadcast together if they are compatible in all dimensions.
  4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
  5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
Broadcasting arrays
Broadcasting operations
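The rules above can be sketched with two small arrays. The names a and v follow the text, but the values here are assumed purely for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])   # shape (4, 3)
v = np.array([1, 0, 1])        # shape (3,), treated as (1, 3) by rule 1

added = a + v        # v is applied to every row of a
subtracted = a - v

print(added.shape)   # (4, 3)
print(added)
print(subtracted)

# Shapes (4, 3) and (2,) are not compatible: 3 != 2 and neither is 1.
try:
    a + np.array([1, 2])
    compatible = True
except ValueError:
    compatible = False
print(compatible)    # False
```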

Normally, arrays with different sizes cannot be added, subtracted, or otherwise combined in arithmetic. A way to overcome this is to duplicate the smaller array so that it has the same dimensions and size as the larger array. Broadcasting achieves the same result without actually making the copies. In the examples above, the arrays a and v are of different dimensions, but Broadcasting made it possible for addition and subtraction to be performed on the two arrays.

There is even more to Python...

Python is a huge ecosystem, and one of its greatest benefits is that it allows large amounts of data to be visualized in an easily accessible and digestible format. And that’s where Matplotlib comes on stage.

Matplotlib, as the name implies, is an amazing visualization library in Python that allows 2D plots of arrays. With just a few lines of code, it is possible to generate line plots, histograms, power spectra, bar charts, error charts, etc.

The first step in using matplotlib is to import the library into the program with the keyword import.

importing matplotlib

Basic plots in Matplotlib:

plot 1
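A sketch of what such a basic plot might look like, assuming Matplotlib is installed. The range and step used for x are assumptions chosen for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 3 * np.pi, 0.1)   # x coordinates from 0 up to (not including) 3*pi
y = np.sin(x)                      # y coordinates on a sine curve
plt.plot(x, y)                     # draw the curve
```

In a script or interactive session, calling plt.show() afterwards opens the figure window.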

In the above example, three lines of code generated the plot shown — a sine curve with coordinates x and y.

sine and cosine plot
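Two curves can share one set of axes. The sketch below is one possible way to do it; the labels and title are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 3 * np.pi, 0.1)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sine")      # first curve
ax.plot(x, np.cos(x), label="cosine")    # second curve on the same axes
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Sine and cosine")
ax.legend()                              # uses the label given to each curve
```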

Histogram Plot

Histogram plotting
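A histogram groups values into bins and counts how many fall into each. The sketch below uses randomly generated numbers as stand-in data; the seed, sample size, and number of bins are all assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)       # seeded so the example is repeatable
data = rng.normal(size=1000)         # 1000 made-up values from a normal distribution

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins=30)   # 30 bins -> 31 bin edges
ax.set_xlabel("value")
ax.set_ylabel("frequency")
```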

And after Matplotlib, we moved swiftly into the last part of week 4, which was to get acquainted with a bear native to south-central China (the panda). At least that was how it sounded until I heard the 's' pronounced by the instructor. Pandas, he called it... Panda (a bear)... s: a Python library. Data scientists light up at the sight of Pandas. Do you know why? Because it is a high-level data manipulation tool.

Pandas, like Numpy, is a Python library, and it provides extensive means for data analysis. Data scientists often work with data stored in table formats like .csv, .tsv, or .xlsx. Pandas makes it very easy and convenient to load, process, and analyze such tabular data using SQL-like queries. In conjunction with Matplotlib, Numpy, and Seaborn, Pandas provides a wide range of opportunities for visual analysis of tabular data. And the first step is easy, like the one taken when importing the other libraries I mentioned earlier. Take a look.

importing pandas

After importing the library, one can proceed to read the data with one of the read_ functions, chosen according to the format of the data: read_csv for .csv files (and for .tsv files, by passing sep="\t"), or read_excel for .xlsx files.

dataset from a telecom company
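The telecom file itself is not reproduced here, so the sketch below reads a tiny made-up CSV from an in-memory string instead. The column names and values are hypothetical and are not taken from the real dataset.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """customer_id,minutes,calls,churn
1,120.5,10,False
2,80.0,7,True
3,200.25,15,False
4,95.0,9,False
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())   # first rows of the table
```

With a real file, the call would take the path instead, for example pd.read_csv("some_file.csv"), where the file name is a placeholder.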

Afterward, numerous operations can be performed on the data. The shape attribute can be used to find the number of rows and columns in the dataset.

shape of the data

From the output above, we can see that the data contains 3333 rows and 20 columns.

The names of the columns in the data can be printed with print(df.columns). Here columns is an attribute of the DataFrame, print is a function that does what it is named after, and df is the name given to the dataset.

printing columns
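Both attributes can be tried on a small made-up DataFrame. The one below is hypothetical and much smaller than the telecom dataset, so its shape will not match the 3333 rows and 20 columns mentioned in the text.

```python
import pandas as pd

# A small hypothetical table built from a dictionary of columns.
df = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "minutes": [120.5, 80.0, 200.25],
    "calls": [10, 7, 15],
})

print(df.shape)     # (3, 3) -> 3 rows, 3 columns
print(df.columns)   # the column labels
```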

The info() method can also be used to print some general information about the dataset.

general information of the dataset

There are other methods that can be invoked and used in various ways. One example is describe(), which shows the basic statistical characteristics of each numerical feature.
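A sketch of both methods on a small hypothetical DataFrame (the columns and values are made up). By default, describe() summarizes only the numeric columns.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["AA", "BB", "CC"],
    "minutes": [120.5, 80.0, 200.25],
    "calls": [10, 7, 15],
})

df.info()                 # prints column names, non-null counts, and dtypes

summary = df.describe()   # count, mean, std, min, quartiles, max
print(summary)
```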

“And Pandas stretches ahead,” said the instructor as he rounded up.

Gradually, we are moving into Machine Learning. Stay with us on the journey.
