Pandas Implementation of Machine Learning course at Hackveda

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
Library Features
- DataFrame object for data manipulation with integrated indexing.
- Tools for reading and writing data between in-memory data structures and different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, fancy indexing, and subsetting of large data sets.
- Data structure column insertion and deletion.
- Group by engine allowing split-apply-combine operations on data sets.
- Data set merging and joining.
Prerequisites
Basic knowledge of dictionaries in Python.
Basic Python syntax.
To start working with the pandas library, we first need to import it.
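The import is a single line. The alias pd is a widely used convention rather than a requirement, but the rest of this sketch assumes it.

```python
# "pd" is the conventional alias for pandas.
import pandas as pd

# Printing the version is a quick way to confirm the import worked.
print(pd.__version__)
```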

Series
Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
Syntax: pandas.Series(data, index, dtype, copy)
Parameters:
1. data: data can take various forms, such as an ndarray, a list, a dict, or a constant (scalar).
2. index: Index values must be hashable and the same length as data; duplicate labels are allowed. If no index is passed, the keys are used when data is a dict; otherwise it defaults to RangeIndex(n), i.e. the labels 0 to n-1.
3. dtype: The data type. If None, the data type is inferred.
4. copy: Copy the input data. Default False.
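A minimal sketch of the parameters above. The values and labels are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# From a list: with no index passed, the labels default to 0..n-1.
s = pd.Series([10, 20, 30])

# From a list with custom labels and an explicit dtype.
prices = pd.Series([2.5, 3.0, 4.25], index=["tea", "coffee", "juice"], dtype="float64")

# From a constant (scalar): the value is repeated once per index label.
ones = pd.Series(1, index=["x", "y", "z"])

# From an ndarray with copy=True, so later changes to the array
# do not show up in the Series.
arr = np.array([1, 2, 3])
copied = pd.Series(arr, copy=True)
arr[0] = 99

print(s.index.tolist())   # [0, 1, 2]
print(prices["coffee"])   # 3.0
print(ones.tolist())      # [1, 1, 1]
print(copied.tolist())    # [1, 2, 3]
```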

Data Frames
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
Some key features are:
1. Columns can be of different types
2. Size-mutable
3. Labeled axes (rows and columns)
4. Arithmetic operations can be performed on rows and columns
Syntax: pandas.DataFrame(data, index, columns, dtype, copy)
Parameters:
data: data can take various forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
index: Row labels for the resulting frame. Optional; defaults to RangeIndex(n) (the labels 0 to n-1) if no index is passed.
columns: Column labels. Optional; defaults to RangeIndex(n) if no column labels are passed and the data does not supply them (for a dict, the keys become the column names).
dtype: Data type to force; only a single dtype is allowed. If None, the type of each column is inferred.
copy: Whether to copy the data from the inputs. The default depends on the input type and the pandas version.

In the first step, I created a dictionary ‘mydata’ and then converted it into a DataFrame.
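Here is a sketch of that step. The contents of mydata below are hypothetical placeholders, since any dictionary of equal-length lists works the same way.

```python
import pandas as pd

# Hypothetical contents for 'mydata': each key becomes a column name,
# and each list holds that column's values (all lists must be the same length).
mydata = {
    "name": ["Asha", "Ravi", "Meera"],
    "age": [24, 31, 28],
    "city": ["Delhi", "Pune", "Jaipur"],
}

# Convert the dictionary into a DataFrame; with no index argument,
# the rows are labeled 0, 1, 2.
df = pd.DataFrame(mydata)
print(df)

# The same data with explicit row labels passed through the index parameter.
df_labeled = pd.DataFrame(mydata, index=["a", "b", "c"])
print(df_labeled.loc["b", "name"])  # Ravi
```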
Reading a CSV file using Pandas
Importing data is the first step in almost every data science project, and CSV (comma-separated values) is one of the most common file formats, so reading it is an essential skill.

The location of the CSV file on your machine is passed as a parameter to the read_csv() function.
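A minimal sketch, assuming a small hypothetical file people.csv. To keep it self-contained, the script first writes that file to a temporary folder; in practice you would simply pass the path of your own file.

```python
import os
import tempfile

import pandas as pd

csv_text = "name,age\nAsha,24\nRavi,31\n"

with tempfile.TemporaryDirectory() as folder:
    # Create the hypothetical CSV file.
    path = os.path.join(folder, "people.csv")
    with open(path, "w") as f:
        f.write(csv_text)

    # read_csv takes the file location as its first argument;
    # the first line of the file becomes the column names.
    df = pd.read_csv(path)

print(df)
```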
Reading a text file using Pandas
Sometimes a CSV file is not available and we have text files instead. In many text files, the columns are separated by tabs. These files can also be read with pandas; to do so, we pass an additional parameter, sep (the separator), to read_csv().
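The sketch below uses io.StringIO as a stand-in for a tab-separated .txt file on disk. It also shows read_table(), which behaves like read_csv() but uses a tab as its default separator.

```python
import io

import pandas as pd

# Tab-separated text; StringIO stands in for a file path here.
text = "name\tage\nAsha\t24\nRavi\t31\n"

# sep="\t" tells read_csv to split columns on tabs instead of commas.
df = pd.read_csv(io.StringIO(text), sep="\t")

# read_table defaults to sep="\t", so no separator is needed.
df2 = pd.read_table(io.StringIO(text))

print(df)
print(df.equals(df2))  # True
```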

Reading an Excel file using Pandas
We can also read Excel files with pandas. To do so, we use the read_excel() function and pass the location of the Excel file as its parameter.

Sometimes an Excel workbook contains multiple sheets. We can read a specific sheet directly by passing its name to the sheet_name parameter. I created a Sheet2 in the same Excel file with some dummy data.

Reading a CSV from a URL
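Datasets are often hosted in the cloud or on another website. Instead of downloading the file manually first, we can pass its URL directly to read_csv(). Note that pandas still transfers the data over the network and loads it into memory, so for very large files (say, more than 10 GB) reading in pieces with the chunksize parameter is more practical.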
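A sketch of reading by URL that runs without network access: it writes a few rows taken from mtcars to a temporary file and builds a file:// URL for it. That URL is a stand-in for a real http(s) address (such as a hypothetical https://example.com/cars.csv); read_csv() accepts both. The second part shows chunksize, which yields the data as a series of smaller DataFrames.

```python
import pathlib
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    path = pathlib.Path(folder) / "cars.csv"
    path.write_text("model,mpg\nMazda RX4,21.0\nDatsun 710,22.8\nValiant,18.1\n")

    # file:// URL standing in for an http(s) URL.
    url = path.as_uri()
    df = pd.read_csv(url)

    # With chunksize, read_csv returns an iterator of DataFrames
    # (here, at most 2 rows each) instead of one big frame.
    total_rows = 0
    for chunk in pd.read_csv(url, chunksize=2):
        total_rows += len(chunk)

print(df)
print(total_rows)  # 3
```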

Performing some actions on data after reading it as a data frame
First, let’s read a CSV file. I am performing some operations on a dataset called ‘MTCARS’.

Suppose I want to print the first few rows of the dataset.
To do so, we use the head() function and pass the number of rows we want to print (5 by default). Similarly, to print the last few rows of the dataset, we use tail().
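Since the MTCARS file itself isn't included here, this sketch builds a stand-in DataFrame from three columns (mpg, cyl, hp) of the first seven rows of mtcars, using the car names as the index.

```python
import pandas as pd

# Stand-in for the MTCARS file: three columns of its first seven rows.
cars = pd.DataFrame(
    {
        "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3],
        "cyl": [6, 6, 4, 6, 8, 6, 8],
        "hp": [110, 110, 93, 110, 175, 105, 245],
    },
    index=["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive",
           "Hornet Sportabout", "Valiant", "Duster 360"],
)

print(cars.head())   # first 5 rows (the default)
print(cars.head(3))  # first 3 rows
print(cars.tail(2))  # last 2 rows: Valiant and Duster 360
```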

Now suppose we need to know all the columns of the dataset. For that, we use dataframe.columns.
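A short sketch on a two-row stand-in for MTCARS. Note that columns is an attribute, not a method, so it is used without parentheses; it returns a pandas Index, which tolist() turns into a plain Python list.

```python
import pandas as pd

cars = pd.DataFrame(
    {"mpg": [21.0, 22.8], "cyl": [6, 4], "hp": [110, 93]},
    index=["Mazda RX4", "Datsun 710"],
)

print(cars.columns)           # an Index object
print(cars.columns.tolist())  # ['mpg', 'cyl', 'hp']
```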

To see the data type of each column, we use dataframe.dtypes (note the plural).
This returns a Series that maps each column name to its data type.
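A small sketch with one text column and two numeric columns, showing that each column keeps its own type.

```python
import pandas as pd

cars = pd.DataFrame({
    "model": ["Mazda RX4", "Datsun 710"],  # text column
    "mpg": [21.0, 22.8],                   # floating-point column
    "cyl": [6, 4],                         # integer column
})

# dtypes is a Series: column name -> data type.
print(cars.dtypes)

# A single column's type can be looked up by name.
print(cars.dtypes["mpg"])  # float64
```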

To find the unique values in a column, we can do something like this:
dataframe['column name'].unique()
This returns the unique values in the column, in order of first appearance. (Without the parentheses, we would get the method itself rather than its result.)
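A sketch using the cyl column from the first seven rows of mtcars. It also shows nunique(), which counts the distinct values instead of listing them.

```python
import pandas as pd

# cyl values of the first seven mtcars rows.
cars = pd.DataFrame({"cyl": [6, 6, 4, 6, 8, 6, 8]})

# unique() keeps the order in which each value first appears.
unique_cyl = cars["cyl"].unique()
print(unique_cyl)              # [6 4 8]

# nunique() returns just the number of distinct values.
print(cars["cyl"].nunique())   # 3
```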

Resource
For more info: http://www.hackveda.in/campus.php?campusno=online_ml_course_demo_2018