Basic Python knowledge to start machine learning development.
In this post, I give a brief overview of the Python basics you need to start machine learning development. I have also written about the Python libraries commonly used in machine learning.
Why Python for machine learning?
Python is a high-level, general-purpose, interpreted, object-oriented programming language. It is easy to learn and comes with strong features, which is why we use it for machine learning development. Some people also use the R programming language, but I use Python to develop machine learning models. Python has many libraries that make things easy: with them, we can get the desired outcome with only a small amount of code. Even if we want to write the algorithms from scratch, we still need NumPy and Pandas. We will use the matplotlib library in upcoming posts to visualize the data. There is also a library called sklearn (scikit-learn) that provides ready-made implementations of many common machine learning algorithms.
NumPy is a Python library that adds support for large multi-dimensional arrays and matrices, along with a large collection of mathematical functions that operate on these arrays. Its array operations run in compiled code, so they are much faster than the same operations written as plain Python loops. For example, multiplying two large arrays element by element takes far less time with NumPy than with a Python loop over lists.
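A quick way to see the speed difference is to time the same element-wise multiplication both ways. This is a minimal sketch: the array size is arbitrary, and the exact timings depend on your machine, so no particular numbers are claimed here.

```python
import time

import numpy as np

# Array size chosen arbitrarily for illustration
n = 1_000_000
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64)

# Pure Python: the same data as lists, multiplied in an interpreted loop
a_list, b_list = a.tolist(), b.tolist()
start = time.perf_counter()
python_product = [x * y for x, y in zip(a_list, b_list)]
python_seconds = time.perf_counter() - start

# NumPy: one vectorized expression; the loop runs in compiled code
start = time.perf_counter()
numpy_product = a * b
numpy_seconds = time.perf_counter() - start

print(f"pure Python loop: {python_seconds:.4f} s")
print(f"NumPy a * b:      {numpy_seconds:.4f} s")
```

Both versions produce the same values; on a typical machine the NumPy line finishes many times faster.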
Pandas is a software library written for the Python programming language to manage tabular data, that is, rows and columns. It can work somewhat like an SQL query, since it lets us fetch specific rows and columns from a dataset. It also lets us import datasets from CSV, Excel, and text files, which we will use to train and test the model.
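To make the SQL comparison concrete, here is a minimal sketch of a filter-and-select. The column names and values are made up for illustration, and the data is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Made-up toy data standing in for a real CSV file (hypothetical columns and values)
csv_text = """glucose,blood_pressure,insulin,bmi,age,outcome
120,70,80,30.5,45,1
90,65,0,25.0,30,0
150,80,120,35.2,52,1
100,72,60,22.8,28,0
130,76,90,28.4,39,1
110,68,50,24.1,33,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Roughly equivalent to: SELECT glucose, age FROM df WHERE age > 40
older = df.loc[df["age"] > 40, ["glucose", "age"]]
print(older)
```

For other file types, Pandas offers `pd.read_excel` for Excel files (it needs an engine such as openpyxl installed), and delimited text files can be read with `pd.read_csv` and a suitable `sep` argument.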
Let’s start with the coding part and look at the basic Python code we will use in every model we develop. To train a model we need data, and the data will usually be in a CSV, Excel, or text file. We have to import this data to train and test the model. Let’s see how to import a file and use its data with the Pandas library.
- Import Pandas and import the diabetes dataset
Here, in the first line, I have imported the Pandas library. In the second line, I have used the read_csv() function, which is part of the Pandas library, to read the CSV file and store it in a variable called df. In the last line, I have printed the first 5 records. The head() function returns the first 5 rows of the dataset by default.
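The original code for this step is not shown here, so the following is a minimal sketch of the three lines described above. The file name `diabetes.csv` is an assumption, and to keep the example runnable it reads a small in-memory CSV with made-up values and hypothetical column names instead of the real file.

```python
import io

import pandas as pd

# With a real file this would be something like pd.read_csv("diabetes.csv")
# ("diabetes.csv" is a hypothetical name). Made-up toy data stands in for it here.
csv_text = """glucose,blood_pressure,insulin,bmi,age,outcome
120,70,80,30.5,45,1
90,65,0,25.0,30,0
150,80,120,35.2,52,1
100,72,60,22.8,28,0
130,76,90,28.4,39,1
110,68,50,24.1,33,0
"""

# read_csv() parses the CSV text into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; head(n) returns the first n
first_five = df.head()
print(first_five)
```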
- Check the information about our dataset
The info() function gives information about all attributes of the dataset (attributes are the columns, and records are the rows). It shows each column's index, name, number of non-null records, and data type, along with the memory usage of the dataset.
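Here is a minimal sketch of info() on made-up toy data. One `bmi` value is deliberately left blank so the non-null count differs between columns; the column names and values are hypothetical.

```python
import io

import pandas as pd

# Made-up toy data; the bmi value in the fourth row is intentionally missing
csv_text = """glucose,blood_pressure,insulin,bmi,age,outcome
120,70,80,30.5,45,1
90,65,0,25.0,30,0
150,80,120,35.2,52,1
100,72,60,,28,0
130,76,90,28.4,39,1
110,68,50,24.1,33,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# info() prints one line per column (position, name, non-null count, dtype)
# plus memory usage; it prints the report and returns None
df.info()

# The same facts are also available as Series for use in code
non_null_counts = df.count()  # non-null values per column
column_types = df.dtypes      # data type per column
print(non_null_counts)
print(column_types)
```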
- Now, summarize the dataset with describe()
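The describe() method generates summary statistics for each numeric column: the count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum, giving a result like the one in the image above.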
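A minimal sketch of describe() on the same kind of made-up toy data (hypothetical columns and values):

```python
import io

import pandas as pd

# Made-up toy data standing in for the real dataset
csv_text = """glucose,blood_pressure,insulin,bmi,age,outcome
120,70,80,30.5,45,1
90,65,0,25.0,30,0
150,80,120,35.2,52,1
100,72,60,22.8,28,0
130,76,90,28.4,39,1
110,68,50,24.1,33,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# One column per numeric attribute; rows are count, mean, std, min,
# 25%, 50%, 75% and max
summary = df.describe()
print(summary)
```

For a 0/1 column such as the hypothetical `outcome`, the mean equals the fraction of rows labeled 1.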
- Slice columns from the dataset
We can take a slice of the data using iloc[:, :]. The first : selects rows and the second selects attributes (columns). In the piece of code above, we have sliced all records of the first 5 columns, from column 0 to column 4. iloc works with integer positions, and the end of a position range is excluded, so columns 0 to 4 correspond to the range 0:5.
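A minimal sketch of iloc slicing on made-up toy data (hypothetical columns and values), including the "first 5 columns" case described above:

```python
import io

import pandas as pd

# Made-up toy data with six columns
csv_text = """glucose,blood_pressure,insulin,bmi,age,outcome
120,70,80,30.5,45,1
90,65,0,25.0,30,0
150,80,120,35.2,52,1
100,72,60,22.8,28,0
130,76,90,28.4,39,1
110,68,50,24.1,33,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# iloc[rows, columns] selects by integer position; ":" alone means "all",
# and in start:stop the stop position is excluded
first_five_cols = df.iloc[:, 0:5]   # all rows, columns 0 to 4
first_three_rows = df.iloc[0:3, :]  # rows 0 to 2, all columns
single_value = df.iloc[2, 0]        # row 2, column 0

print(first_five_cols)
print(first_three_rows)
print(single_value)
```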
We will need these operations while developing machine learning models, for example when separating the input attributes from the output attribute.
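As an illustration of that input/output split, here is a minimal sketch assuming the output (target) is the last column, as in this made-up toy table; with a real dataset, check which column holds the label first.

```python
import io

import pandas as pd

# Made-up toy data; assumption: the last column "outcome" is the output
csv_text = """glucose,blood_pressure,insulin,bmi,age,outcome
120,70,80,30.5,45,1
90,65,0,25.0,30,0
150,80,120,35.2,52,1
100,72,60,22.8,28,0
130,76,90,28.4,39,1
110,68,50,24.1,33,0
"""
df = pd.read_csv(io.StringIO(csv_text))

X = df.iloc[:, :-1]  # inputs: every column except the last
y = df.iloc[:, -1]   # output: the last column, returned as a Series

# Many libraries also accept plain NumPy arrays
X_array = X.to_numpy()
y_array = y.to_numpy()
print(X.shape, y.shape)
```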
In this post, I have given a brief overview of the Python libraries that are useful for developing machine learning models. I have also covered some basic operations required when training a machine learning model.