The Pandas Library in depth (part 1) for data analysis and machine learning

Umakant_Shinde
Published in Analytics Vidhya · 5 min read · Oct 7, 2020

In this post I explain how to access and manipulate data using the pandas library.

In the previous post, you got familiar with the pandas library and the basic functionality it provides for data analysis. pandas is a library specialized for data analysis, so you would expect it to be mainly focused on calculation and data processing.

pandas makes the data processing needed for solving machine learning problems much easier, and it can read data from many different sources. Here I explain how to access data in several common ways.

Accessing data from:

  1. Comma-separated values (CSV)
  2. Hypertext Markup Language (HTML)
  3. Structured Query Language (SQL)

1. Comma-separated values (CSV)

In machine learning, CSV files are used both to load the data needed to solve a problem and to store results. Data is reported in tabular format, and comma-separated values are the usual way to store tabular data.

This type of file is the most common source of data, and it is also easy to write and interpret.

pandas provides a set of functions specific to this type of file:

  • read_csv
  • read_table
  • to_csv

Here I’ve loaded data using the read_csv() function; after that we can analyze it with functions such as:

  • data.describe()
  • data.keys()
  • data.head()

All of the above functions are common, basic ways to check the condition of the data. If you want to save data to a CSV file, use the to_csv() function.
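A minimal sketch of these checks, assuming a file named data.csv (the file name is just a placeholder for your own dataset):

    import pandas as pd

    # Load a CSV file into a DataFrame (the file name is a placeholder)
    data = pd.read_csv('data.csv')

    # Basic checks on the loaded data
    print(data.describe())   # summary statistics for the numeric columns
    print(data.keys())       # the column labels (equivalent to data.columns)
    print(data.head())       # the first five rows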

A simple example of how to use the to_csv() function:

Here I’ve created three lists, combined them into a dictionary, built a DataFrame from it, and saved it with to_csv(), as in the sketch below.
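A minimal sketch along those lines; the column names and values are made up for illustration:

    import pandas as pd

    # Three lists of example values
    names = ['Asha', 'Ravi', 'Meena']
    ages = [25, 30, 22]
    cities = ['Pune', 'Mumbai', 'Delhi']

    # Combine the lists into a dictionary and build a DataFrame from it
    data = {'name': names, 'age': ages, 'city': cities}
    df = pd.DataFrame(data)

    # Save the DataFrame to a CSV file (index=False drops the row-index column)
    df.to_csv('people.csv', index=False)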

2. Hypertext Markup Language (HTML)

Today most data is available on the internet and can be accessed in various ways. For the HTML format, pandas provides a corresponding pair of I/O API functions:

• read_html()

• to_html()

Having these two functions can be very useful. You will appreciate the ability to convert a complex data structure such as a DataFrame directly into an HTML table without having to hand-write a long HTML listing, especially if you’re dealing with the web.

The inverse operation can be just as useful, because the web is now a major source of data.

In fact, a lot of data on the internet is not in a “ready to use” form, packaged in some TXT or CSV file.

Very often, however, the data is reported as part of the text of web pages, so having a function for reading it can prove really useful.

This activity is so widespread that it is now known as web scraping.

This process is becoming a fundamental part of the first stage of data analysis: data mining and data preparation.

to_html():

If you want to convert a DataFrame into HTML format, use to_html().

It automatically converts the DataFrame into an HTML table, so you don’t need to write the HTML tags by hand; to_html() generates the markup for you.

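A minimal sketch of to_html(), reusing a small made-up DataFrame like the one in the CSV example above:

    import pandas as pd

    df = pd.DataFrame({'name': ['Asha', 'Ravi', 'Meena'],
                       'age': [25, 30, 22]})

    # Convert the DataFrame into an HTML table string
    html = df.to_html()
    print(html)   # prints <table>...</table> markup generated by pandas

    # Or write the markup straight to a file
    with open('people.html', 'w') as f:
        f.write(html)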

read_html():

When you want to load data from the internet, use this function.

read_html() loads the data, but it returns it as a list of DataFrames; the table we need can then be converted to a CSV file.

Here, to_csv() is used to store the data in CSV format:
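A sketch of this workflow; the URL and the table index are placeholders (any page containing an HTML table will do, and read_html() needs an HTML parser such as lxml installed):

    import pandas as pd

    # read_html() returns a list of DataFrames, one per <table> found on the page
    # (the URL is only an illustrative placeholder)
    tables = pd.read_html('https://en.wikipedia.org/wiki/Python_(programming_language)')

    print(len(tables))   # how many tables were found
    df = tables[0]       # pick the table we are interested in

    # Store the chosen table in CSV format
    df.to_csv('table_data.csv', index=False)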

Now the data loaded from HTML is stored in a CSV file. This is the basic use of to_html() and read_html().

To summarize: to_html() stores the contents of a DataFrame as HTML, and read_html() loads HTML tables into DataFrames. When we use read_html(), the data is loaded into a list, which can be quite large; from that list we pick out the DataFrame we need.

That DataFrame can then be stored in CSV format, as shown in the example above.

3. Structured Query Language (SQL)

In business, data is stored in various formats; much of it lives in databases, and there are different ways to access it. Here I explain some commonly used functions. In many applications the data rarely comes from text files, given that this is certainly not the most efficient way to store data. The data is often stored in an SQL-based relational database, or in one of the many alternative NoSQL databases that have become very popular in recent times.

To access database data you need to install the sqlalchemy library.

Use this command to install it:

on CentOS:

pip3 install sqlalchemy

on Ubuntu:

pip3 install SQLAlchemy

SQLAlchemy is a Python library used for accessing data in databases.

SQLAlchemy is most famous for its object-relational mapper (ORM), an optional component that provides the data mapper pattern, where classes can be mapped to the database in open ended, multiple ways — allowing the object model and database schema to develop in a cleanly decoupled way from the beginning.

An example of creating a database engine with the sqlalchemy library (imported here as sqla):

db = sqla.create_engine('sqlite:///mydata.sqlite')
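A fuller sketch of how the engine is typically combined with pandas; the table name mytable and the sample values are assumptions for illustration:

    import pandas as pd
    import sqlalchemy as sqla

    # Create an engine for a local SQLite database file
    db = sqla.create_engine('sqlite:///mydata.sqlite')

    # Write a small DataFrame into the database (table name is made up)
    df = pd.DataFrame({'name': ['Asha', 'Ravi'], 'score': [85, 92]})
    df.to_sql('mytable', db, if_exists='replace', index=False)

    # Read the data back with an SQL query
    result = pd.read_sql('SELECT * FROM mytable', db)
    print(result)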

In the next post I will explain accessing data from:

  1. MongoDB
  2. Microsoft Excel Files
  3. XML

Thanks for reading.

Read my previous post:

https://medium.com/@utshinde999/machine-learning-basic-library-abstract-7440a3af0d8c
