A Beginner’s Guide For The Pandas Library In Python

Published in

MS Club of SLIIT

5 min read · May 4, 2023

What is Pandas?

Pandas is an open-source library for data analysis in Python. It provides data structures and tools for working with structured data, such as tables and time series data. With Pandas, you can easily manipulate, filter, and visualize data, making it a valuable tool for data analysts, scientists, and engineers.

Installing Pandas

To get started with Pandas, you first need to install it. You can install Pandas using pip, the Python package manager. Open a terminal or command prompt and type the following command:

pip install pandas

After installation, you need to import the library before you can use it. In PyCharm, or any other editor, add the statement import pandas as pd at the top of your script.

This imports the Pandas library and gives it an alias pd, which is a common convention in the Python community.
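As a quick check that the installation worked, a minimal sketch like the one below imports the library under the conventional alias and prints the installed version. The variable name installed_version is only illustrative, and the version string you see depends on your environment.

```python
import pandas as pd

# pd is the conventional alias for the pandas package
installed_version = pd.__version__

# Prints the version of pandas found in the current environment
print(installed_version)
```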

Main components of Pandas

In Pandas, there are two main components: Series and DataFrame. A Series represents a one-dimensional array or a single column of data. You can create a Series from a list, tuple, or NumPy array, and it can hold various data types such as strings, integers, floats, objects, and more.
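The three ways of building a Series mentioned above can be sketched as follows. The values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series built from a list of integers
ages = pd.Series([21, 19, 25])

# A Series built from a tuple of strings
names = pd.Series(("Alice", "Bob", "Chamari"))

# A Series built from a NumPy array of floats
scores = pd.Series(np.array([88.5, 92.0, 79.5]))

# Each Series carries its own data type
print(ages.dtype)    # an integer dtype
print(names.dtype)   # a string/object dtype, depending on the pandas version
print(scores.dtype)  # float64
```

Each Series also gets a default integer index (0, 1, 2, ...), which is what lets you look values up by position or label later on.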

DataFrame

The main data structure in Pandas is the DataFrame. A DataFrame is a two-dimensional table with rows and columns, similar to a spreadsheet. You can create a DataFrame in many ways, such as from a CSV file or a Python list or dictionary.

Here’s an example of creating a DataFrame from a list of Python dictionaries:

In this example, we create a list of dictionaries, where each dictionary represents a single row of data. We then pass this list to the pd.DataFrame() function to create a DataFrame.
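A minimal sketch of that approach might look like this. The rows are hypothetical sample data, not the dataset from the article's repository; the second DataFrame shows the equivalent dictionary-of-columns form for comparison.

```python
import pandas as pd

# Hypothetical sample data: each dictionary is one row
rows = [
    {"name": "Alice", "age": 21, "city": "Colombo"},
    {"name": "Bob", "age": 19, "city": "Kandy"},
    {"name": "Chamari", "age": 25, "city": "Galle"},
]

# Keys become column names, values fill the cells
df = pd.DataFrame(rows)
print(df)

# The same table built from a single dictionary of columns
df_from_columns = pd.DataFrame({
    "name": ["Alice", "Bob", "Chamari"],
    "age": [21, 19, 25],
    "city": ["Colombo", "Kandy", "Galle"],
})
```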

Simple Data Analysis with Pandas

Pandas owes much of its popularity to its extensive collection of methods and functions for data analysis and manipulation. This article covers only a few of them as an introduction; the official documentation describes many more possibilities than are presented here.

Importing a dataset

Pandas supports a wide range of data formats such as CSV, Excel, and JSON (GeoJSON is usually handled through the separate GeoPandas library). A dataset can be read and imported with a single function call, such as pd.read_csv() for a CSV file.
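As a rough sketch, reading a CSV could look like the code below. To keep the example self-contained, the CSV content is a hypothetical inline string wrapped in io.StringIO; with a real file you would pass its path to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """name,age,city
Alice,21,Colombo
Bob,19,Kandy
Chamari,25,Galle
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Other formats follow the same pattern through functions such as pd.read_excel() and pd.read_json().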

Using the sample dataset below, we can inspect the data with Pandas' built-in methods and attributes.

[Image: sample dataset]

Here are some popular ways to display data:

.head(n) - Returns the first n rows of the DataFrame (5 by default). (Example: df.head(1))

.tail(n) - Returns the last n rows (5 by default). (Example: df.tail(1))

.sample(n) - Returns n randomly selected rows. (Example: df.sample(3))

.info() - Prints summary information about the DataFrame, such as column names, non-null counts, and data types. (Example: df.info())

.columns - An attribute that holds the column names. (Example: df.columns)

.shape - An attribute that holds a tuple with the number of rows and columns. (Example: df.shape)

.dtypes - An attribute that holds the data type of each column. (Example: df.dtypes)

.describe() - Returns summary statistics; by default only numeric columns are included. (Example: df.describe())
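The methods and attributes listed above can be tried together on a small DataFrame. The data here is hypothetical and simply stands in for the article's sample dataset.

```python
import pandas as pd

# Hypothetical stand-in for the sample dataset
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Chamari", "Dinesh"],
    "age": [21, 19, 25, 23],
    "city": ["Colombo", "Kandy", "Galle", "Colombo"],
})

first_rows = df.head(2)     # first two rows
last_rows = df.tail(1)      # last row
random_rows = df.sample(2)  # two randomly chosen rows
summary = df.describe()     # statistics for the numeric column (age)

print(first_rows)
print(last_rows)
print(random_rows)
df.info()                   # prints its report directly
print(df.columns)           # attribute, so no parentheses
print(df.shape)             # (rows, columns)
print(df.dtypes)            # attribute, so no parentheses
print(summary)
```

Note that columns, shape, and dtypes are attributes rather than methods, so calling them with parentheses raises an error.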

Manipulating Data

Pandas provides many methods for manipulating data in a DataFrame. For example, you can select specific columns using the square bracket notation:

names = df['name']

You can also filter rows based on certain conditions, such as age greater than 20:

older_than_20 = df[df['age'] > 20]
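Put together, column selection and row filtering might look like this on a small, hypothetical DataFrame. The last two selections go slightly beyond the text above: a list of column names returns a DataFrame instead of a Series, and conditions can be combined with & as long as each one is wrapped in parentheses.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Chamari", "Dinesh"],
    "age": [21, 19, 25, 23],
    "city": ["Colombo", "Kandy", "Galle", "Colombo"],
})

# A single column name returns a Series
names = df["name"]

# A list of column names returns a DataFrame
name_and_age = df[["name", "age"]]

# A boolean condition keeps only the rows where it is True
older_than_20 = df[df["age"] > 20]

# Conditions are combined with & (and) or | (or)
older_in_colombo = df[(df["age"] > 20) & (df["city"] == "Colombo")]

print(older_than_20)
print(older_in_colombo)
```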

Adding/Removing columns

To create a new column and fill it with data, you can use the insert() method:

df.insert(4,'Degree',['Computer science','Business studies','Art','Cyber security'])

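This code creates a column called “Degree” at position 4 (positions start at 0, so it becomes the fifth column) and fills it with the given values in row order. The list must contain exactly one value per row of the DataFrame.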
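For the call above to work, the DataFrame needs at least four existing columns and exactly four rows. The sketch below assumes a hypothetical four-column table (the column "year" is invented for the example) and also shows plain assignment, which appends a column at the end without needing a position.

```python
import pandas as pd

# Hypothetical table with four columns and four rows
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Chamari", "Dinesh"],
    "age": [21, 19, 25, 23],
    "city": ["Colombo", "Kandy", "Galle", "Colombo"],
    "year": [2, 1, 4, 3],
})

# Insert "Degree" at position 4, i.e. after the four existing columns
df.insert(4, "Degree", ["Computer science", "Business studies", "Art", "Cyber security"])

# Plain assignment always appends the new column at the end
df["graduated"] = [False, False, True, False]

print(df.columns.tolist())
```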

To delete a column, we can use the drop() method:

df.drop('Degree', axis=1, inplace=True)

'Degree': This specifies the label of the column to be dropped. In this case, the label is "Degree".
axis=1: This specifies that we want to drop a column (as opposed to a row), and the value 1 refers to the axis of columns.
inplace=True: This specifies that we want to modify the original DataFrame df in place (without creating a new DataFrame).

This deletes the “Degree” column from the DataFrame.
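As a small sketch on hypothetical data, the in-place call can be compared with the non-in-place form, which returns a new DataFrame and leaves the original untouched. The columns= keyword used here is an equivalent alternative to passing the label with axis=1.

```python
import pandas as pd

# Hypothetical sample data that already has a "Degree" column
df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [21, 19],
    "Degree": ["Computer science", "Business studies"],
})

# Returns a new DataFrame without the column; df still has it
without_degree = df.drop(columns="Degree")
still_has_degree = "Degree" in df.columns

# Modifies df directly, as in the article's example
df.drop("Degree", axis=1, inplace=True)

print(df.columns.tolist())
```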

Pandas also supports various operations for transforming and cleaning data, such as filling in missing values, merging multiple DataFrames, and grouping data by certain criteria.
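The three operations named above could be sketched as follows. Both tables are hypothetical, and filling the missing age with the mean is just one possible strategy chosen for illustration.

```python
import pandas as pd

# Hypothetical student table with one missing age
students = pd.DataFrame({
    "name": ["Alice", "Bob", "Chamari", "Dinesh"],
    "age": [21, None, 25, 23],
    "city": ["Colombo", "Kandy", "Galle", "Colombo"],
})

# Hypothetical second table to merge in
degrees = pd.DataFrame({
    "name": ["Alice", "Bob", "Chamari", "Dinesh"],
    "degree": ["Computer science", "Business studies", "Art", "Cyber security"],
})

# Filling missing values: replace the missing age with the mean of the known ages
students["age"] = students["age"].fillna(students["age"].mean())

# Merging: combine the two tables on their shared "name" column
merged = students.merge(degrees, on="name", how="left")

# Grouping: average age per city
mean_age_by_city = merged.groupby("city")["age"].mean()

print(merged)
print(mean_age_by_city)
```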

These are the main concepts every beginner should know when starting to work with the Pandas library in Python. We covered the basics of Pandas, including installation, importing, creating DataFrames, and manipulating data.

You can explore this rich library further by reading the documentation and online tutorials.

Thanks for reading. Please share if you found it useful!

LINK TO GITHUB REPOSITORY FOR DATASETS USED AND SAMPLE CODES

GitHub - UdeeshaRukshan/Pandas_Library: A Beginner's Guid for the Pandas Library in Python

github.com

LINKS TO USEFUL RESOURCES

User Guide - pandas 2.0.1 documentation

The User Guide covers all of pandas by topic area. Each of the subsections introduces a topic (such as "working with…

pandas.pydata.org

The Ultimate Guide to the Pandas Library for Data Science in Python

Pandas (which is a portmanteau of "panel data") is one of the most important packages to grasp when you're starting to…

www.freecodecamp.org

SOCIAL MEDIA PROFILES

LINKEDIN-https://www.linkedin.com/in/udeesha-rukshan-852022217/