3 Effective Methods to Slice Your DataSet using Pandas

Arunkumar N
Published in Variablz Academy
4 min read · Oct 16, 2022

Pandas is one of the most valuable libraries for data analysis tasks. When analyzing data in depth, we often need to slice it. Here we will see three effective ways to slice a data frame using pandas.

I have taken the ‘Indian food prices’ dataset from data.world. You can download it from here for reproducibility.

Before analyzing any dataset, it helps to optimize it by converting each column to an appropriate data type. Here I am loading the already optimized dataset.

Loading the dataset:
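The original loading code is not reproduced here, so the following is only a sketch of the idea. It assumes a tiny in-memory CSV in place of the downloaded file, and the column names (date, market, commodity, unit, price, usdprice) are assumptions rather than the dataset's confirmed schema.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV file (assumed columns).
csv_text = """date,market,commodity,unit,price,usdprice
2021-01-15,Delhi,Rice,KG,30.0,0.41
2021-02-15,Delhi,Wheat,KG,25.5,0.35
2021-03-15,Mumbai,Rice,KG,32.0,0.44
"""

# Convert columns to compact types while loading:
# dates become datetime64, repeated strings become categories,
# and prices become 32-bit floats.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],
    dtype={
        "market": "category",
        "commodity": "category",
        "unit": "category",
        "price": "float32",
        "usdprice": "float32",
    },
)

print(df.dtypes)
```

With a real file, the `io.StringIO(csv_text)` argument would be replaced by the path to the downloaded CSV.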

Data frames support many operations, such as filtering and indexing. Here I focus only on slicing methods.

1. Slicing by iloc[]:

Here I have sliced the first 5 rows and first 5 columns of the data frame using integer-location-based slicing. The first 0:5 selects rows by position and the second 0:5 selects columns by position. As with ordinary Python slices, the end position is excluded.
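A minimal sketch of this slice, assuming a small synthetic data frame in place of the real dataset (the column names are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row, 6-column frame standing in for the dataset.
cols = ["date", "market", "commodity", "unit", "price", "usdprice"]
df = pd.DataFrame(np.arange(60).reshape(10, 6), columns=cols)

# Rows at positions 0-4 and columns at positions 0-4 (position 5 is excluded).
first_block = df.iloc[0:5, 0:5]
print(first_block)
```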

Slicing middle n rows:

Here I have sliced 5 rows from the middle. I found the middle position of the index and used it inside iloc, taking 5 rows starting from that position.
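One way to sketch this, assuming a synthetic 20-row frame and taking the middle position as `len(df) // 2` (the exact calculation used in the original is not shown, so this is an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame with a default 0..19 index.
df = pd.DataFrame({"price": np.arange(20) * 1.5})

# Middle position of the frame, then 5 rows starting there.
mid = len(df) // 2
middle_rows = df.iloc[mid : mid + 5]
print(middle_rows)
```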

Here I have sliced the last 5 rows and chosen specific columns using a lambda function. It also works without the lambda, by passing the list of column positions directly.
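Both variants can be sketched as follows, again on a synthetic frame with assumed column names. The column positions 1 and 4 are chosen only for illustration.

```python
import numpy as np
import pandas as pd

cols = ["date", "market", "commodity", "unit", "price", "usdprice"]
df = pd.DataFrame(np.arange(60).reshape(10, 6), columns=cols)

# Last 5 rows; the lambda receives the frame and returns column positions.
last_with_lambda = df.iloc[-5:, lambda d: [1, 4]]

# Same selection with a plain list of positions.
last_with_list = df.iloc[-5:, [1, 4]]

print(last_with_lambda)
```

The lambda form is mainly useful when the positions depend on the frame itself, for example inside a method chain.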

2. Slicing by truncate():

Here I have sliced rows 50 to 100 using truncate(). By default it truncates along the rows (axis 0) if you don’t specify an axis. Unlike iloc, both the before and after labels are included in the result.
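A short sketch of this call, assuming a synthetic frame with a default integer index (truncate() needs a sorted index):

```python
import numpy as np
import pandas as pd

# Hypothetical 150-row frame with index labels 0..149.
df = pd.DataFrame({"price": np.linspace(10, 60, 150)})

# Drop everything before label 50 and after label 100; both labels are kept.
sliced = df.truncate(before=50, after=100)
print(sliced.index[0], sliced.index[-1], len(sliced))
```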

Slicing by DateTime:

Let’s slice the data frame by date after setting the date column as the index. Before doing that, I made a copy of the data frame so that the original is not affected by any of the operations.

I have truncated the data frame by date to keep the years 2021 to 2022.
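The copy, re-index, and truncate steps might look like the sketch below. It assumes monthly synthetic data and a column named date; the exact boundary dates used in the original are not shown, so the timestamps here are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data from Jan 2019 to Dec 2023.
dates = pd.date_range("2019-01-01", periods=60, freq="MS")
df = pd.DataFrame({"date": dates, "price": np.linspace(20, 50, 60)})

# Work on a copy so the original frame keeps its 'date' column.
df_dt = df.copy().set_index("date").sort_index()

# Keep only rows dated between the two timestamps (inclusive).
recent = df_dt.truncate(
    before=pd.Timestamp("2021-01-01"),
    after=pd.Timestamp("2022-12-31"),
)
print(recent.index.min(), recent.index.max())
```

Note that a timestamp such as 2022-12-31 means midnight on that day, so rows later on the same day would be excluded if the data carried times.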

3. Slicing by loc[]:

We can do a lot with loc[], especially filtering a data frame by conditions. I think it is one of the key attributes of the pandas data frame. It selects values based on row labels and column labels.

Here I have sliced from row label 100 to row label 200 and sliced the columns with a step. I need only the date and price columns, so I chose a step that skips the 11 columns in between them. We can do the same for rows. Note that loc slices include both endpoints.
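A sketch of a stepped label slice, assuming a synthetic frame where 11 filler columns (hypothetical names col_1 to col_11) sit between date and price:

```python
import numpy as np
import pandas as pd

n = 250
data = {"date": pd.date_range("2000-01-01", periods=n, freq="D")}
# Eleven hypothetical columns between 'date' and 'price'.
for i in range(1, 12):
    data[f"col_{i}"] = np.arange(n)
data["price"] = np.linspace(10, 50, n)
df = pd.DataFrame(data)

# Rows labelled 100..200 (inclusive); every 12th column from 'date' to 'price'.
subset = df.loc[100:200, "date":"price":12]
print(subset.head())
```

A step works on the row slice as well, for example `df.loc[100:200:2]` for every second row.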

Slicing by DateTime:

I have sliced the data frame from the year 2000 onward.

I have looked at the prices from the year 1997 to 2000.
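With a sorted DatetimeIndex, loc accepts partial date strings such as a bare year. A minimal sketch, assuming monthly synthetic prices from 1994 to 2022:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly prices indexed by date.
dates = pd.date_range("1994-01-01", "2022-12-01", freq="MS")
df = pd.DataFrame({"price": np.linspace(5, 80, len(dates))}, index=dates)
df.index.name = "date"

# Everything from the start of 2000 onward.
from_2000 = df.loc["2000":]

# Prices from the start of 1997 through the end of 2000.
prices_97_00 = df.loc["1997":"2000", "price"]

print(from_2000.index.min(), prices_97_00.index.max())
```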

We can also slice columns along with rows. Here I have sliced from the ‘unit’ column to the ‘price’ column for the period 2015–2016.

Here I have sliced all the columns from the ‘price’ column onward to get the ‘price’ and ‘USD price’ for 2010–2011.
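Both row-and-column slices can be sketched like this, on a synthetic date-indexed frame. The column names, including usdprice for the ‘USD price’ column, are assumptions.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2009-01-01", "2017-12-01", freq="MS")
n = len(dates)
df = pd.DataFrame(
    {
        "market": ["Delhi"] * n,
        "commodity": ["Rice"] * n,
        "unit": ["KG"] * n,
        "price": np.linspace(20, 40, n),
        "usdprice": np.linspace(0.3, 0.6, n),
    },
    index=dates,
)

# Rows for 2015-2016, columns from 'unit' through 'price' (inclusive).
unit_to_price = df.loc["2015":"2016", "unit":"price"]

# Rows for 2010-2011, every column from 'price' to the end.
price_onward = df.loc["2010":"2011", "price":]

print(unit_to_price.columns.tolist(), price_onward.columns.tolist())
```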

We can use a lambda to choose specific columns. Here I have selected ‘market’ and ‘price’ for that particular period.

We can also choose specific columns without a lambda function. I am just showing the options; you can use either method. Here I have chosen ‘market’, ‘commodity’, and ‘price’ for 1994–2000.
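The two options can be sketched side by side, assuming a synthetic date-indexed frame with assumed column names and the 1994–2000 period for both:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1990-01-01", "2005-12-01", freq="MS")
n = len(dates)
df = pd.DataFrame(
    {
        "market": ["Delhi"] * n,
        "commodity": ["Rice"] * n,
        "unit": ["KG"] * n,
        "price": np.linspace(5, 25, n),
    },
    index=dates,
)

# Option 1: a lambda that returns the column labels to keep.
market_price = df.loc["1994":"2000", lambda d: ["market", "price"]]

# Option 2: a plain list of column labels.
three_cols = df.loc["1994":"2000", ["market", "commodity", "price"]]

print(market_price.shape, three_cols.shape)
```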

In our day-to-day work as data scientists, we play with datasets to extract valuable insights, and that requires analyzing the data thoroughly. Here we’ve checked the prices of each commodity in different periods using simple slicing techniques.

I hope this article gives you some clarity about slicing. We will do more analysis in the upcoming article.

Thanks

For more data science insights, connect with me on LinkedIn.

https://www.linkedin.com/in/arunkumar-data-scientist/
