How do I use Dataset and DataLoader

Dipanwita Mallick
Published in IWriteAsILearn · May 21, 2021 · 2 min read

Ever wondered why we need Dataset and DataLoader in PyTorch?

We use Dataset and DataLoader so that we can train the network in batches, controlled by a very important hyperparameter called “batch size”.

What is batch size?

Batch size refers to the number of data points used to calculate the loss value and update the weights in a single step. This becomes very handy in the case of huge datasets, where passing the entire dataset through the network at once is not feasible.

How is it done?

A Dataset stores the samples and their corresponding labels, and a DataLoader wraps an iterable around the Dataset to enable easy access to the samples (see the official PyTorch data tutorial). So we need both.

Let’s take a small use case where we will train our neural network on how to perform subtraction.

#Let’s start by creating dummy data
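The original code embed isn't reproduced here, but as a minimal sketch, the dummy data could be a set of number pairs with their differences as labels (the exact values and variable names are illustrative):

```python
import torch

# Illustrative dummy data: each input row is a pair (a, b),
# and the label is the difference a - b.
x = torch.tensor([[8., 3.], [10., 2.], [7., 5.], [9., 1.],
                  [6., 4.], [5., 2.], [12., 6.], [4., 3.]])
y = x[:, :1] - x[:, 1:]   # labels a - b, shape (8, 1)
```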

Now the input data and the labels are ready. The next step will be to create a dataset class and a dataloader.

#Dataset and DataLoader

The Dataset class takes the input data and labels, and has methods that return the length of the dataset and the item at a given index.
Since the batch size is 2, every “next” operation on the DataLoader returns two data points at a time.
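A minimal sketch of such a Dataset and DataLoader (the class name and data values are illustrative, not the original gist):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SubtractionDataset(Dataset):
    """Stores the input pairs and their corresponding labels."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        # number of samples in the dataset
        return len(self.x)

    def __getitem__(self, idx):
        # the sample and its label at the given index
        return self.x[idx], self.y[idx]

x = torch.tensor([[8., 3.], [10., 2.], [7., 5.], [9., 1.]])
y = x[:, :1] - x[:, 1:]

ds = SubtractionDataset(x, y)
dl = DataLoader(ds, batch_size=2)

# every "next" on the iterator yields one batch of 2 data points
xb, yb = next(iter(dl))   # xb has shape (2, 2), yb has shape (2, 1)
```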

#Define the network and steps needed to train
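One possible setup along these lines (the architecture, layer sizes, optimizer, and learning rate below are assumptions for illustration, not the author's exact code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # for reproducibility

# A small fully connected network: 2 inputs (a, b) -> 1 output (a - b)
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
)

loss_fn = nn.MSELoss()                                    # regression loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```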

#Finally let’s train and evaluate the model

Notice how the loss is calculated and the weights are updated after every batch of 2 data points.
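Putting the pieces together, here is a self-contained training-loop sketch (the data, network, and epoch count are illustrative); the key point is that `loss.backward()` and `optimizer.step()` run once per batch of 2:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# dummy subtraction data: label = a - b
x = torch.tensor([[8., 3.], [10., 2.], [7., 5.], [9., 1.],
                  [6., 4.], [5., 2.], [12., 6.], [4., 3.]])
y = x[:, :1] - x[:, 1:]

dl = DataLoader(TensorDataset(x, y), batch_size=2, shuffle=True)

model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(500):
    for xb, yb in dl:                  # one iteration = one batch of 2
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)  # loss for this batch only
        loss.backward()
        optimizer.step()               # weights updated after every batch

# quick evaluation on a pair not in the training data
with torch.no_grad():
    pred = model(torch.tensor([[8., 4.]])).item()
```

After enough epochs, `pred` should land close to the true difference, 4.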

#Plot and prediction

As you can see, with (8, 4) as input, our model correctly predicts the subtraction value as approximately 4.
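A sketch of what the plotting and prediction step could look like (the matplotlib usage and output file name are assumptions): it records the mean loss per epoch, plots the curve, and then queries the trained model.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import matplotlib
matplotlib.use("Agg")              # non-interactive backend
import matplotlib.pyplot as plt

torch.manual_seed(0)

x = torch.tensor([[8., 3.], [10., 2.], [7., 5.], [9., 1.],
                  [6., 4.], [5., 2.], [12., 6.], [4., 3.]])
y = x[:, :1] - x[:, 1:]
dl = DataLoader(TensorDataset(x, y), batch_size=2, shuffle=True)

model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(500):
    epoch_loss = 0.0
    for xb, yb in dl:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    losses.append(epoch_loss / len(dl))   # mean batch loss this epoch

plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.savefig("loss_curve.png")             # hypothetical output file

with torch.no_grad():
    pred = model(torch.tensor([[8., 4.]])).item()
```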

****The End****

I hope the purpose of Dataset and DataLoader, and the way to use them, is a little clearer now. :)

Let me know if you tried it.


Dipanwita Mallick · IWriteAsILearn
I am working as a Senior Data Scientist at Hewlett Packard Enterprise. I love exploring new ideas and new places!! :)