How does DataLoader work in PyTorch?

Calvin Ku
Sep 9, 2018 · 2 min read

Why use DataLoader?

Because you don’t want to implement your own mini-batch code every time. And since you’re gonna write up some wrapper for it anyway, the folks at FAIR figured they’d just do it for you to save you the trouble. It’s also standardized, so anyone reading your code can easily figure out how you prepare your data. And I think the wrapper they’ve come up with is pretty good.

How it works

Basically, the DataLoader works together with the Dataset object. So to use the DataLoader you need to get your data into this Dataset wrapper. To do this you only need to implement two magic methods: __getitem__ and __len__. __getitem__ takes an index and returns the (x, y) pair at that index. __len__ is just your usual __len__ that returns the size of the dataset. And that’s that.
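To make those two methods concrete before the image example below, here’s a minimal toy Dataset wrapping two plain Python lists (the class and variable names here are just made up for illustration):

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """A tiny Dataset over two parallel lists of inputs and labels."""
    def __init__(self, xs, ys):
        self.xs = xs
        self.ys = ys

    def __getitem__(self, index):
        # Return the (x, y) pair at this index
        return self.xs[index], self.ys[index]

    def __len__(self):
        # Size of the dataset
        return len(self.xs)

ds = ToyDataset([10, 20, 30], [0, 1, 0])
print(len(ds))  # 3
print(ds[1])    # (20, 1)
```

That’s really all there is to the Dataset side: indexing and length.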

Super easy sample code

Here’s a snippet of some super easy sample code, just to give you a rough idea of how this works.

from glob import glob

from scipy import misc  # Note: scipy.misc.imread is removed in newer SciPy; imageio.imread is the modern replacement
import torch
from torch.utils.data import Dataset, DataLoader

class SomeImageDataset(Dataset):
    """The training image dataset.
    """
    def __init__(self, x_path):
        x_filenames = glob(x_path + '*.png')  # Get the filenames of all training images

        self.x_data = [torch.from_numpy(misc.imread(filename)) for filename in x_filenames]  # Load the images into torch tensors
        self.y_data = target_label_list  # Class labels; target_label_list is assumed to be defined elsewhere (e.g. parsed from a labels file)
        self.len = len(self.x_data)  # Size of data

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

To load this data, just use the DataLoader:

dataset = SomeImageDataset(x_path)
train_loader = DataLoader(dataset=dataset,
                          batch_size=32,
                          shuffle=True,
                          num_workers=2)

for epoch in range(num_epochs):
    for i, data in enumerate(train_loader):
        x_imgs, labels = data

        # Do whatever you want with the data...
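If you’d like something you can run immediately without any image files on disk, here’s a self-contained variant of the same idea using random tensors in place of real images (the class name and tensor sizes are made up for illustration):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomImageDataset(Dataset):
    """Fake 'images' as random tensors, so the example runs without files."""
    def __init__(self, n):
        self.x_data = torch.randn(n, 3, 8, 8)     # n fake 3x8x8 images
        self.y_data = torch.randint(0, 2, (n,))   # n binary class labels

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return len(self.x_data)

loader = DataLoader(RandomImageDataset(10), batch_size=4, shuffle=True)
for x_batch, y_batch in loader:
    print(x_batch.shape, y_batch.shape)
# Batch sizes are 4, 4, then the remaining 2 (drop_last defaults to False)
```

The DataLoader handles the batching, shuffling, and (with num_workers > 0) parallel loading for you; your Dataset only ever deals with one sample at a time.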

I have to confess that for a long time I didn’t want to learn how to use DataLoader because I was super lazy, and every time I needed mini-batching I just copied and pasted it from my old code.

So honestly, I’m just writing this for my old self and those who are as lazy as him.

Peace!

Noumena

representation learning / unsupervised learning / weakly-supervised learning
