Dataloaders in AI

Pinkesh Patel, MBA
Published in unpack
3 min read · May 30, 2021

Data loading is a particularly important task in any AI system, and it is typically one of the biggest roadblocks for new AI professionals. In this article, we briefly discuss dataloaders, together with an example of DataLoaders for an AI model.

To train a neural network, we need suitable data. Once we have collected a suitable dataset, we need to assemble the data in a format that is appropriate for model training, which means creating an object called ‘DataLoaders’. This requires code that can read training data into memory, convert the data to PyTorch tensors, and serve the data up in batches. In the past, professionals had to write completely custom code for data loading. Nowadays, libraries such as PyTorch and fastai use the Dataset and DataLoader interfaces to serve up training or test data. In general, a Dataset object loads training or test data into memory, and a DataLoader object fetches data from a Dataset and serves it up in batches.
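To make the division of labour concrete, here is a minimal sketch in plain Python. It does not use PyTorch: `ListDataset` and `iter_batches` are hypothetical stand-ins, written only to illustrate which part holds the data and which part serves it in batches.

```python
# Simplified stand-ins for the Dataset / DataLoader roles (not the PyTorch classes).

class ListDataset:
    """Holds samples in memory and returns one (input, label) pair per index."""

    def __init__(self, inputs, labels):
        assert len(inputs) == len(labels), "each input needs exactly one label"
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, i):
        return self.inputs[i], self.labels[i]


def iter_batches(dataset, batch_size):
    """Fetch items from the dataset and yield them as (inputs, labels) batches."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        xs = [x for x, _ in items]
        ys = [y for _, y in items]
        yield xs, ys


# Five toy samples, served two at a time; the last batch holds the leftover sample.
toy = ListDataset(inputs=[[0.1], [0.2], [0.3], [0.4], [0.5]],
                  labels=[0, 1, 0, 1, 0])
batches = list(iter_batches(toy, batch_size=2))
```

In PyTorch itself, these two roles are played by `torch.utils.data.Dataset` and `torch.utils.data.DataLoader`, which also take care of details this sketch leaves out, such as collating samples into tensors, shuffling, and loading in parallel.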

DataLoaders is a thin class that just stores whatever DataLoader objects you pass to it and makes them available as train and valid. Although it is an amazingly simple class, it is important because it provides the data for your model.

The key functionality in DataLoaders is provided with just these four lines of code:

Basic code for Dataloaders
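Since the figure is not reproduced here, the following is a plain Python sketch of the same idea rather than the library's own source. The class name `SimpleDataLoaders` is hypothetical, and the real fastai class is more compact because it relies on helpers from the library.

```python
class SimpleDataLoaders:
    """Stores the loaders it is given and exposes the first two as train and valid."""

    def __init__(self, *loaders):
        self.loaders = loaders

    def __getitem__(self, i):
        return self.loaders[i]

    @property
    def train(self):
        # By convention, the first loader is the training loader.
        return self[0]

    @property
    def valid(self):
        # By convention, the second loader is the validation loader.
        return self[1]


# Any iterable of batches can play the role of a loader in this sketch.
train_batches = [([1, 2], [0, 1]), ([3, 4], [1, 0])]
valid_batches = [([5, 6], [0, 0])]
dls = SimpleDataLoaders(train_batches, valid_batches)
```

Here `dls.train` and `dls.valid` simply hand back the objects that were passed in, which is all a model-training loop needs in order to find its data.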

Turning Data into DataLoaders

DataLoaders can be created through a range of factory methods or through the data block API. To turn data into a DataLoaders object with the data block API, we need to specify four things: the types of data we are working with, how to get the list of items, how to label these items, and how to create the validation set.
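The four decisions can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the fastai API: the file listing is invented, and `get_items`, `get_label`, and `split_items` are hypothetical helpers. fastai ships ready-made pieces for these steps (for example `get_image_files`, `parent_label`, and `RandomSplitter`).

```python
import random
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def get_items(paths):
    """Step 2: keep only image files, in a stable sorted order."""
    candidates = (PurePosixPath(p) for p in paths)
    return sorted(p for p in candidates if p.suffix.lower() in IMAGE_SUFFIXES)


def get_label(path):
    """Step 3: label each item with the name of its parent folder."""
    return path.parent.name


def split_items(items, valid_pct=0.2, seed=42):
    """Step 4: hold out a random fraction of the items for validation."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    n_valid = int(len(items) * valid_pct)
    valid = [items[i] for i in idxs[:n_valid]]
    train = [items[i] for i in idxs[n_valid:]]
    return train, valid


# Step 1 (data types): inputs are image paths, targets are category names.
# Hypothetical listing; a real project would scan a folder on disk instead.
listing = [
    "bears/grizzly/001.jpg", "bears/grizzly/002.jpg", "bears/grizzly/003.jpg",
    "bears/black/001.jpg", "bears/black/002.jpg", "bears/black/003.jpg",
    "bears/teddy/001.jpg", "bears/teddy/002.jpg", "bears/teddy/003.jpg",
    "bears/teddy/004.png", "bears/notes.txt",
]

items = get_items(listing)
train, valid = split_items(items)
train_pairs = [(p, get_label(p)) for p in train]
valid_pairs = [(p, get_label(p)) for p in valid]
```

Keeping each decision in its own small function mirrors the idea behind the data block API: the pieces can be swapped independently, for instance by changing only how labels are derived.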

DataLoaders for Bear Images

The DataLoaders for the bear images are shown in the following figure, and the code is explained in detail below.

Figure: DataLoaders for Bear Images

Data types: our independent variables are images, and our dependent variables are the categories (the type of bear) for each image.
File path: we must tell fastai how to get a list of those files.
Validation set: we would like to have the same training/validation split each time we run this notebook, so we fix the random seed. A "random" list is generated from a starting point called the seed; if you provide the same seed each time, you will get the exact same list each time.
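The effect of fixing the seed can be checked with a few lines of standard-library Python. The helper `random_split` is hypothetical and only mimics the idea of a seeded random splitter; the seed values are arbitrary.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=None):
    """Return (train_idxs, valid_idxs) taken from a shuffled range of indices."""
    idxs = list(range(n_items))
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(idxs)
    cut = int(n_items * valid_pct)
    return idxs[cut:], idxs[:cut]


first_run = random_split(100, seed=42)
second_run = random_split(100, seed=42)   # same seed: identical split
other_seed = random_split(100, seed=7)    # different seed: a different shuffle
```

Because `first_run` and `second_run` are identical, any change in validation accuracy between two runs of the notebook can be attributed to the model or the training settings rather than to a different split.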

References

1. Howard, J. and Gugger, S., 2020. Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. 1st ed. Sebastopol, CA: O’Reilly Media, Inc.

2. Visual Studio Magazine. 2021. How to Create and Use a PyTorch DataLoader. [ONLINE] Available at: https://visualstudiomagazine.com/articles/2020/09/10/pytorch-dataloader.aspx. [Accessed 29 May 2021].

About Author: Pinkesh Patel

Pinkesh has over 16 years of experience in R&D, portfolio management, and business development in the life science and retail industries. He is a mentor and investor at the gold and diamond jewelry firm ‘Proyasha Diamonds’. Pinkesh received a B.A. Honors in Pharmacology from London Metropolitan University and an MBA from Anglia Ruskin University.
