AIN 311 MACHINE LEARNING BLOG 3 — DATA PREPROCESSING

Published in AIN311 Fall 2023 Projects · Dec 15, 2023

This week, we completed the first stage of our project: preparing the dataset so that it is suitable for the models we will use.

Data Merging Process

By merging two different datasets, we created a comprehensive dataset with four classes: “glass,” “metal,” “paper,” and “plastic.” We used 1,000 images per class to keep the dataset balanced.
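A minimal sketch of how such a merge could look, assuming each source dataset stores its images as `<root>/<class>/<file>`. The function name `merge_datasets`, the folder layout, the accepted file extensions, and the fixed seed are illustrative assumptions rather than details of our actual code.

```python
import random
import shutil
from pathlib import Path

CLASSES = ["glass", "metal", "paper", "plastic"]
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def merge_datasets(source_roots, target_root, per_class=1000, seed=0):
    """Copy up to `per_class` images per class from several roots into one."""
    rng = random.Random(seed)
    counts = {}
    for cls in CLASSES:
        # Collect candidate images for this class from every source dataset.
        candidates = []
        for root in source_roots:
            class_dir = Path(root) / cls
            if class_dir.is_dir():
                candidates.extend(
                    p for p in sorted(class_dir.iterdir())
                    if p.suffix.lower() in IMAGE_SUFFIXES
                )
        # Sample without replacement; classes end up the same size as long
        # as each one has at least `per_class` candidates.
        chosen = rng.sample(candidates, min(per_class, len(candidates)))
        out_dir = Path(target_root) / cls
        out_dir.mkdir(parents=True, exist_ok=True)
        for i, src in enumerate(chosen):
            # Rename on copy so files from different sources cannot collide.
            shutil.copy2(src, out_dir / f"{cls}_{i:04d}{src.suffix.lower()}")
        counts[cls] = len(chosen)
    return counts
```

The returned dictionary makes it easy to confirm the balance: every class should report the same count.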

Data Formatting and Data Frame Creation

We renamed the image files consistently to make the dataset easier to work with. We then iterated over the unique class values to build a dictionary of the classes. Finally, we created a Data Frame containing each image’s file path and class, which gave the data a more organized structure, and displayed its first 10 entries as a quick preview.
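One way this step might be written, assuming one subfolder per class and a dictionary that maps each class name to an integer index. The column names `path`, `label`, and `label_id`, and both function names, are hypothetical choices for this sketch.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def build_records(dataset_root):
    """Return a class-to-index dictionary and one record per image."""
    root = Path(dataset_root)
    # One subfolder per class; sorting keeps the mapping reproducible.
    class_names = sorted(d.name for d in root.iterdir() if d.is_dir())
    class_to_index = {name: idx for idx, name in enumerate(class_names)}
    records = [
        {"path": str(p), "label": name, "label_id": class_to_index[name]}
        for name in class_names
        for p in sorted((root / name).iterdir())
        if p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return class_to_index, records


def to_dataframe(records):
    """Wrap the records in a pandas DataFrame."""
    # pandas is imported here so build_records stays usable without it.
    import pandas as pd

    return pd.DataFrame(records, columns=["path", "label", "label_id"])
```

With pandas installed, `to_dataframe(records).head(10)` shows the first ten rows.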

Data Frame Accuracy Check

To verify the accuracy of our Data Frame, we randomly selected some samples and printed them on the screen. In this step, we checked whether each image name matches its class label.
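A check like this can also be automated. The sketch below assumes a hypothetical naming scheme in which every file name starts with its class, such as `glass_0001.jpg`; the helper names are illustrative too.

```python
import random
from pathlib import Path


def label_from_name(path):
    """Assumed naming scheme: '<class>_<number>.<ext>', e.g. 'glass_0001.jpg'."""
    return Path(path).stem.split("_")[0]


def spot_check(rows, n=5, seed=None):
    """Print n random (path, label) rows; return those whose name and label disagree."""
    rows = list(rows)
    rng = random.Random(seed)
    sample = rng.sample(rows, min(n, len(rows)))
    mismatches = []
    for path, label in sample:
        print(f"{Path(path).name:<20} -> {label}")
        if label_from_name(path) != label:
            mismatches.append((path, label))
    return mismatches
```

Given a Data Frame `df` with `path` and `label` columns, the rows could be supplied as `df[["path", "label"]].itertuples(index=False, name=None)`. An empty result means every sampled name agreed with its label.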

Size Standardization

After examining a few different images, we noticed that image dimensions vary across the dataset. Because inconsistent input sizes could hurt our model’s performance, we resized all images to the same dimensions.
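As a rough illustration, resizing can be done with Pillow. The target size of 224×224 is an assumption for this sketch (the size we used is not stated here), and `resize_image` is a hypothetical helper name.

```python
from pathlib import Path

from PIL import Image

TARGET_SIZE = (224, 224)  # assumed; not the confirmed project setting


def resize_image(src, dst, size=TARGET_SIZE):
    """Load an image, force RGB, resize it to `size`, and save it to `dst`."""
    with Image.open(src) as img:
        # convert() loads the pixel data, so the result outlives the file handle.
        resized = img.convert("RGB").resize(size)
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    resized.save(dst)
    return resized.size
```

Note that a plain `resize` stretches images whose aspect ratio differs from the target; padding or cropping first is an alternative when distortion matters.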

Class Distribution Visualization

Finally, we used a bar plot to visualize the distribution of images for each class in the dataset.
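A bar plot of this kind might be produced as follows. This is a sketch using matplotlib; the function names, axis labels, and output file name are illustrative assumptions.

```python
from collections import Counter


def class_counts(labels):
    """Count images per class, sorted by class name."""
    return dict(sorted(Counter(labels).items()))


def plot_class_distribution(labels, out_path="class_distribution.png"):
    """Save a bar plot of the number of images per class."""
    # matplotlib is imported here so class_counts works without it.
    import matplotlib

    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    counts = class_counts(labels)
    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Class")
    ax.set_ylabel("Number of images")
    ax.set_title("Images per class")
    fig.savefig(out_path)
    plt.close(fig)
    return counts
```

For a balanced dataset, all bars should have the same height.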

With these steps complete, our dataset is ready for the modeling stage. Next week, we’ll share more about what we’ve accomplished in the project.