Week 2: Project LEAFS

Abdullah Enes Ergün
AIN311 Fall 2022 Projects
Nov 19, 2022

Hi everyone,
Last week, we briefly introduced Project LEAFS. This week, we are going to tell you about our data collection process.

Image Scraping from Web

Unfortunately, we could not find any large datasets suitable for our project, so we decided to collect our own data. To solve this problem, we developed a Python program that scrapes images from Google Images using specific keywords.
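The core idea of keyword-based image scraping can be sketched in a few lines. This is not the ImageMiner code. It is a minimal sketch, using only the Python standard library, that assumes a plain HTTP request to Google Images: build a search URL for a keyword (the `tbm=isch` query parameter selects image search) and pull `<img>` URLs out of the returned HTML. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_search_url(keyword: str) -> str:
    """Build a Google Images search URL for one keyword (tbm=isch = image search)."""
    return "https://www.google.com/search?" + urlencode({"q": keyword, "tbm": "isch"})


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag that points to an http(s) URL."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            # Skip inline data: URIs (embedded thumbnails); keep downloadable links only.
            if name == "src" and value and value.startswith("http"):
                self.sources.append(value)


def extract_image_urls(html: str) -> list[str]:
    """Return the image URLs found in an HTML page, in document order."""
    parser = ImageSrcParser()
    parser.feed(html)
    return parser.sources
```

In practice, the HTML would be fetched from `build_search_url(...)` (for example with `urllib.request.urlopen`) and each returned URL downloaded to disk. Google's result markup changes frequently and may embed many thumbnails inline, which is one reason scrapers often drive a real browser instead. Treat this as an illustration of the idea, not a robust tool.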

You can find the ImageMiner code at the GitHub link here.

Parameters of Image Scraper Code

To use the ImageMiner code, set your own parameters and then run it.

Examples from Collected Images

Here are some of the images that we collected.

Fig 1. Students raising their hands and sleeping
Fig 2. A student who takes notes

While collecting the data, we realized that some images contain multiple classes [Fig 1] while others contain only one [Fig 2]. This gave us a new perspective on multi-label images before we started labeling: a single image can be used for several different classes.
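One common way to label such images is multi-hot encoding, where each image gets a 0/1 flag per class instead of a single class index. The sketch below assumes a hypothetical class list built from the behaviours in Fig 1 and Fig 2. It is an illustration, not the project's actual label set.

```python
# Hypothetical class list, taken only from the behaviours shown in Fig 1 and Fig 2.
CLASSES = ["raising_hand", "sleeping", "taking_notes"]


def encode_labels(labels: set[str], classes: list[str] = CLASSES) -> list[int]:
    """Turn a set of class names into a multi-hot vector (one 0/1 entry per class)."""
    unknown = labels - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in classes]


# Fig 1 style image: two behaviours in one picture.
print(encode_labels({"raising_hand", "sleeping"}))  # [1, 1, 0]
# Fig 2 style image: a single behaviour.
print(encode_labels({"taking_notes"}))  # [0, 0, 1]
```

With labels in this form, a classifier typically uses an independent sigmoid output per class rather than a single softmax, so that more than one class can be predicted for the same image.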

Image scraping has upsides, such as automation, but it also has downsides: it returns cartoon images, duplicate images, unrelated images, images that contain only text, and so on. In addition, some images carry watermarks from stock-image sites such as Shutterstock and Alamy.

Next Week

These downsides mean that we have to clean the data (of course :) ). Next week, we are going to delete duplicates and remove watermarks from the images. This will complete the data collection and preprocessing steps, after which we will be able to build our models.
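As a starting point for the duplicate-removal step, byte-identical copies can be found by hashing each file's contents. This is a minimal sketch that assumes the scraped images sit in one folder. The function names are hypothetical, and the code only reports duplicates without deleting anything.

```python
import hashlib
from pathlib import Path


def dedupe_bytes(images: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Split files into (kept, duplicates); the first file with each SHA-256 digest is kept."""
    seen: set[str] = set()
    kept: list[str] = []
    duplicates: list[str] = []
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates.append(name)
        else:
            seen.add(digest)
            kept.append(name)
    return kept, duplicates


def find_duplicate_files(folder: str, pattern: str = "*.jpg") -> list[str]:
    """Return names of files in `folder` that are exact byte copies of an earlier file."""
    images = {p.name: p.read_bytes() for p in sorted(Path(folder).glob(pattern))}
    return dedupe_bytes(images)[1]
```

Exact hashing only catches identical files. Scraped copies that were resized or re-encoded need a perceptual hash instead (for example, the third-party `imagehash` package), compared with a small distance threshold.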

Summary of next week

Can Ali Ateş

Bahakirbasoglu

Abdullah Enes Ergün
