Collecting Image Data For Machine Learning in Python

William Firth
Published in CodeX
5 min read · Oct 3, 2021

How to use Flickr to generate properly structured data sets

Photo by Boston Public Library on Unsplash

When I set out to build an AI bird feeder for my father-in-law, who loves his feathered friends, one of the biggest issues I ran into wasn’t creating or training the image recognition model, as I had naively expected. Rather, it was gathering enough high-quality data, structured in a way a model could actually use.

My first attempt at gathering bird images used a pre-existing data set from Kaggle. This is a great way to go if the specific data set you need is available, as I assumed it would be given my focus on birds found in my Texas backyard. It wasn’t the right route for me, though: the data set I downloaded had far more bird species than I wanted, and many of the species I did want were missing.

On to attempt two: scraping Google Images. This approach is probably workable, but I spent too long on it without much success. If you have better luck or know something I don’t, let me know! I’d love to hear how you did it.

The strategy that ultimately worked for me was Flickr’s API. Whatever project you’re working on, if it needs a custom image data set, spare yourself the growing pains I went through by following the simple steps below.
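To make the Flickr idea concrete, here is a minimal sketch of what a search-and-download loop could look like using only Python's standard library. It is not necessarily the exact code I used: it assumes you have your own Flickr API key, it calls the public `flickr.photos.search` REST method, and the helper names (`build_search_url`, `extract_image_urls`, `image_path`, `collect`) and the one-folder-per-label layout are illustrative choices.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, query, per_page=100, page=1):
    """Build a flickr.photos.search request URL for a text query."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,       # your own key; none is bundled here
        "text": query,
        "media": "photos",
        "extras": "url_m",        # ask Flickr to include a medium-size image URL
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,      # plain JSON instead of a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


def extract_image_urls(response):
    """Pull image URLs out of a parsed search response, skipping photos without one."""
    photos = response.get("photos", {}).get("photo", [])
    return [p["url_m"] for p in photos if "url_m" in p]


def image_path(root, label, index):
    """Hypothetical layout: one folder per class label, zero-padded file names."""
    folder = label.lower().replace(" ", "_")
    return Path(root) / folder / f"{index:04d}.jpg"


def collect(api_key, label, root="data", per_page=100):
    """Download one page of results for a label (needs network access and a valid key)."""
    with urllib.request.urlopen(build_search_url(api_key, label, per_page)) as resp:
        urls = extract_image_urls(json.load(resp))
    for i, url in enumerate(urls):
        dest = image_path(root, label, i)
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return len(urls)
```

Calling something like `collect(my_key, "northern cardinal")` once per species would leave you with a `data/` directory containing one subfolder per bird, which is the layout many image-classification loaders expect. Check Flickr's terms and each photo's license before using the images.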

William Firth, writer for CodeX

I’m a mechanical engineer by education, a data scientist by profession, and a woodworker by weekend.