Big image datasets with Flickr, Python and flickrapi

A guide to a great source of data

Adrian Martinez
2 min read · Sep 22, 2019
Photo by Shenandoah National Park

Getting started:

Before getting to the fun part, we must get an API key from Flickr. It’s free!

The code:

Install flickrapi

$ pip install flickrapi

Create a file called “flickr.py”.

We don’t want to download images that are too small, and a given image is not always available in every size. To handle this, we create a list, in order of preference, of all the sizes we would be happy with (more precisely, the names of the URL fields for those sizes). If an image is not available in any of these sizes, it is ignored. Otherwise, the first available size in the list is the one that gets downloaded.

For reference, here is a list of all the sizes (the dimensions shown are those of one example photo):

  • url_o: Original (4520 × 3229)
  • url_k: Large 2048 (2048 × 1463)
  • url_h: Large 1600 (1600 × 1143)
  • url_l: Large 1024 (1024 × 732)
  • url_c: Medium 800 (800 × 572)
  • url_z: Medium 640 (640 × 457)
  • url_m: Medium 500 (500 × 357)
  • url_n: Small 320 (320 × 229)
  • url_s: Small 240 (240 × 171)
  • url_t: Thumbnail (100 × 71)
  • url_q: Square 150 (150 × 150)
  • url_sq: Square 75 (75 × 75)
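Using the keys listed above, the preference list could look like the sketch below. The exact cutoff (nothing smaller than Medium 800) and the names SIZES and EXTRAS are assumptions made for illustration, so adjust them to your needs.

```python
# Preferred sizes, best first. The cutoff at "Medium 800" (url_c) is an
# assumption for this example; anything smaller is simply left out.
SIZES = ["url_o", "url_k", "url_h", "url_l", "url_c"]

# Flickr's search takes the extra fields as one comma-separated string,
# so we build that string once from the list.
EXTRAS = ",".join(SIZES)

print(EXTRAS)
```

Keeping the sizes in a single list means the same list can drive both the search request and the later choice of URL.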

Now we can run the search using “flickr.walk”, which returns an iterable of photos.
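A minimal sketch of the search step might look like this. The helper names make_client and get_photos, the SIZES list, and the choice of search parameters (public photos only, sorted by relevance, 50 per page) are assumptions for this example, not necessarily what the original code used.

```python
# Assumed preference list of URL fields, best first.
SIZES = ["url_o", "url_k", "url_h", "url_l", "url_c"]


def make_client(api_key, api_secret):
    """Create a flickrapi client from your own key and secret."""
    import flickrapi  # imported here so the rest of the file loads without it
    return flickrapi.FlickrAPI(api_key, api_secret)


def get_photos(flickr, image_tag, sizes=SIZES):
    """Return an iterable of photos whose text matches image_tag."""
    return flickr.walk(
        text=image_tag,
        extras=",".join(sizes),  # ask Flickr to include these URL fields
        privacy_filter=1,        # public photos only
        per_page=50,
        sort="relevance",
    )
```

Since “flickr.walk” fetches result pages lazily as we iterate, nothing large is loaded up front. We only pay for the photos we actually loop over.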

Next, we need a function that returns the URL for a photo by following our list of sizes.
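One way to write such a function is sketched below. It assumes each photo exposes a get method that returns None for a missing field, which is true both for the elements yielded by “flickr.walk” and for a plain dict. The names get_url and SIZES are illustrative.

```python
# Assumed preference list of URL fields, best first.
SIZES = ["url_o", "url_k", "url_h", "url_l", "url_c"]


def get_url(photo, sizes=SIZES):
    """Return the URL of the first preferred size the photo has, else None."""
    for size in sizes:
        url = photo.get(size)
        if url:  # missing sizes come back as None, so try the next one
            return url
    return None


# A photo that has no original but does have Large 2048 and Medium 800:
print(get_url({"url_k": "https://example.com/k.jpg",
               "url_c": "https://example.com/c.jpg"}))
```

The example URLs are placeholders. With the list above, the Large 2048 URL wins because url_k comes before url_c.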

Putting those two functions together, we can get the URLs of all the images we want in the desired sizes.
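Combined, the two steps could be sketched as follows. To keep the example runnable on its own, get_urls here takes any iterable of photos (for instance the result of “flickr.walk”) instead of doing the search itself. That signature, and the limit parameter, are assumptions for illustration.

```python
def get_urls(photos, limit,
             sizes=("url_o", "url_k", "url_h", "url_l", "url_c")):
    """Collect up to `limit` URLs, skipping photos with no acceptable size."""
    urls = []
    for photo in photos:
        if len(urls) >= limit:
            break  # we have enough, stop iterating over the search results
        # First preferred size that this photo actually has, or None.
        url = next((photo.get(s) for s in sizes if photo.get(s)), None)
        if url:
            urls.append(url)
    return urls


sample = [{"url_t": "tiny"}, {"url_c": "c1"}, {"url_o": "o2", "url_c": "c2"}]
print(get_urls(sample, 2))
```

Photos that are only available in small sizes do not count towards the limit, so we still end up with the number of images we asked for as long as the search has enough results.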

Now we have all the URLs, but we need to download the images. Create another file called downloader.py.

Before downloading the photos, we need to make sure the folder where we are going to save them actually exists. Otherwise, we might get an error.
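A small helper for this could look like the sketch below. The name ensure_folder is made up for this example.

```python
import os


def ensure_folder(path):
    """Create the folder (and any missing parents) if it is not there yet."""
    # exist_ok=True makes this safe to call again when the folder exists.
    os.makedirs(path, exist_ok=True)
    return path
```

Calling it with something like os.path.join("data", "blue jay") creates both “data” and the subfolder in one go.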

And finally we can download them.
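Here is one possible version of the download step, using only the standard library. It assumes every URL points directly at an image file and that the server accepts a plain urllib request. The function name download_images and the choice to name each file after the last part of its URL are assumptions for this sketch.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def download_images(urls, folder):
    """Save each URL into `folder` and return the list of saved paths."""
    saved = []
    for url in urls:
        # Use the last path segment of the URL as the file name.
        filename = os.path.basename(urlparse(url).path)
        target = os.path.join(folder, filename)
        # Stream the response straight to disk instead of holding it in memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

The folder has to exist before this is called. A failed request raises an exception and stops the loop, so for a long run you may want to catch the error and continue with the next URL.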

Putting everything together in main.py we have.
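A sketch of how the pieces might be wired together is shown below. To keep it runnable on its own, build_dataset receives the two helpers as arguments: get_urls(tag, limit) should return a list of URLs for a search term, and download_images(urls, folder) should save them. Those signatures, the function name build_dataset, and the “data” root folder are assumptions for this example.

```python
import os
import time


def build_dataset(species, images_per_species, get_urls, download_images,
                  root="data"):
    """Download images for every search term into root/<term>/."""
    start = time.time()
    for name in species:
        print("Getting urls for", name)
        urls = get_urls(name, images_per_species)

        print("Downloading images for", name)
        folder = os.path.join(root, name)
        os.makedirs(folder, exist_ok=True)  # make sure the folder exists
        download_images(urls, folder)
    print("Took", round(time.time() - start, 2), "seconds")
```

In main.py this would be called with the list of search terms, for example ["blue jay", "northern cardinal", "american goldfinch"], the number of images wanted per term, and the two functions imported from flickr.py and downloader.py.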

And if we run it:

$ python main.py
Getting urls for blue jay
Downloading images for blue jay
Getting urls for northern cardinal
Downloading images for northern cardinal
Getting urls for american goldfinch
Downloading images for american goldfinch
Took 9.79 seconds

It works!!!!!!

Now we have the new folders with the images inside.

|-- data
    |-- blue jay
        |-- somerandomname.jpg
        ...
    ...

Keep in mind that, depending on your internet speed and the number and size of the images, it can take a while.

Source code: https://github.com/adrianmrit/flickrdatasets
