How to download 100 pictures from a site with Selenium?
I read a book, “The Science of Getting Rich”, by Wallace D. Wattles, and fell in love with many of its maxims, which I thought were pure gold. To keep reminding myself of these maxims and not lose sight of the mindset I was trying to build, I decided to use his quotes as my wallpaper. I searched the internet for images of his quotes and found an amazing site that compiles quotes from some of the world’s great thinkers and superimposes them on neat backgrounds. Beautiful, right?
The problem was they had 100 of these wallpapers on a single page.
There was no simple way to click one download button and pull down everything. One thing was sure: I was not going to spend the next few minutes clicking download on 100 images and 5 “Show More” buttons. So I decided to write some Python (my fav) to do the work for me.
If you want to follow along, you need Python on your computer, plus Selenium (a third-party package) installed via pip. The download step also uses the requests package, which you can install the same way. Selenium is the tool that does the clicking you don’t want to do, so you can watch your favourite show on YouTube instead.
I don’t advise running these steps one at a time by hand. At the end of the steps, there is a small script that orchestrates the whole process.
Initialize Selenium’s web driver
The webdriver.Firefox() call specifies which web browser the web driver should control.
Get the download links to all the images on the page
This function takes two arguments. The driver argument should be the one we initialized in the previous step. The URL argument should be a string that points to the page where the images are located.
Here we wait until a button with the class “loadmore” becomes visible, then click it. This repeats until there’s nothing left to load. Then we use a regular expression to find every link on the page that matches the supplied pattern. I arrived at this pattern by inspecting the href attributes of the download links on the page and settling on what they had in common that was also unique to them, so I don’t pick up any other links on the page.
Download the images
This function accepts three arguments. “prefix” is the base URL as a string; in our case, that is “https://quotefancy.com”. “dirname” is the name of the directory where we want to save the images. “links” is the list of image download links we got in the previous step.
For each link, the function downloads the associated image. In the requests.get() call, we set stream to True because we are downloading a potentially large file. Then we save each image to the local file system and delete it from memory.
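A minimal sketch of the download loop, assuming links are relative paths that get appended to prefix. The function name download_images and the image_<index>.jpg naming are hypothetical. For brevity, this version writes each file inline; the article moves that step into its own save function, described next. The timeout and raise_for_status() call are additions for robustness.

```python
import os
import shutil

import requests


def download_images(prefix, dirname, links):
    """Fetch each link (relative to prefix) and write it to dirname/image_<index>.jpg."""
    total = len(links)
    for index, link in enumerate(links):
        print(f"Downloading {index + 1} of {total}")
        # stream=True defers fetching the body until we read it, rather than loading it all up front
        response = requests.get(prefix + link, stream=True, timeout=30)
        response.raise_for_status()  # stop on a 4xx/5xx instead of saving an error page
        path = os.path.join(dirname, f"image_{index}.jpg")
        with open(path, "wb") as out_file:
            # Copy the raw byte stream to disk in chunks
            shutil.copyfileobj(response.raw, out_file)
        # Release the connection before moving on to the next image
        response.close()
```

response.raw is the undecoded stream, which suits image files; if a server compressed its responses, iterating over response.iter_content() would be the safer choice.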
Save the image to a file
This function also accepts three parameters. “image” is the downloaded image from the previous step. “dirname” is the name of the directory where we want to save the image. “suffix” is a string we append to the file name; here, we simply use the index of the current image.
The function creates a new file with a “.jpg” extension and copies the image from memory into it.
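A sketch of that helper, assuming image is a requests response fetched with stream=True, so its bytes are available on image.raw. The name save_image and the image_<suffix>.jpg file name are hypothetical; returning the path is an addition for convenience.

```python
import os
import shutil


def save_image(image, dirname, suffix):
    """Write a streamed response to dirname/image_<suffix>.jpg and return the path."""
    path = os.path.join(dirname, f"image_{suffix}.jpg")
    # "wb" creates the file (or truncates an existing one) in binary mode
    with open(path, "wb") as out_file:
        # image.raw is the byte stream of a stream=True response; copy it in chunks
        shutil.copyfileobj(image.raw, out_file)
    return path
```

For example, save_image(response, "wallpapers", 0) would write wallpapers/image_0.jpg.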
Tie it all together
This is the script that calls each of the functions we created, in the order we want. I also added a snippet that creates the image directory.
Of course, all of these snippets should live together in a single file.
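The directory-creation part can be very small. This sketch uses a hypothetical create_image_dir helper built on os.makedirs, with exist_ok=True so that re-running the script doesn’t fail when the folder already exists.

```python
import os


def create_image_dir(dirname):
    """Create dirname (and any missing parent folders) if it doesn't exist yet."""
    # exist_ok=True makes repeated runs safe when the directory is already there
    os.makedirs(dirname, exist_ok=True)
    return dirname
```

In the main script, the order would be: initialize the driver, collect the links from the page, call driver.quit() to close the browser, create the image directory, and then download the images using “https://quotefancy.com” as the prefix.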
And that is all it takes to lazily download lots of images from a site. Thanks for reading.