How to “automate downloading files” using Python, Selenium, and Headless Chrome

Ralf lumby
Jun 3, 2019

Problem: I needed a daily backup of a file from a website that offered no API for accessing it. My first thought was a simple GET request in a Python script, but that would not work because the site requires authentication and generates its content dynamically.

Solution: After some research, I solved the problem using Python, virtualenv, Selenium, and headless Chrome.

Step 1: Check if Python and pip are installed

Where to get Python 3.4+: https://www.python.org/downloads/

Make sure you have the following installed:
1. Python 3.4+
2. pip (should come packaged with Python)
3. Git Bash, latest version (ONLY INSTALL IF YOU’RE ON WINDOWS)
4. Don’t forget to set your environment variables

Items 4a and 4b are references in case your environment variables aren’t set:
4a. Windows 10 tutorial on how to set environment variables: https://www.youtube.com/watch?v=Y2q_b4ugPWk
4b. Mac/Linux tutorial on how to set environment variables: https://www.youtube.com/watch?v=PUIE7CPANfo

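How to check if you have Python and pip installed:
Open your terminal or Git Bash (Windows) and run python --version, then pip --version (on some systems the commands are python3 and pip3). Each command should print a version number.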
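
The same check can also be made from inside Python itself. The snippet below is only a small sketch; the helper names python_is_new_enough and pip_is_available are illustrative, and the 3.4 minimum simply mirrors the requirement listed above.

```python
import importlib.util
import sys


def python_is_new_enough(minimum=(3, 4)):
    """Return True when the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def pip_is_available():
    """Return True when this interpreter can find the pip package."""
    return importlib.util.find_spec("pip") is not None


print("Python version:", sys.version.split()[0])
print("Python 3.4+:", python_is_new_enough())
print("pip available:", pip_is_available())
```

If either of the last two lines prints False, revisit the install and environment variable steps before moving on.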

Step 2: Download ChromeDriver

Download the driver that matches the version of Chrome you’re running and your operating system: https://sites.google.com/a/chromium.org/chromedriver/downloads

If you do not know your Chrome version, paste this into your browser’s address bar to see it:
chrome://settings/help
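
To make "matches the version" concrete, here is a rough sketch that compares the major version numbers of two version strings. It assumes a ChromeDriver release that shares Chrome's version numbering (ChromeDriver 73 and later, as far as I know); the sample strings in the usage note are illustrative, not captured output, and the function names are my own.

```python
import re


def major_version(version_text):
    """Return the major version found in text like 'Google Chrome 75.0.3770.100'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError("no version number found in {!r}".format(version_text))
    return int(match.group(1))


def versions_match(chrome_text, driver_text):
    """Return True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_text) == major_version(driver_text)
```

For example, versions_match("Google Chrome 75.0.3770.100", "ChromeDriver 75.0.3770.8") returns True, while a driver string starting with 74 would return False.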

Step 3: Folder/dir setup

Open your terminal (Mac/Linux) or Git Bash (Windows).

Make a folder named “headless_test” to store everything we will be working with and navigate to it in the terminal, for example with mkdir headless_test followed by cd headless_test.
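
If you would rather create the folder from Python, a minimal sketch looks like this. The function name make_project_dir is illustrative; the folder name comes from this tutorial.

```python
import os


def make_project_dir(parent=".", name="headless_test"):
    """Create the project folder under `parent` (if missing) and return its absolute path."""
    project = os.path.abspath(os.path.join(parent, name))
    os.makedirs(project, exist_ok=True)  # no error if the folder already exists
    return project
```

Calling make_project_dir() creates headless_test inside the current directory and returns its full path; you still need to cd into it in your terminal.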

Step 4: Installing virtualenv and creating our virtual environment

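Make sure your current directory is the one we created earlier, named “headless_test”.

Let’s install, create, and activate our virtual environment: run pip install virtualenv, then virtualenv followed by a name for the environment, then source the activate script inside it (under bin on Mac/Linux, under Scripts in Git Bash on Windows).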
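
As an alternative to virtualenv, Python 3 ships a venv module in the standard library that does a similar job. The sketch below uses venv, not virtualenv, so treat it as a substitute technique; the function names and the with_pip parameter default are my own choices.

```python
import os
import venv


def activate_script(env_dir):
    """Return the path of the activate script that `source` expects for env_dir."""
    scripts = "Scripts" if os.name == "nt" else "bin"
    return os.path.join(env_dir, scripts, "activate")


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir with the standard-library venv module."""
    venv.create(env_dir, with_pip=with_pip)
    return activate_script(env_dir)
```

create_env returns the path of the activate script, which you then source in your terminal. Activation has to happen in the shell itself; a Python script cannot activate an environment for the terminal that launched it.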

Step 5: Installing selenium

With the virtual environment active, run pip install selenium in the terminal.
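
The install can also be triggered from a Python script, which is handy when the setup is automated. This is a sketch with illustrative function names; it calls pip through the interpreter that is currently running, so it installs into whichever environment is active.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the command that installs `package` with this interpreter's pip."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip and raise CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(package))
```

Calling install("selenium") has the same effect as typing the pip command by hand.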

Step 6: Coding begins

You can copy and paste the code below, and it will:
1. Set up headless Chrome with the permissions it needs to download files
2. Navigate to https://www.thinkbroadband.com/download using Selenium WebDriver
3. Click on a download icon to download a small test file, using Selenium’s web element locator and click function

IMPORTANT: READ THE COMMENTS, because there are some paths you need to change.
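
Here is a minimal sketch of such a script. It assumes Selenium 4 or later (where execute_cdp_cmd is available on the Chrome driver) and a chromedriver that Selenium can find, either on your PATH or resolved automatically by Selenium 4.6+. The download folder and the CSS selector are placeholders of mine, not values confirmed for the site, so inspect the page and adjust them.

```python
import os
import time

# ASSUMPTION: change this path to the folder where files should be saved.
DOWNLOAD_DIR = os.path.abspath("downloads")
PAGE_URL = "https://www.thinkbroadband.com/download"
# HYPOTHETICAL selector: replace it with one that matches the download
# icon or link you want to click on the page.
DOWNLOAD_LINK_SELECTOR = "a[href$='5MB.zip']"


def download_prefs(download_dir):
    """Chrome preferences that save files to download_dir without prompting."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
    }


def build_driver(download_dir):
    """Start headless Chrome configured to download into download_dir."""
    # Imported here so download_prefs can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1920,1080")
    options.add_experimental_option("prefs", download_prefs(download_dir))
    driver = webdriver.Chrome(options=options)
    # Some headless Chrome builds block downloads unless this DevTools
    # command explicitly allows them.
    driver.execute_cdp_cmd(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": download_dir},
    )
    return driver


def main(wait_seconds=10):
    """Open the page, click the download link, and pause while the file saves."""
    from selenium.webdriver.common.by import By

    os.makedirs(DOWNLOAD_DIR, exist_ok=True)
    driver = build_driver(DOWNLOAD_DIR)
    try:
        driver.get(PAGE_URL)
        driver.find_element(By.CSS_SELECTOR, DOWNLOAD_LINK_SELECTOR).click()
        # A fixed pause is the simplest way to let a small file finish.
        time.sleep(wait_seconds)
    finally:
        driver.quit()


# IMPORTANT: add a final line that calls main() once the paths and the
# selector above have been adjusted; nothing runs until you do.
```

The script only defines functions, so nothing happens until you add the main() call at the bottom. For a site that needs a login, the same driver can fill in and submit the login form before the download link is clicked.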

Save the file as “automate_file_download.py” in the “headless_test” directory created earlier and run it from the terminal with python automate_file_download.py.

Step 7: Validate your file has been downloaded

Verify that your file has been downloaded to the path you set, and congratulations: you’ve automated file downloading!
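
This check can be scripted too. Chrome writes unfinished downloads with a .crdownload extension, so one way to verify is to poll the folder until a finished file is present and nothing is still in progress. The sketch below takes that approach; the function names, timeout, and polling interval are illustrative.

```python
import os
import time


def finished_downloads(directory):
    """Return the sorted names of files in directory that are not .crdownload partials."""
    return sorted(
        name
        for name in os.listdir(directory)
        if not name.endswith(".crdownload")
        and os.path.isfile(os.path.join(directory, name))
    )


def wait_for_download(directory, timeout=30.0, poll=0.5):
    """Poll directory until a finished file exists and no partial file remains.

    Returns the finished file names, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        in_progress = any(
            name.endswith(".crdownload") for name in os.listdir(directory)
        )
        done = finished_downloads(directory)
        if done and not in_progress:
            return done
        if time.monotonic() >= deadline:
            raise TimeoutError("no finished download in {!r}".format(directory))
        time.sleep(poll)
```

Note that the check looks at everything in the folder, so it works best when the download folder starts out empty.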

This can be expanded to downloading multiple files or even running automatic daily tasks with a Jenkins pipeline.

You could also rename your files automatically after downloading them.
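
As a rough illustration, the sketch below appends a date to the file name, which keeps daily backups from overwriting each other. The naming scheme and the function name rename_with_date are my own choices, not something the download itself requires.

```python
import os
import time


def rename_with_date(path, date_text=None):
    """Rename path to '<name>_<date><ext>' in the same folder and return the new path."""
    if date_text is None:
        date_text = time.strftime("%Y-%m-%d")  # today's date, e.g. 2019-06-03
    folder, filename = os.path.split(path)
    stem, ext = os.path.splitext(filename)
    new_path = os.path.join(folder, "{}_{}{}".format(stem, date_text, ext))
    os.rename(path, new_path)
    return new_path
```

For example, a file named backup.zip renamed with the date text 2019-06-03 becomes backup_2019-06-03.zip.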

Cheers and thank you for reading!

Contact me with any questions; I will answer as soon as I am free!
Github: https://github.com/sudoxx2
My Profile/ Contact Info: https://pmoung.com
