Learning & Sharing EP.1: Downloading Email Attachments in Outlook (Selenium-Python)

Tsuru Lee
Published in KBTG Life
11 min read · Feb 15, 2024
Photo by Marek Piwnicki on Unsplash

Prerequisites: familiarity with HTML and CSS, proficiency in Python, and ability to inspect HTML elements

In December of last year, I enthusiastically became part of the internal knowledge-sharing initiative within the Software Quality Assurance (SQA) guild. This decision was driven by my eagerness to diversify my skill set. For those who may be new to my articles, I am a Machine Learning Engineer with a passion to explore and expand my knowledge horizons.

During this knowledge-sharing journey, one topic that particularly piqued my interest was Selenium. Selenium is a remarkable library that empowers users to automate browser actions, making it a valuable tool in the world of software testing and quality assurance. It also provided me with the solution to my problem.

Problem Formulation

Perhaps many of you are acquainted with the scenario where you routinely receive emails with attachments, which, in my case, happens through Outlook.

For most, the typical course of action involves manually downloading and saving each attachment. However, for those with a technical background, there’s a more efficient and sustainable approach available. You can leverage Microsoft’s Graph API to establish a connection with Outlook. Not only does this streamline the process, but it also ensures a smoother, long-term solution.

That said, let’s consider a situation where, for crucial reasons, accessing Outlook via API is simply not feasible. In such cases, the only remaining option is to repeatedly download the attachments manually. Before you start feeling disheartened, take heart! This is precisely where Selenium comes to the rescue.

In my case, I receive daily emails from System A on workdays. At the end of every month, my task involves downloading all the attachments from that month's emails and then using them for further analytical tasks.

Define Steps

When embarking on the journey to automate tasks using Selenium, the initial step involves defining a series of actions to be performed. In the context of automating the process of downloading attachments from Outlook, I’ve identified 12 actions:

  1. Open Chrome
  2. Navigate to outlook.office.com/mail
  3. Input my email address
  4. Input my password
  5. Confirm my stay signed-in option
  6. Access the mailbox
  7. Select the targeted email
  8. Click on the attachment file
  9. Click on the download button
  10. Return to the email list
  11. Move on to the next email
  12. Repeat steps 7 through 11 until all attachments have been successfully downloaded

For some steps, I provide only the code, since it uses the same methods as the earlier steps.

Disclaimer !!!

  • This code is intended solely to demonstrate how I’ve integrated Selenium into my workflow. As a result, the code may appear immature, containing multiple repeated statements, suboptimal practices, and a lack of modularity. You are encouraged to refactor it as needed. Feel free to share your refactored code in the comments; your contributions are greatly appreciated.
  • Please be aware that my default language for Office 365 (O365) is set to Thai. I apologize for any inconvenience this may cause non-Thai readers.

Step 1: Open Chrome

To open Chrome via Selenium, we need to use a WebDriver. Before launching it, let’s configure some essential settings for our task:

  1. Set the window size - this step is crucial to ensure that the entire Outlook interface is visible, as a small window might hide certain elements
  2. Define the download directory for our files
  3. Disable the download prompt

Once you’ve executed the snippet below, you’ll see Chrome open. Note that you can also configure Selenium to run in ‘headless’ mode, where it operates without displaying the browser window. But if you’re like me, you might enjoy watching the automation unfold before your eyes.


from selenium import webdriver
from selenium.webdriver.chrome.options import Options

CHROME_WINDOW_SIZE = "1920,1080"
DOWNLOAD_DIR = r"/path/to/your/folder"

chrome_options = Options()
chrome_options.add_argument(f"--window-size={CHROME_WINDOW_SIZE}")
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": DOWNLOAD_DIR,
    "download.prompt_for_download": False,
})

driver = webdriver.Chrome(options=chrome_options)
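
For the headless mode mentioned above, here is a minimal sketch that is not part of my original script. The helper names build_chrome_settings and make_driver are hypothetical, and --headless=new assumes a reasonably recent Chrome. The settings are built as plain data first, so they can be inspected without launching a browser. The sketch also resolves the download folder to an absolute path and creates it, on the assumption that Chrome handles an absolute, existing folder most reliably.

```python
from pathlib import Path


def build_chrome_settings(download_dir, window_size="1920,1080", headless=False):
    """Return (arguments, prefs) for Chrome as plain data (hypothetical helper)."""
    # Resolve to an absolute path and make sure the folder exists.
    folder = Path(download_dir).expanduser().resolve()
    folder.mkdir(parents=True, exist_ok=True)

    arguments = [f"--window-size={window_size}"]
    if headless:
        # Assumption: a recent Chrome that understands the "new" headless mode.
        arguments.append("--headless=new")

    prefs = {
        "download.default_directory": str(folder),
        "download.prompt_for_download": False,
    }
    return arguments, prefs


def make_driver(download_dir, headless=False):
    """Apply the settings to a real Chrome WebDriver (requires selenium)."""
    # Imported here so the settings helper above works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    arguments, prefs = build_chrome_settings(download_dir, headless=headless)
    options = Options()
    for argument in arguments:
        options.add_argument(argument)
    options.add_experimental_option("prefs", prefs)
    return webdriver.Chrome(options=options)
```

Calling make_driver("/path/to/your/folder", headless=True) would then replace the manual setup above; leaving headless at its default keeps the visible browser window.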

Step 2: Navigate to outlook.office.com/mail

Navigation to a specific website is straightforward; you can achieve it by utilizing the get method of the WebDriver object.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

CHROME_WINDOW_SIZE = "1920,1080"
DOWNLOAD_DIR = r"/path/to/your/folder"

chrome_options = Options()
chrome_options.add_argument(f"--window-size={CHROME_WINDOW_SIZE}")
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": DOWNLOAD_DIR,
    "download.prompt_for_download": False,
})

driver = webdriver.Chrome(options=chrome_options)

# Navigate to outlook.office.com/mail
driver.get("https://outlook.office.com/mail/")

[Image: Chrome after executing the code]

Step 3: Input My Email Address

Before we delve deeper, let’s streamline the process by setting up our email and password in a separate file, preferably in JSON format. To interact with the email box effectively, I’ll start by selecting the element with the name ‘loginfmt’ using the find_element method to specify our information. Finally, I'll execute the action by pressing the Enter key.

The challenge arises when the connection is slow, as your code might execute before the required element appears on the page. In such situations, it becomes essential to implement a waiting mechanism until the element becomes visible. This is where WebDriverWait and expected_conditions come to your aid, providing effective solutions to handle these scenarios.
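
To make the idea of waiting concrete, here is a simplified stand-in for what such a wait does conceptually: poll a condition until it succeeds or a deadline passes. This is my own illustration and not Selenium's implementation; wait_until and its parameters are hypothetical names. The real WebDriverWait additionally passes the driver to the condition and, by default, ignores "no such element" errors while polling.

```python
import time


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # The truthy value (for example, the element that was found) is returned.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

Compared with a fixed time.sleep, this kind of loop returns as soon as the condition holds, so a fast page does not pay the full delay and a slow page still gets up to the whole timeout.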

Please note that sometimes, I have encountered situations where if an action is executed immediately after a preceding action, my code doesn’t function as expected. Although I haven’t identified the root cause yet, I have found a workaround by adding a time.sleep delay between the actions.

So, our current code will look like this. The credentials go in a file named authen.json:

{
    "username": "your@email.com",
    "password": "yourpassword"
}

And the script itself:

import json
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

CHROME_WINDOW_SIZE = "1920,1080"
DOWNLOAD_DIR = r"/path/to/your/folder"
PAGE_LOAD_WAIT_SEC = 30
DELAY_SEC = 3

# Load the email address and password from the JSON file
with open('authen.json') as json_file:
    authen = json.load(json_file)

chrome_options = Options()
chrome_options.add_argument(f"--window-size={CHROME_WINDOW_SIZE}")
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": DOWNLOAD_DIR,
    "download.prompt_for_download": False,
})

driver = webdriver.Chrome(options=chrome_options)
# Wait up to PAGE_LOAD_WAIT_SEC seconds for an element before giving up
driver_wait = WebDriverWait(driver, PAGE_LOAD_WAIT_SEC)

# Navigate to outlook.office.com/mail
driver.get("https://outlook.office.com/mail/")

time.sleep(DELAY_SEC)

# Input my email address
login_box = (By.NAME, "loginfmt")
login_box = driver_wait.until(EC.presence_of_element_located(login_box))
login_box.send_keys(authen["username"])
login_box.send_keys(Keys.RETURN)

[Image: Chrome after executing the code]
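
If you prefer not to rely on the JSON file alone, the loading step can be wrapped in a small function that also checks that both values are present. This is a sketch under my own assumptions: load_credentials is a hypothetical helper, and the environment variable names OUTLOOK_USERNAME and OUTLOOK_PASSWORD are invented for illustration.

```python
import json
import os


def load_credentials(path="authen.json"):
    """Read the username and password, letting environment variables override the file."""
    data = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as json_file:
            data = json.load(json_file)

    # Hypothetical variable names; an environment value wins over the file value.
    username = os.environ.get("OUTLOOK_USERNAME", data.get("username"))
    password = os.environ.get("OUTLOOK_PASSWORD", data.get("password"))

    if not username or not password:
        raise ValueError("username and password must both be provided")
    return {"username": username, "password": password}
```

The returned dictionary has the same keys as authen, so authen = load_credentials() could stand in for the with open(...) lines. Whichever variant you use, keep the credentials file out of version control.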

Step 4: Input My Password

Entering the password is straightforward, as we can achieve it with a minor modification to the code used for entering the username. You can refactor the code by writing it as a function if you want.

import json
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

CHROME_WINDOW_SIZE = "1920,1080"
DOWNLOAD_DIR = r"/path/to/your/folder"
PAGE_LOAD_WAIT_SEC = 30
DELAY_SEC = 3

with open('authen.json') as json_file:
    authen = json.load(json_file)

chrome_options = Options()
chrome_options.add_argument(f"--window-size={CHROME_WINDOW_SIZE}")
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": DOWNLOAD_DIR,
    "download.prompt_for_download": False,
})

driver = webdriver.Chrome(options=chrome_options)
driver_wait = WebDriverWait(driver, PAGE_LOAD_WAIT_SEC)

# Navigate to outlook.office.com/mail
driver.get("https://outlook.office.com/mail/")

time.sleep(DELAY_SEC)

# Input my email address
login_box = (By.NAME, "loginfmt")
login_box = driver_wait.until(EC.presence_of_element_located(login_box))
login_box.send_keys(authen["username"])
login_box.send_keys(Keys.RETURN)

time.sleep(DELAY_SEC)

# Input my password
password_box = (By.NAME, "passwd")
password_box = driver_wait.until(EC.presence_of_element_located(password_box))
password_box.send_keys(authen["password"])
password_box.send_keys(Keys.ENTER)

[Image: Chrome after executing the code]
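
As one possible version of the refactoring suggested above, the username and password steps can share a single function. This is a sketch, not the code I actually ran: type_and_submit is a hypothetical name, and the lambda passed to until is, as far as I can tell, the same check that presence_of_element_located performs. The function only relies on the wait object having an until method, so it can be tried out with simple stand-in objects.

```python
def type_and_submit(wait, locator, text, submit_key):
    """Wait for the element at `locator`, type `text`, then press `submit_key`."""
    # `locator` is a (strategy, value) tuple such as (By.NAME, "loginfmt").
    element = wait.until(lambda drv: drv.find_element(*locator))
    element.send_keys(text)
    element.send_keys(submit_key)
    return element


# Intended usage with the objects defined in the script above:
# type_and_submit(driver_wait, (By.NAME, "loginfmt"), authen["username"], Keys.RETURN)
# time.sleep(DELAY_SEC)
# type_and_submit(driver_wait, (By.NAME, "passwd"), authen["password"], Keys.ENTER)
```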

Step 5: Confirm My Stay Signed-in Option

As depicted in the image provided in step 4, you’ll notice the need to confirm whether to stay signed in or not. In my scenario, I typically choose ‘Yes.’ It’s worth noting that this preference persists as long as the driver is kept open. Therefore, if you close the driver and reopen it, you’ll encounter the same page.

Clicking the ‘Yes’ button is a straightforward task. You simply switch from using the send_keys method, which is employed for entering your username and password, to using the click method instead.

From this point onward, I will exclusively highlight the code changes to maintain conciseness in the article. You can find the complete version of the code at the end of the article.

# Confirm my stay signed-in option
stay_sign_in = (By.ID, "idSIButton9")
stay_sign_in = driver_wait.until(EC.presence_of_element_located(stay_sign_in))
stay_sign_in.click()

[Image: Chrome after executing the code]

Step 6: Access the Mailbox

I’ve organized the emails from System A into a dedicated mailbox named ‘System A’ using email rules. If you haven’t tried this approach yet, I encourage you to do so.

Now, in this particular step, we need to wait until the mailbox panel is clickable before clicking it. To achieve this, we can use element_to_be_clickable from EC (Expected Conditions).

Next, I select the element by its class name. The class attribute of this element contains several class names separated by spaces, so the raw string cannot be used as a single class name. In HTML, a class attribute with spaces lists multiple classes applied to the element, rather than one class whose name contains spaces. CSS (Cascading Style Sheets) selectors match an element that carries all of those classes by chaining the names with dots (for example, .a.b.c), so I replace each space with a dot before passing the string to Selenium.

Finally, we come to the step of selecting the mailbox. The mailbox element's title consists of the mailbox name followed by the number of unread emails, so an exact match on the name would fail. Instead, we can use a CSS_SELECTOR locator with the 'starts with' operator ^=, which matches any mailbox whose title starts with 'System A'.


# ...

CHROME_WINDOW_SIZE = "1920,1080"
DOWNLOAD_DIR = r"/path/to/your/folder"
ELEMENT_LOAD_WAIT_SEC = 30
DELAY_SEC = 5
MAIL_BOX = "System A"

# ....

# Access the mailbox
def preprocess_element_class(x: str) -> str:
    # Join the space-separated class names with dots, e.g. "a b" -> "a.b"
    return x.replace(" ", ".")

mail_box_panel = (By.CLASS_NAME, preprocess_element_class("C2IG3 if6B2 oTkSL iDEcr OPUpK"))
mail_box_panel = driver_wait.until(EC.element_to_be_clickable(mail_box_panel))
mail_box_panel.click()

mail_box = (By.CSS_SELECTOR, f"div[title^='{MAIL_BOX}']")
mail_box = driver_wait.until(EC.presence_of_element_located(mail_box))
mail_box.click()

[Image: Chrome after executing the code]
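
The two selector ideas from this step can also be written as explicit CSS selector strings and passed to By.CSS_SELECTOR. The helpers below are a sketch with hypothetical names (classes_to_css and attribute_starts_with); they only build strings, so they can be checked without a browser. Note that class names like the ones shown here are generated by Outlook and may change at any time.

```python
def classes_to_css(class_attribute, tag=""):
    """Turn a space-separated class attribute into a selector such as 'div.a.b'."""
    names = class_attribute.split()  # split() also tolerates repeated spaces
    if not names:
        raise ValueError("at least one class name is required")
    return tag + "".join(f".{name}" for name in names)


def attribute_starts_with(tag, attribute, prefix):
    """Build a selector matching elements whose attribute value starts with `prefix`."""
    # Escape backslashes and single quotes so the prefix stays inside the quoted value.
    escaped = prefix.replace("\\", "\\\\").replace("'", "\\'")
    return f"{tag}[{attribute}^='{escaped}']"


print(classes_to_css("hcptT gDC9O"))                       # .hcptT.gDC9O
print(attribute_starts_with("div", "title", "System A"))   # div[title^='System A']
```

For example, (By.CSS_SELECTOR, classes_to_css("hcptT gDC9O")) describes the same element as the dotted string used with By.CLASS_NAME in my code.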

Step 7: Select the Targeted Email

# Select the targeted email
mail = (By.CLASS_NAME, preprocess_element_class("hcptT gDC9O"))
mail = driver_wait.until(EC.presence_of_element_located(mail))
mail.click()

Step 8: Click on the Attachment File

# Click on the attachment file
attachment = (By.CLASS_NAME, "Y0d3P")
attachment = driver_wait.until(EC.presence_of_element_located(attachment))
attachment.click()

[Image: Chrome after executing the code]
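
My emails always carry an attachment, so the code above simply waits for it. If some of yours do not, the wait would time out. One way to handle that, sketched here with a hypothetical helper named find_optional, is to use find_elements, which returns an empty list instead of raising an error when nothing matches. Note that this check does not wait, so it assumes the email has already finished rendering.

```python
def find_optional(driver, locator):
    """Return the first element matching `locator`, or None when nothing matches."""
    matches = driver.find_elements(*locator)
    return matches[0] if matches else None


# Intended usage with the driver from the script above:
# attachment = find_optional(driver, (By.CLASS_NAME, "Y0d3P"))
# if attachment is not None:
#     attachment.click()
```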

Step 9: Click on the Download Button

# Click on the download button
download = (By.CSS_SELECTOR, 'button[name="ดาวน์โหลด"]')
download = driver_wait.until(EC.presence_of_element_located(download))
download.click()
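
Clicking the button only starts the download. In my script, I rely on a time.sleep delay to give it time to finish. A more explicit alternative, sketched below, is to watch the download folder until a new, complete file appears. The helper names list_finished and wait_for_new_download are hypothetical, and the sketch assumes that Chrome marks unfinished downloads with a .crdownload suffix.

```python
import time
from pathlib import Path


def list_finished(download_dir):
    """Return the names of files in `download_dir` that are not partial downloads."""
    return {
        path.name
        for path in Path(download_dir).iterdir()
        if path.is_file() and path.suffix != ".crdownload"
    }


def wait_for_new_download(download_dir, before, timeout=60.0, poll_interval=0.5):
    """Wait until a finished file that was not in `before` appears; return the new names."""
    deadline = time.monotonic() + timeout
    while True:
        new_files = list_finished(download_dir) - set(before)
        if new_files:
            return new_files
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no new download finished within {timeout} seconds")
        time.sleep(poll_interval)


# Intended usage around the click:
# before = list_finished(DOWNLOAD_DIR)
# download.click()
# wait_for_new_download(DOWNLOAD_DIR, before)
```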

Step 10: Return to the Email List

To navigate back to the email list, our first task is to exit the download pop-up. This can be accomplished by sending a key press event to the WebDriver using ActionChains. Subsequently, we can employ the invisibility_of_element_located condition to verify if the element has indeed disappeared.

# Return to the email list
webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
download_popup = (By.CLASS_NAME, preprocess_element_class("m79Ne NjYhI"))
driver_wait.until(EC.invisibility_of_element_located(download_popup))

[Image: Chrome after executing the code]

Step 11: Move on to the Next Email

This step presents a bit of a challenge since I didn’t initially create a list of emails and iterate through them. This is due to Outlook dynamically loading elements as you scroll down the mailbox. Creating an array when initially visiting the mailbox won’t capture all the emails you need.

The solution is to select the previous email, stored in the variable mail, and then simulate pressing the 'down' key from that point. Afterward, you can select the current element using switch_to.active_element.

# Move on to the next email
mail.click()
webdriver.ActionChains(driver).send_keys(Keys.DOWN).perform()
mail = driver.switch_to.active_element

[Image: Chrome after executing the code]
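
One thing the snippet above does not cover is reaching the end of the list. The sketch below wraps the same 'press down, then read the focused element' idea in a hypothetical function, move_to_next, that returns None when the focus did not move. It assumes that Outlook keeps the focus on the last email when there is nothing below it, and that two references to the same element compare as equal; I have not verified either against every Outlook version.

```python
def move_to_next(current, press_down, get_active):
    """Press 'down' and return the newly focused item, or None if the focus stayed put."""
    press_down()
    candidate = get_active()
    return None if candidate == current else candidate


# Intended usage with the driver from the script above:
# mail.click()
# mail = move_to_next(
#     mail,
#     lambda: webdriver.ActionChains(driver).send_keys(Keys.DOWN).perform(),
#     lambda: driver.switch_to.active_element,
# )
# if mail is None:
#     ...  # no more emails: stop the loop
```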

Step 12: Repeat Steps 7 Through 11 Until All Attachments Have Been Successfully Downloaded

In my particular scenario, I aim to download emails from the latest ones back to a specific date. To achieve this, I use a while loop that iterates through the emails and terminates when the received date of an email matches STOP_DATE. You can find the received date at the rightmost section of the email header, in the format YYYY-MM-DD, which is what FULL_DATE_PATTERN extracts.

# ...

CHROME_WINDOW_SIZE = "1920,1080"
DOWNLOAD_DIR = r"/path/to/your/folder"
ELEMENT_LOAD_WAIT_SEC = 30
DELAY_SEC = 5
MAIL_BOX = "System A"
FULL_DATE_PATTERN = r'(\d{4})-(\d{1,2})-(\d{1,2})'
STOP_DATE = "2023-12-16"

# ...

# Select the targeted email.
mail = (By.CLASS_NAME, preprocess_element_class("hcptT gDC9O"))
mail = driver_wait.until(EC.presence_of_element_located(mail))

while True:
    mail.click()

    # Check stop condition (email date)
    email_date = (By.CLASS_NAME, preprocess_element_class("AL_OM sxdRi I1wdR"))
    email_date = driver_wait.until(EC.presence_of_element_located(email_date))
    email_date = re.search(FULL_DATE_PATTERN, email_date.text, re.MULTILINE).group(0)
    if email_date == STOP_DATE:
        break

    # Click on the attachment file
    attachment = (By.CLASS_NAME, "Y0d3P")
    attachment = driver_wait.until(EC.presence_of_element_located(attachment))
    attachment.click()

    # Click on the download button
    download = (By.CSS_SELECTOR, 'button[name="ดาวน์โหลด"]')
    download = driver_wait.until(EC.presence_of_element_located(download))
    download.click()

    # Return to the email list
    webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
    download_popup = (By.CLASS_NAME, preprocess_element_class("m79Ne NjYhI"))
    driver_wait.until(EC.invisibility_of_element_located(download_popup))

    # Move on to the next email
    mail.click()
    webdriver.ActionChains(driver).send_keys(Keys.DOWN).perform()
    mail = driver.switch_to.active_element

[Image: Chrome after executing the code]
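
The loop above stops only when an email's date string is exactly equal to STOP_DATE. If no email arrived on that date, or the date is written without a leading zero, the comparison never succeeds. A more forgiving variant, sketched below with the hypothetical helpers extract_date and reached_stop, parses the text into a date object and stops at the first email on or before the stop date. It reuses the same FULL_DATE_PATTERN and assumes the emails are listed from newest to oldest.

```python
import re
from datetime import date

FULL_DATE_PATTERN = r"(\d{4})-(\d{1,2})-(\d{1,2})"


def extract_date(text):
    """Return the first YYYY-MM-DD date found in `text`, or None if there is none."""
    match = re.search(FULL_DATE_PATTERN, text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def reached_stop(text, stop_date):
    """True when `text` contains a date on or before `stop_date`."""
    found = extract_date(text)
    return found is not None and found <= stop_date


# Intended usage inside the loop:
# if reached_stop(email_date.text, date(2023, 12, 16)):
#     break
```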

The Final Code

import json
import re
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


CHROME_WINDOW_SIZE = "1920,1080"
DOWNLOAD_DIR = r"/path/to/your/folder"
ELEMENT_LOAD_WAIT_SEC = 30
DELAY_SEC = 3
MAIL_BOX = "System A"
FULL_DATE_PATTERN = r'(\d{4})-(\d{1,2})-(\d{1,2})'
STOP_DATE = "2023-12-16"

# Join space-separated class names with dots so they can be used as one locator
def preprocess_element_class(x: str) -> str:
    return x.replace(" ", ".")

if __name__ == "__main__":

    with open('authen.json') as json_file:
        authen = json.load(json_file)

    chrome_options = Options()
    chrome_options.add_argument(f"--window-size={CHROME_WINDOW_SIZE}")
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": DOWNLOAD_DIR,
        "download.prompt_for_download": False,
    })

    driver = webdriver.Chrome(options=chrome_options)
    driver_wait = WebDriverWait(driver, ELEMENT_LOAD_WAIT_SEC)

    # Navigate to outlook.office.com/mail
    driver.get("https://outlook.office.com/mail/")

    # Input my email address
    login_box = (By.NAME, "loginfmt")
    login_box = driver_wait.until(EC.presence_of_element_located(login_box))
    login_box.send_keys(authen["username"])
    login_box.send_keys(Keys.RETURN)

    time.sleep(DELAY_SEC)

    # Input my password
    password_box = (By.NAME, "passwd")
    password_box = driver_wait.until(EC.presence_of_element_located(password_box))
    password_box.send_keys(authen["password"])
    password_box.send_keys(Keys.ENTER)

    time.sleep(DELAY_SEC)

    # Confirm my stay signed-in option
    stay_sign_in = (By.ID, "idSIButton9")
    stay_sign_in = driver_wait.until(EC.presence_of_element_located(stay_sign_in))
    stay_sign_in.click()

    time.sleep(DELAY_SEC)

    # Access the mailbox
    mail_box_panel = (By.CLASS_NAME, preprocess_element_class("C2IG3 if6B2 oTkSL iDEcr OPUpK"))
    mail_box_panel = driver_wait.until(EC.element_to_be_clickable(mail_box_panel))
    mail_box_panel.click()

    time.sleep(DELAY_SEC)

    mail_box = (By.CSS_SELECTOR, f"div[title^='{MAIL_BOX}']")
    mail_box = driver_wait.until(EC.presence_of_element_located(mail_box))
    mail_box.click()

    time.sleep(DELAY_SEC)

    # Select the targeted email.
    mail = (By.CLASS_NAME, preprocess_element_class("hcptT gDC9O"))
    mail = driver_wait.until(EC.presence_of_element_located(mail))

    while True:

        time.sleep(DELAY_SEC)

        # Check stop condition (email date)
        mail.click()
        email_date = (By.CLASS_NAME, preprocess_element_class("AL_OM sxdRi I1wdR"))
        email_date = driver_wait.until(EC.presence_of_element_located(email_date))
        email_date = re.search(FULL_DATE_PATTERN, email_date.text, re.MULTILINE).group(0)
        if email_date == STOP_DATE:
            break

        time.sleep(DELAY_SEC)

        # Click on the attachment file
        attachment = (By.CLASS_NAME, "Y0d3P")
        attachment = driver_wait.until(EC.presence_of_element_located(attachment))
        attachment.click()

        time.sleep(DELAY_SEC)

        # Click on the download button
        download = (By.CSS_SELECTOR, 'button[name="ดาวน์โหลด"]')
        download = driver_wait.until(EC.presence_of_element_located(download))
        download.click()

        time.sleep(DELAY_SEC)

        # Return to the email list
        webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
        download_popup = (By.CLASS_NAME, preprocess_element_class("m79Ne NjYhI"))
        driver_wait.until(EC.invisibility_of_element_located(download_popup))

        time.sleep(DELAY_SEC)

        # Move on to the next email
        mail.click()
        webdriver.ActionChains(driver).send_keys(Keys.DOWN).perform()
        mail = driver.switch_to.active_element

Conclusion

  • As demonstrated, utilizing automated tools for web interactions can present various restrictions and often requires significant effort to maintain code compatibility with evolving web interfaces. Therefore, using Selenium to download email attachments is recommended primarily when an official API is not available.
  • Selenium offers a versatile set of capabilities, enabling interactions with web browsers via WebDriver, handling wait times using methods like time.sleep or WebDriverWait, and executing actions through send_keys or click.
  • When working with Selenium, we can select HTML elements using a variety of By strategies, including CLASS_NAME, ID, CSS_SELECTOR, and more. You can find a comprehensive list of these strategies in the Selenium documentation.

Before we part ways, I’d like to express my sincere gratitude to each and every one of you for taking the time to delve into this article. Your interest and engagement mean a lot to me.

Stay tuned for more exciting content in the near future. I’m continually working on new articles and insights to share with you. Your support is what keeps me motivated, and I can’t wait to have you back for the next installment. Until then, happy reading and exploring!

For those who enjoy this article, don’t forget to follow Medium: KBTG Life. We have tons of great pieces written by KBTG people in both English and Thai.
