Attaching Multiple Looker Reports into a Single Email Using AWS Lambda

Alex Belsky
7 min read · Jun 18, 2024



Intro

I recently developed a dashboard for my company’s C-level management, known as the Executive Report dashboard. This dashboard consolidates all the company’s major metrics and KPIs.

Upon completion, the lead manager requested that this report be sent in a daily scheduled email to start his day, which was a significant compliment to my work.

The company operates in three major business verticals, referred to as asset classes. The manager wanted separate reports for each asset class, so I utilized Looker’s scheduler to set up three distinct reports.

Looker Schedule

From then on, a group of C-level executives received three daily emails, each containing a report for a different asset class.

Daily Emails

The next request was to combine all asset class reports into a single email. This posed a technical challenge, because Looker does not support scheduling multiple attachments in one email, either through its scheduler or via the Looker API.

Looker Scheduler (left) & Looker API Explorer

Solution

To address this, I created an automated process: Python code running on AWS Lambda, triggered once a day. This post focuses on that solution. You can check the project workflow below:

Workflow

Here you can see the packages and functions I used:

The packages

from pathlib import Path
import smtplib
import re
import boto3
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication
from email.mime.text import MIMEText
from datetime import datetime, date

A brief explanation of what each package is used for in the given code:

a. pathlib:

  • Provides a way to work with file system paths using object-oriented classes.
  • Used here to manage file paths and handle file operations.

b. re:

  • Provides support for regular expressions.
  • Used here to define and match patterns in filenames.

c. smtplib:

  • Defines an SMTP client session object that can be used to send mail to any internet machine with an SMTP or ESMTP listener daemon.
  • Used here to send emails via SMTP.

d. boto3:

  • The Amazon Web Services (AWS) SDK for Python, which allows Python developers to write software that makes use of Amazon services like S3 and EC2.
  • Used here to interact with AWS S3 for downloading, copying, and deleting files.

e. email.mime.multipart.MIMEMultipart:

  • Used to create a MIME (Multipurpose Internet Mail Extensions) multipart email message.
  • Used here to construct the email message container.

f. email.mime.application.MIMEApplication:

  • Represents a MIME application content type.
  • Used here to attach PDF files to the email message.

g. email.mime.text.MIMEText:

  • Represents MIME text content type.
  • Used here to attach the text body to the email message.

h. datetime:

  • Supplies classes for manipulating dates and times.
  • Used here to get the current date and time for validating filenames and creating date strings.

The functions

The download_files_from_s3 function downloads all PDF files from a specified S3 bucket and folder that were modified today, and returns a list of the downloaded file names.

def download_files_from_s3(bucket_name, parent_folder):
    s3 = boto3.client('s3')
    objects = s3.list_objects_v2(Bucket=bucket_name, Prefix=parent_folder)
    files = []

    if 'Contents' in objects:
        for obj in objects['Contents']:
            file_name = obj['Key']
            last_modified = obj['LastModified']
            if file_name.endswith('.pdf') and last_modified.date() == datetime.now().date():
                local_file_name = Path(file_name).name
                s3.download_file(bucket_name, file_name, local_file_name)
                files.append(local_file_name)
    return files
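One pitfall I'd flag here (my own observation, not something from the original setup): the LastModified timestamps S3 returns are timezone-aware UTC datetimes, while datetime.now() is naive local time, so the date comparison can be off by a day near midnight in non-UTC regions. A safer variant compares both sides in UTC:

```python
from datetime import datetime, timezone

# S3 returns LastModified as a timezone-aware UTC datetime; a sample value:
last_modified = datetime(2024, 6, 18, 3, 15, tzinfo=timezone.utc)

# Comparing both sides in UTC avoids an off-by-one-day mismatch near midnight
is_today = last_modified.date() == datetime.now(timezone.utc).date()
```

Also note that list_objects_v2 returns at most 1,000 keys per call, which is plenty for three daily PDFs but worth remembering if the folder grows.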

The validate_file_names function validates a list of filenames to ensure they match a specific naming pattern, were generated today, and include exactly one file for each asset class (PL, Auto, POS). It returns a list of valid filenames if the criteria are met; otherwise it returns an empty list.

def validate_file_names(files):
    today_str = datetime.now().strftime("%Y-%m-%d")
    # Looker appends a timestamp and a hash to the PDF name;
    # I check the filenames are consistent with the known pattern
    pattern = re.compile(r'Executive_Report_-_(PL|Auto|POS)_' + today_str + r'T\d{4}_.{6}\.pdf')

    # Each asset class should appear only once
    valid_files = []
    asset_class_count = {'PL': 0, 'Auto': 0, 'POS': 0}

    for file in files:
        match = pattern.match(file)
        if match:
            asset_class = match.group(1)
            asset_class_count[asset_class] += 1
            valid_files.append(file)

    # Check that each asset class has exactly one file
    if all(count == 1 for count in asset_class_count.values()):
        return valid_files
    else:
        return []
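To make the pattern concrete, here is a quick standalone check against a made-up filename in the shape Looker produces (the timestamp and hash here are invented for illustration):

```python
import re
from datetime import datetime

today_str = datetime.now().strftime("%Y-%m-%d")
pattern = re.compile(r'Executive_Report_-_(PL|Auto|POS)_' + today_str + r'T\d{4}_.{6}\.pdf')

# A made-up filename following the Looker naming convention
sample = f"Executive_Report_-_Auto_{today_str}T0730_a1b2c3.pdf"
match = pattern.match(sample)
print(match.group(1))  # Auto

# A filename with an unknown asset class does not match
assert pattern.match(f"Executive_Report_-_Other_{today_str}T0730_a1b2c3.pdf") is None
```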

The send_email function sends an email with the attached PDF files to a list of recipients. It builds a predefined message body, attaches the specified PDF files, and sends the email through an SMTP server.

def send_email(files, recipients):
    # Create message container and email body
    message = """
Hello,

Attached are the daily Executive Reports.
These reports, powered by Looker, offer an overview of the company's main metrics and KPIs.
The Looker dashboard can be found at this link - https://mycompany.looker.com/dashboards/XXXX
Don't hesitate to contact us via the #bi

Your feedback is valued as we strive for excellence.

Best regards,
BI Team.
"""

    msg = MIMEMultipart()
    msg['From'] = SENT_FROM
    msg['To'] = ", ".join(recipients)
    msg['Subject'] = "Executive Reports - Automated PDF Sender"
    msg.attach(MIMEText(message, 'plain'))

    # Attach PDF files
    for file in files:
        file_path = Path(file)
        with file_path.open('rb') as f:
            part = MIMEApplication(f.read(), Name=file_path.name)
        part['Content-Disposition'] = f'attachment; filename="{file_path.name}"'
        msg.attach(part)

    # Send mail
    with smtplib.SMTP('smtp.gmail.com', 587) as server:
        server.starttls()
        server.login(GMAIL_USER, GMAIL_APP_PASSWORD)
        server.sendmail(SENT_FROM, recipients, msg.as_string())

The send_error_mail_to_admin function sends an error email to the administrator (the BI Team) with a specified message body, notifying them of a failure in sending the daily executive reports.

def send_error_mail_to_admin(text, recipients):
    # Create message container and email body
    message = text

    msg = MIMEMultipart()
    msg['From'] = SENT_FROM
    msg['To'] = ", ".join(recipients)
    msg['Subject'] = "Executive Reports - Daily Mail Wasn't Sent"
    msg.attach(MIMEText(message, 'plain'))

    # Send mail
    with smtplib.SMTP('smtp.gmail.com', 587) as server:
        server.starttls()
        server.login(GMAIL_USER, GMAIL_APP_PASSWORD)
        server.sendmail(SENT_FROM, recipients, msg.as_string())

The move_files_to_archive function moves the sent files to the Archive folder. Since S3 has no native move operation, it copies each object and then deletes the original.

def move_files_to_archive(bucket_name, files, parent_folder, archive_folder):
    s3 = boto3.client('s3')
    for file in files:
        copy_source = {'Bucket': bucket_name, 'Key': f"{parent_folder}{file}"}
        s3.copy(copy_source, bucket_name, f"{archive_folder}{file}")
        s3.delete_object(Bucket=bucket_name, Key=f"{parent_folder}{file}")

And here you can see the business logic of the process:

# Main code
pdf_files = download_files_from_s3(BUCKET_NAME, PARENT_FOLDER)
valid_pdf_files = validate_file_names(pdf_files)

try:
    if valid_pdf_files:
        if len(valid_pdf_files) != 3:
            print("Mail wasn't sent! Number of files is different than 3, please check the S3 Pending bucket")
            send_error_mail_to_admin(message_wrong_file_number, SEND_ERROR_TO_ADMIN)
        else:
            send_email(valid_pdf_files, SEND_TO)
            move_files_to_archive(BUCKET_NAME, valid_pdf_files, PARENT_FOLDER, ARCHIVE_FOLDER)
            print('Mail was sent!')
    else:
        send_error_mail_to_admin(message_missing_files, SEND_ERROR_TO_ADMIN)
        print('Mail was not sent! The files are missing')

except Exception as exception:
    send_error_mail_to_admin(message_error, SEND_ERROR_TO_ADMIN)
    print("Error: %s!\n\n" % exception)
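The same business logic can also be factored for testability. A minimal sketch, with every AWS-touching step injected as a callable (all names below are my own, not from the original code), so the flow can be unit-tested without AWS:

```python
def run_daily_report(download, validate, send, archive, notify_admin):
    """Orchestrate the daily flow; each step is passed in as a callable."""
    files = download()
    valid_files = validate(files)
    if not valid_files:
        notify_admin("The files are missing")
        return "missing_files"
    if len(valid_files) != 3:
        notify_admin("Number of files is different than 3")
        return "wrong_file_count"
    send(valid_files)
    archive(valid_files)
    return "sent"

# Quick check with stand-in callables (no AWS involved)
status = run_daily_report(
    download=lambda: ['PL.pdf', 'Auto.pdf', 'POS.pdf'],
    validate=lambda files: files,
    send=lambda files: None,
    archive=lambda files: None,
    notify_admin=print,
)
print(status)  # sent
```

In a Lambda deployment, a lambda_handler would simply call run_daily_report with the real functions defined earlier.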

Instead of using your private email address to send the emails, a more suitable approach might be to create a generic company email account and securely store its password in AWS Secrets Manager:

# Get the secret key from AWS
def fetch_secret_from_aws(secret_name):
try:
session = boto3.session.Session()
client = session.client(service_name='secretsmanager', region_name='us-east-1')
get_secret_value_response = client.get_secret_value(SecretId=secret_name)
return get_secret_value_response['SecretString']
except Exception as e:
print(e)
return None
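One detail worth noting (an assumption about how you store the secret, not something from the original code): Secrets Manager secrets are commonly stored as JSON key/value pairs, so the returned SecretString usually needs parsing before use:

```python
import json

# Hypothetical secret payload as it might come back from Secrets Manager
secret_string = '{"GMAIL_USER": "reports@mycompany.com", "GMAIL_APP_PASSWORD": "app-password"}'

creds = json.loads(secret_string)
print(creds["GMAIL_USER"])  # reports@mycompany.com
```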

Also, you may consider storing the recipients list in your S3 bucket instead of updating it inside the code:

# Get Recipients mail list
def get_recipients_emails_from_s3(bucket_name, file_key):
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=file_key)
emails = obj['Body'].read().decode('utf-8').splitlines()
filtered_emails = [email for email in emails if email.endswith('@mycompany.com')]

return filtered_emails

In the next section, I will discuss some challenges I encountered during development and highlight potential issues you might face if you undertake a similar project in your company.

Things to Consider

When implementing this workflow, here are a few key points to keep in mind:

a. File Validation: Ensure your files meet predefined criteria, such as file extension, modification date, and file count, before sending them.

b. Dedicated Email Account: Use a generic email account created specifically for this purpose instead of your personal one.

Pros: Allows for a fully automated process that isn’t tied to a specific engineer.

Cons: Requires logistical setup, including creating the account, connecting it to the AWS account, and integrating with Lambda.

c. Error Handling: Develop a clear flow for potential issues, such as Looker failing to send files, sending too many files, or incorrect file names. Set up straightforward alerts for each case to notify the process owner.

d. Amazon SES: Consider using Amazon Simple Email Service (SES) for more robust email delivery than a plain SMTP connection.

e. Case Sensitivity: Be mindful that bucket and folder paths are case-sensitive.

f. File Download: Handling file paths with the os package, which is commonly used for path manipulations and running system commands, didn't work for me in Lambda. Consider using the pathlib package instead.
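On point (d) above: if you go the SES route, a fully built MIME message, attachments included, can be sent with send_raw_email instead of SMTP. A rough sketch, assuming the sender identity is already verified in SES (the function and variable names here are mine, not from the original project):

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_message(sender, recipients, subject, body):
    """Build the same kind of MIME container used in send_email above."""
    msg = MIMEMultipart()
    msg['From'] = sender
    msg['To'] = ", ".join(recipients)
    msg['Subject'] = subject
    msg.attach(MIMEText(body, 'plain'))
    return msg

def send_via_ses(msg, sender, recipients, region='us-east-1'):
    import boto3  # imported here so build_message stays testable without AWS
    ses = boto3.client('ses', region_name=region)
    # SES accepts the raw MIME payload, so attachments carry over unchanged
    return ses.send_raw_email(
        Source=sender,
        Destinations=recipients,
        RawMessage={'Data': msg.as_string()},
    )
```

This removes the Gmail app-password dependency entirely, since IAM permissions on the Lambda role authorize the send.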

Project Implementation

Below is a snapshot from my email account showing a multiple attachment email generated by my workflow, demonstrating the successful implementation of the project:

Multiple attachment daily email

Editor’s cuts

  1. I used a dbt scheduled job to trigger the Lambda because of a permissions issue. With the relevant permissions, one can easily create an Amazon EventBridge scheduler to trigger the Lambda (in my company only DevOps has permission to use EventBridge, and I wanted the process to be as independent as possible).
  2. If you have any questions about the code, the functionality, or anything else, I'd be glad to connect and go into more detail.
