Sending Spark DataFrame via mail

Nikhil Suthar
Jan 14 · 4 min read

Gathering all the required data and transforming it into the desired report is one of the major use cases of Apache Spark. Report types and their contents vary with business requirements, but in Apache Spark all of these reports, whatever data they hold, are stored as DataFrames.
In many cases, we need to send the final reports to business stakeholders or subject matter experts (SMEs). In that scenario, we can leverage the Scala Spark Email utility and send the final report/DataFrame by mail directly from Spark code.

This is the second blog in the series, and it discusses one more use case of the Scala Spark Email Utility. Click here to visit the first blog.

An email with Spark DataFrame

Scala Spark Email Utility with DataFrame

We can send any DataFrame directly by mail using this utility. Below are examples in which we send a DataFrame as a report directly to a mailbox.

Note: Visit the first Blog to check how to download and import Scala Spark Email utility.

Example 1: Sending DataFrame as a Report

Let’s take a Spark example in which we load a csv file into a DataFrame and apply some calculations or transformations. In the end, we send the final transformed DataFrame as a report directly in the mail.

Sending DataFrame without attachment into a mail

The code for this example reads the company.csv file and calculates the average salary of each company in AvgSalaryDF. We have written a method, createHtmlEmailBody, that converts a DataFrame into HTML and returns it as a String message. We then pass that HTML message as a parameter to the sendMail method of the utility, which decodes the HTML and sends the mail to all recipients. The DataFrame appears in the mail as shown below.
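To make the HTML conversion step concrete, here is a minimal sketch of what a method like createHtmlEmailBody might do. It is an illustration under assumptions, not the utility's code: to stay runnable without a Spark dependency it takes the column names and rows as plain Scala collections, and the object name HtmlReport and the helper escape are hypothetical.

```scala
object HtmlReport {
  // Escape characters that would otherwise be interpreted as HTML markup.
  def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  // Render column names and rows as one HTML table and return it as a String.
  // With a real DataFrame (assumption), the inputs could be built as:
  //   columns = df.columns.toSeq
  //   rows    = df.collect().toSeq.map(_.toSeq.map(v => Option(v).fold("")(_.toString)))
  def createHtmlEmailBody(columns: Seq[String], rows: Seq[Seq[String]]): String = {
    val header = columns.map(c => s"<th>${escape(c)}</th>").mkString("<tr>", "", "</tr>")
    val body = rows.map { row =>
      row.map(v => s"<td>${escape(v)}</td>").mkString("<tr>", "", "</tr>")
    }.mkString
    s"""<html><body><table border="1">$header$body</table></body></html>"""
  }
}
```

Note that collect() brings every row to the driver, so this approach suits small, already aggregated reports such as an average salary per company rather than large DataFrames.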

Output:

Example 1 Output

Example 2: Sending DataFrame as an Attachment

Spark Email Utility provides multiple parameters for sending emails. You can get more details about the parameters here. In this example, we use one more parameter and send the final DataFrame as an attachment to the mail.

Sending DataFrame as an Attachment

In this version of the code, we write the final DataFrame AvgSalaryDF to the output path and add a new method, getPathOfCSVFiles, that returns the complete paths of all csv files written there. We convert that list into a single String of file paths delimited by semicolons (;), because Spark Email Utility takes the attachment paths as one semicolon-delimited String, as mentioned in the GitHub link. Finally, we pass that String along with the HTML message for AvgSalaryDF. The resulting mail is shown in the output below.
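The path-collection step can be sketched as follows. This is one possible shape for getPathOfCSVFiles, assuming the output path is on the local filesystem (an HDFS or cloud path would need the Hadoop FileSystem API instead); the object name AttachmentPaths and the helper toAttachmentString are hypothetical.

```scala
import java.io.File

object AttachmentPaths {
  // Return the full paths of all .csv files directly under outputDir.
  // Spark's _SUCCESS marker and .crc checksum files are skipped because
  // their names do not end in ".csv".
  def getPathOfCSVFiles(outputDir: String): List[String] = {
    // listFiles() returns null when the directory does not exist.
    val files = Option(new File(outputDir).listFiles()).getOrElse(Array.empty[File])
    files
      .filter(f => f.isFile && f.getName.endsWith(".csv"))
      .map(_.getAbsolutePath)
      .sorted
      .toList
  }

  // Join the paths into the single semicolon-delimited String
  // that the utility expects for attachments.
  def toAttachmentString(paths: List[String]): String = paths.mkString(";")
}
```

If the report should arrive as a single file, calling coalesce(1) on the DataFrame before writing is a common way to get one part file, at the cost of writing through a single task.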

Output:

Mail with the same Name of Attachment

In the above output, the attachment file name is part-0000…, which follows Apache Spark's naming convention for written files. From a business perspective, this name is meaningless to the recipients. To overcome this problem, Spark Scala Email Utility allows you to change the name of an attachment. You only need to pass the list of paths in the following syntax:

"FileName1, FileName1 Path; FileName2, FileName2 Path; ..."

The utility lets you put a name in front of each complete path, separated by a comma (,). In the above code, we sent the list of files as E:/Data/AvgSalary/part-00000-1e7f3cab-49db-48cd-bb49-6d780f73201f-c000.csv. It now needs to be sent as “AvgSalaryReport.csv, E:/Data/AvgSalary/part-00000-1e7f3cab-49db-48cd-bb49-6d780f73201f-c000.csv”, where AvgSalaryReport.csv will be the name of the report attachment.

Update the file-list logic in the above code as follows:
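One way to build that "Name, Path" String is sketched below. It is a minimal illustration, not the original snippet: the object NamedAttachments and the method withNames are hypothetical, the numbering scheme for multiple files is an assumption, and the spacing after the comma and semicolon simply mirrors the syntax shown above (whether the utility also accepts entries without spaces is not verified here).

```scala
object NamedAttachments {
  // Prefix each path with a display name ("Name, Path") and join the
  // entries with "; ". A single file gets baseName.csv; several files
  // get baseName_1.csv, baseName_2.csv, and so on.
  def withNames(baseName: String, paths: List[String]): String =
    paths.zipWithIndex.map { case (path, i) =>
      val name =
        if (paths.size == 1) s"$baseName.csv"
        else s"${baseName}_${i + 1}.csv"
      s"$name, $path"
    }.mkString("; ")
}
```

For example, NamedAttachments.withNames("AvgSalaryReport", paths) would name a single part file AvgSalaryReport.csv in the mail while leaving the file on disk unchanged.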

Output:

Mail with the changed Name of Attachment

Note: In the sendMail method, we have used Mail Type “R”, which sends a report message. The Scala Spark Mail utility supports more parameters. You can get more details about the parameters here.

Click here to get the complete example of Spark code with Spark Email utility.

P.S. I am trying to improve the Scala Spark Email Utility. Please post a comment or email me if you have any suggestions to make it more advanced.

Written by Nikhil Suthar, Big Data Developer, India
