Gathering all the required data and transforming it into the desired report is one of the major use cases of Apache Spark. Report types and their contents vary with business requirements, but in Apache Spark all of these reports, whatever data they hold, are stored as DataFrames.
In many cases, we need to send the final report to business stakeholders or subject matter experts (SMEs). In that scenario, we can use the Scala Spark Email Utility to send the final report/DataFrame by email directly from Spark code.
This is the second blog in the series, where we discuss one more use case of the Scala Spark Email Utility. Click here to visit the first blog.
Scala Spark Email Utility with DataFrame
We can send any DataFrame directly by email using this utility. The examples below send a DataFrame as a report directly to a mailbox.
Note: Visit the first blog to see how to download and import the Scala Spark Email Utility.
Example 1: Sending DataFrame as a Report
Let’s take a Spark example in which we load a CSV file into a DataFrame and apply some calculations or transformations. At the end, we send the final transformed DataFrame as a report directly by email.
The code for this example reads the company.csv file and calculates the average salary of each company in AvgSalaryDF. We have written a method, createHtmlEmailBody, that converts a DataFrame into HTML and returns it as a String message. We then pass that HTML message as a parameter to the utility's sendMail method, which interprets the HTML and sends the mail to all recipients. The DataFrame then appears in the mail body as an HTML table.
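To make the HTML conversion step concrete, here is a minimal sketch of how tabular data can be turned into an HTML table string. It is not the author's createHtmlEmailBody: the object HtmlReport and the functions escape and htmlTable are hypothetical names, and the sketch works on plain Scala collections so that it runs without a Spark session.

```scala
object HtmlReport {
  // Escape the characters that would otherwise be read as HTML markup.
  def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  // Build an HTML table from column names and rows of already-stringified cells.
  // Hypothetical helper, not part of the Scala Spark Email Utility.
  def htmlTable(columns: Seq[String], rows: Seq[Seq[String]]): String = {
    val header = columns.map(c => s"<th>${escape(c)}</th>").mkString("<tr>", "", "</tr>")
    val body = rows
      .map(r => r.map(v => s"<td>${escape(v)}</td>").mkString("<tr>", "", "</tr>"))
      .mkString
    s"""<table border="1">$header$body</table>"""
  }
}
```

With a real DataFrame, the column names could come from df.columns and the rows from df.collect(), assuming the report is small enough to bring to the driver.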
Example 2: Sending DataFrame as an Attachment
The Spark Email Utility provides multiple parameters for sending emails. You can get more details about the parameters here. In this example, we use one more parameter and send the final DataFrame as an attachment to the mail.
In the code for this example, we write the final DataFrame AvgSalaryDF to the output path and add a new method, getPathOfCSVFiles, which returns a list of the full paths of all CSV files written to that output path. We convert that list into a single string of file paths delimited by semicolons (;), because the Spark Email Utility takes all paths as one String delimited by semicolons, as mentioned on the GitHub page. Finally, we pass this list of files along with the HTML message for AvgSalaryDF. The resulting mail carries the report in the body and the CSV file as an attachment.
In the received mail, the attachment file name is part-0000…, which follows Apache Spark's naming convention for written files. From a business perspective, however, this name is meaningless to recipients. To overcome this, the Scala Spark Email Utility allows you to change the attachment name. You only need to pass the list of paths in the following syntax:
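The path-collection step can be sketched as follows. This is an illustration, not the author's getPathOfCSVFiles: the object AttachmentPaths and its functions are hypothetical, and the sketch assumes the output path is on the local filesystem (an HDFS or cloud path would need the Hadoop FileSystem API instead).

```scala
import java.io.File

object AttachmentPaths {
  // Return the paths of all .csv files directly under outputDir, sorted for a stable order.
  // Spark's _SUCCESS marker and .crc checksum files do not end in .csv, so they are skipped.
  def csvFilesIn(outputDir: String): List[String] = {
    // listFiles() returns null when the directory does not exist, so wrap it in Option.
    val entries = Option(new File(outputDir).listFiles()).getOrElse(Array.empty[File])
    entries
      .filter(f => f.isFile && f.getName.endsWith(".csv"))
      .map(_.getPath)
      .sorted
      .toList
  }

  // Join the paths into one semicolon-delimited string, the format the post describes.
  def toAttachmentString(paths: Seq[String]): String = paths.mkString(";")
}
```

The resulting string is what would be passed to the utility's attachment parameter.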
"FileName1, FileName1 Path; FileName2, FileName2 Path; ..."
The utility lets you put a name, separated by a comma (,), in front of each complete path. In Example 2, we sent the list of files as "E:/Data/AvgSalary/part-00000-1e7f3cab-49db-48cd-bb49-6d780f73201f-c000.csv"; it needs to be sent as "AvgSalaryReport.csv, E:/Data/AvgSalary/part-00000-1e7f3cab-49db-48cd-bb49-6d780f73201f-c000.csv" instead, where AvgSalaryReport.csv is the name the attachment will carry.
Update the file-list logic from Example 2 so that each path is prefixed with the desired attachment name.
Note: In the sendMail method, we have used mail type "R", which sends a report message. The Scala Spark Email Utility supports more parameters. You can get more details about the parameters here.
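One way to build the "Name, Path" entries is sketched below. The object NamedAttachments, the function withNames and the numbering scheme for multiple files are all assumptions made for illustration; only the comma and semicolon delimiters come from the syntax described in this post.

```scala
object NamedAttachments {
  // Prefix each path with a display name, giving "Name, Path;Name, Path".
  // A single file becomes baseName.csv; several files become baseName_1.csv, baseName_2.csv, ...
  // The numbering is an assumption of this sketch, not a feature of the utility.
  def withNames(baseName: String, paths: Seq[String]): String =
    paths.zipWithIndex.map { case (path, i) =>
      val name = if (paths.size == 1) s"$baseName.csv" else s"${baseName}_${i + 1}.csv"
      s"$name, $path"
    }.mkString(";")
}
```

For the single file written in Example 2, withNames("AvgSalaryReport", paths) yields the name AvgSalaryReport.csv followed by a comma and the original part-file path.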
Click here to get the complete example of Spark code with the Spark Email Utility.
P.S. I am trying to improve the Scala Spark Email Utility. Please post a comment or email me if you have any suggestions to make it more advanced.