How To Parse an Email And Analyze With C#

Bora Kaşmer
Published in Geek Culture
6 min read · Jan 7, 2023

Hi! Today we will parse an email into its “From”, “To”, “Header”, “Body”, “Cc”, “Bcc”, “URLs”, and “Attachments” parts. Then we will scan the links and attached files of the email for malicious content by using an online service. Later, we will create our own custom blacklists for URLs, body words, and From, Cc, and Bcc addresses. Finally, we will check the email against all of them.

This is our example email. It has some links in the body and one attachment, and it carries different “from”, “cc”, and “bcc” addresses. We will parse and analyze all of them.

“Fear, left unchecked, can spread like a virus.” ― Lish McBride

First, we will create a C# .NET 7.0 Console Application.

Download the ChilkatCore library. We will use it for parsing emails.

program.cs(1): Our downloaded email is under the “mails” folder. We will load it with the “LoadEml()” method of the “Chilkat” library.

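using (Chilkat.Email email = new Chilkat.Email())
{
    // Load the saved .eml file from disk.
    bool success = email.LoadEml("C:/mails/TEMACertificate.eml");
    if (!success)
    {
        // LastErrorText explains why the load failed.
        Console.WriteLine(email.LastErrorText);
        return;
    }
    // The following steps go here, inside the using block.
}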

We will get the body of the email with the Chilkat email class.

 string body = email.Body;

We will get all URLs from the body with Regex. I’ve been coding for almost 30 years but Regex still looks weird to me :)

MatchCollection collection = Regex.Matches(body, @"(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?");
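If the Regex still looks weird, a small standalone test can help. Here is a minimal sketch that runs the same pattern over a made-up body and keeps each URL only once. The helper name `ExtractUrls` and the sample text are hypothetical, they are only here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Returns each distinct URL found in the text, in order of first appearance.
static List<string> ExtractUrls(string text)
{
    // Same pattern as above: scheme, host with at least one dot, optional path/query.
    var pattern = new Regex(
        @"(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?");

    var seen = new HashSet<string>();
    var urls = new List<string>();
    foreach (Match match in pattern.Matches(text))
    {
        // HashSet.Add returns false for a URL we have already collected.
        if (seen.Add(match.Value))
        {
            urls.Add(match.Value);
        }
    }
    return urls;
}

// Hypothetical sample body, only for illustration.
string sampleBody = "Visit https://example.com/a?id=1, then https://example.com/a?id=1 again or ftp://files.example.org/doc.pdf.";
foreach (string url in ExtractUrls(sampleBody))
{
    Console.WriteLine(url);
}
```

Note that the last character class of the pattern does not allow a comma or a dot, so punctuation that follows a link in a sentence is not captured as part of the URL.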

“A harmful truth is better than a useful lie.” -Thomas Mann

We will analyze all URLs for security reasons and check whether each link is harmful or not. As the analysis service, we will use https://www.virustotal.com. We will check all URLs and attachments of the email with this VirusTotal service. First, let’s create an account.

Next, go to https://developers.virustotal.com and get your API key.


Analyze All URLs In The Email:

program.cs(2): We will get all URLs from the body of the email and then analyze them one by one with the VirusTotal service. You have to download the “VirusTotalNet” library from the NuGet Package Manager.

  • We will create two using scopes, one for the “Chilkat.Email” class and one for the “VirusTotal” class. Don’t forget to enter your VirusTotal API key here.
  • We will load our downloaded “*.eml” file and get the body part of the email with the “Chilkat.Email” class.
  • Get URLs from the body with Regex.
  • “if (urls.TryAdd(item.Value, item.Value))”: Add unique URLs to the dictionary, so every URL is analyzed only once.
  • “UrlReport report = await virusTotal.GetUrlReportAsync(item.Value)”: We will send every URL to the VirusTotal service asynchronously and get the report.
  • “if (report.Positives > 0)”: We will check whether anything harmful was found. “Positives” is the number of scanners that flagged the URL, while “Total” is only the number of scanners that checked it.
  • “var detached = report.Scans.ToList().Where(u => u.Value.Detected == true).ToList()”: We will get the results of all scanners that detected something in the URL.
  • “detached.ForEach(rep => Console.WriteLine($"{rep.Key}:{rep.Value}"))”: We will write the URL’s report to the console.
using System.Text.RegularExpressions;
using VirusTotalNet;
using VirusTotalNet.Results;

using (Chilkat.Email email = new Chilkat.Email())
using (VirusTotal virusTotal = new VirusTotal("******YOUR API KEY******"))
{
    bool success = email.LoadEml("C:/mails/TEMACertificate.eml");
    if (!success)
    {
        Console.WriteLine(email.LastErrorText);
        return;
    }

    string body = email.Body;
    // The dictionary keeps track of the URLs we have already analyzed.
    Dictionary<string, string> urls = new Dictionary<string, string>();
    foreach (Match item in Regex.Matches(body, @"(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"))
    {
        if (urls.TryAdd(item.Value, item.Value))
        {
            Console.WriteLine("Url:" + item.Value);
            UrlReport report = await virusTotal.GetUrlReportAsync(item.Value);
            Console.WriteLine($"Threat Found:{report.Positives}");
            if (report.Positives > 0)
            {
                Console.WriteLine("Found Threat:" + item.Value);
                // Keep only the scanners that flagged this URL.
                var detached = report.Scans.ToList().Where(u => u.Value.Detected == true).ToList();
                detached.ForEach(rep =>
                    Console.WriteLine($"{rep.Key}:{rep.Value}"));
            }
        }
    }
}

“I’m stupidly curious. I will go and touch anything until I find out that it’s very harmful.” - Steven Yeun

Analyze All Attachments In The Email:

  • “email.NumAttachments”: We will get the attachment count of the email with the “Chilkat.Email” class and loop over it.
  • “var attachment = email.GetAttachmentData(i)”: We will get the attachment data as a byte[].
  • “FileReport fReport = await virusTotal.GetFileReportAsync(attachment)”: We will ask the VirusTotal service for the report of every attachment asynchronously.
  • “if (fReport.Positives > 0)”: We will check whether anything harmful was found. Note that a file VirusTotal has never seen also has zero positives, so “Clean” only means that no scanner has reported it.
  • “var detached = fReport.Scans.ToList().Where(u => u.Value.Detected == true).ToList()”: We will get the results of all scanners that detected something harmful, such as a trojan, in the attachment.
  • “detached.ForEach(rep => Console.WriteLine($"{rep.Key}:{rep.Value}"))”: We will loop over all detections and write the report to the console.
for (int i = 0; i < email.NumAttachments; i++)
{
    var attachment = email.GetAttachmentData(i);
    FileReport fReport = await virusTotal.GetFileReportAsync(attachment);
    if (fReport.Positives > 0)
    {
        var detached = fReport.Scans.ToList().Where(u => u.Value.Detected == true).ToList();
        detached.ForEach(rep => Console.WriteLine($"{rep.Key}:{rep.Value}"));
    }
    else
    {
        Console.WriteLine(email.GetAttachmentFilename(i) + ": Clean");
    }
}
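VirusTotal identifies files by their hash, so it can be handy to print the SHA-256 of each attachment next to the result. You can then search for that hash on the VirusTotal site and look at the full report by hand. Here is a minimal sketch that uses only the standard library. The helper name `Sha256Hex` is hypothetical, and the sample bytes stand in for the byte[] that “email.GetAttachmentData(i)” returns.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Returns the lowercase hexadecimal SHA-256 digest of the given bytes.
static string Sha256Hex(byte[] data)
{
    byte[] hash = SHA256.HashData(data); // static helper, available since .NET 5
    return Convert.ToHexString(hash).ToLowerInvariant();
}

// Stand-in for the byte[] returned by email.GetAttachmentData(i).
byte[] attachment = Encoding.ASCII.GetBytes("abc");
Console.WriteLine(Sha256Hex(attachment));
// prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```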

You Can Create Your Own Analysis

These are our custom rules:

  • “blackListUrl”: Blacklisted URLs for the “body”.
  • “blackListWords”: Blacklisted words for the “body”.
  • “blackfrombccc”: Blacklisted email addresses for “from”, “cc” and “bcc”.
string[] blackListUrl = { "https://www.borakasmer.com", "https://www.badass78.com" };
string[] blackListWords = { "creditcard", "sex", "payment", "bağış", "elektronik" };
string[] blackfrombccc = { "polatengin@gmail.com", "secil.kasmer@gmail.com", "noname@hotmail.com" };
  • We will check the custom URL blacklist as seen below: We will get all URLs from the body with Regex and check whether each URL in the body of the mail is on the blacklist or not.
var mailUrls = Regex.Matches(body, @"(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?");

// Keep every blacklisted URL once, even if it appears several times in the body.
var blackList = mailUrls.Where(url => blackListUrl.Contains(url.Value)).DistinctBy(item => item.Value).ToList();
blackList.ForEach(item =>
{
    Console.WriteLine("Found Threat Url:" + item.Value);
});
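The check above only matches when a URL in the body is exactly equal to a blacklist entry, so a link like “https://www.borakasmer.com/some/page” would pass. One possible alternative is to compare only the host part of the URL. Here is a minimal sketch with the standard `Uri` class. The helper name `IsBlacklisted` and the sample URLs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns true when the URL's host is one of the blocked hosts.
static bool IsBlacklisted(string url, ISet<string> hosts)
{
    // Uri.TryCreate avoids exceptions on malformed input.
    return Uri.TryCreate(url, UriKind.Absolute, out var parsed)
        && hosts.Contains(parsed.Host);
}

// Build the host set from the same blacklist entries used above.
string[] blackListUrl = { "https://www.borakasmer.com", "https://www.badass78.com" };
var blockedHosts = new HashSet<string>(
    blackListUrl.Select(u => new Uri(u).Host),
    StringComparer.OrdinalIgnoreCase);

// Hypothetical URLs, as they might come out of the Regex step.
string[] found = { "https://www.borakasmer.com/some/page?x=1", "https://example.com/" };
foreach (string url in found)
{
    Console.WriteLine($"{url} -> {(IsBlacklisted(url, blockedHosts) ? "blacklisted" : "ok")}");
}
```

This sketch blocks a whole host, which is stricter than the exact match. Which one fits better depends on how your blacklist is maintained.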
  • We will check the custom word blacklist as seen below: We will check whether any word of “blackListWords” appears in the body.
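// Ignore case, so "CreditCard" in the body also matches "creditcard".
var wordList = blackListWords.Where(word => body.Contains(word, StringComparison.OrdinalIgnoreCase)).ToList();
wordList.ForEach(item =>
{
    Console.WriteLine("Found Threat Word:" + item);
});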
  • We will check the custom blacklist for From, CC and BCC addresses as seen below: It is assumed that there is only one e-mail address each for bcc and cc. We will check whether from, cc or bcc is in our “blackfrombccc” list.
var frombcccList = blackfrombccc.Where(e => email.FromAddress == e ||
    email.GetCcAddr(0) == e || email.GetBccAddr(0) == e).ToList();

frombcccList.ForEach(item =>
{
    Console.WriteLine("Found Threat From or CC or BCC: " + item);
});
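If an email has more than one cc or bcc address, the same check can be written over a whole list of addresses. Here is a minimal sketch that also ignores case and extra spaces. The helper name `FindBlacklisted` and the sample addresses are hypothetical. With Chilkat, the list would presumably be filled from “FromAddress” plus “GetCcAddr(i)” and “GetBccAddr(i)” in a loop, which is an assumption and not part of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns every address that is on the blacklist, ignoring case and
// surrounding whitespace. Empty entries are skipped.
static List<string> FindBlacklisted(IEnumerable<string> addresses, IEnumerable<string> blacklist)
{
    var blocked = new HashSet<string>(blacklist, StringComparer.OrdinalIgnoreCase);
    return addresses
        .Where(a => !string.IsNullOrWhiteSpace(a))
        .Select(a => a.Trim())
        .Where(a => blocked.Contains(a))
        .Distinct(StringComparer.OrdinalIgnoreCase)
        .ToList();
}

string[] blackfrombccc = { "polatengin@gmail.com", "secil.kasmer@gmail.com", "noname@hotmail.com" };

// Hypothetical from, cc and bcc addresses of one email.
var addresses = new List<string> { "someone@example.com", "Noname@Hotmail.com", " secil.kasmer@gmail.com " };
foreach (string hit in FindBlacklisted(addresses, blackfrombccc))
{
    Console.WriteLine("Found Threat From or CC or BCC: " + hit);
}
```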

Conclusion:

In this article, we tried to detect whether an email contains any harmful trojans or viruses. We parsed an email and took its “body”, “attachment”, “from”, “cc”, and “bcc” parts. To analyze the attachments and URLs, we sent them to the VirusTotal service, got the results, and reported them. Finally, we created our own custom filters for the URLs, the body words, and the from, cc, and bcc addresses of an email. We searched for our blacklist entries in them and reported the matches.

To analyze all the emails in a company, you need a more elaborate software architecture. Performance must be your first priority, so you should run the analysis as a separate service, put a queue in front of it, and receive your emails in batches. Parallel programming and async operations can be very helpful here. But don’t forget that these tools can be very dangerous, so you have to use them wisely.
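To give an idea of the queue approach, here is a minimal sketch with an in-process queue from “System.Threading.Channels” and a fixed number of async workers. It is only one possible shape, not a full architecture. The name `ProcessQueueAsync`, the file names, and the `analyze` stub are hypothetical. In a real service, `analyze` would load the email and run the checks from this article.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Feeds the paths into a bounded queue and lets a fixed number of workers
// analyze them. 'analyze' stands in for the real per-email analysis.
static async Task<ConcurrentBag<string>> ProcessQueueAsync(
    string[] emlPaths, int workerCount, Func<string, Task<string>> analyze)
{
    // Bounded capacity makes the producer wait when the workers fall behind.
    var queue = Channel.CreateBounded<string>(100);
    var results = new ConcurrentBag<string>();

    // Consumers: each worker pulls items until the channel is completed.
    Task[] workers = Enumerable.Range(0, workerCount).Select(_ => Task.Run(async () =>
    {
        await foreach (string item in queue.Reader.ReadAllAsync())
        {
            results.Add(await analyze(item));
        }
    })).ToArray();

    // Producer: enqueue every path, then signal that no more work is coming.
    foreach (string path in emlPaths)
    {
        await queue.Writer.WriteAsync(path);
    }
    queue.Writer.Complete();

    await Task.WhenAll(workers);
    return results;
}

// Hypothetical file names and a stub analyzer, only for illustration.
string[] paths = { "a.eml", "b.eml", "c.eml" };
var reports = await ProcessQueueAsync(paths, 2, async p =>
{
    await Task.Delay(10); // simulates I/O such as a VirusTotal request
    return $"{p}: analyzed";
});
Console.WriteLine($"Processed {reports.Count} emails");
```

The worker count is the knob to watch here: it limits how many analyses run at the same time, which also helps you stay within the request limits of an external service.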

The End

See you in the next article.

“If you have read so far, first of all, thank you for your patience and support. I welcome all of you to my blog for more!”

Source Code: https://gist.github.com/borakasmer/010d415d3bc8f9735388f659a0903a24

I have been coding since 1993. I am a computer and civil engineer, Microsoft MVP, and Software Architect (Cyber Security). https://www.linkedin.com/in/borakasmer/