Parse Azure ADLS Gen2 Audit logs using Spark

Balamurugan Balakreshnan
Jan 1 · 2 min read

One of the challenges with Azure Data Lake Storage Gen2 is capturing audit logs. For most Azure services, we have the option to send logs to Log Analytics. Since this is storage we don’t have that option, but we can write the details to the $logs container in the same storage account.

So the next question is how we parse the logs and do some analytics or troubleshooting. This article shows how to view the logs and run a simple aggregation.

Use Case:

  • For auditing and compliance we need to look at audit logs.
  • ADLS Gen2 doesn’t send its logs to Log Analytics.
  • ADLS Gen2 instead writes them to a container called $logs.
  • We want to view the logs and run analytics on them.

Prerequisites

  • Azure account
  • Create an Azure ADLS Gen2 storage account
  • Create an Azure Databricks workspace
  • Create an Azure Key Vault
  • Create a cluster with the latest runtime version
  • Connect Azure Databricks to Key Vault with a secret scope
  • Now it is time to create the notebook

Code Steps

Let’s get the storage secret from Azure Key Vault

val accbbstorekey = dbutils.secrets.get(scope = "scopename", key = "storagekey")
  • Now configure the storage account key
spark.conf.set(
  "fs.azure.account.key.storageaccountname.blob.core.windows.net",
  accbbstorekey)
  • Now let’s read the log files
val logs = spark.read.format("csv")
  .option("header", "false")
  .option("delimiter", ";")
  .load("wasbs://$logs@storageaccountname.blob.core.windows.net/blob/2020/12/*/*/*.log")
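The storage account name appears in both the configuration key and the load path, so it can help to build both strings from one value. A minimal sketch, assuming the same year/month/day/hour folder layout as the path in the article; `storageAccount`, `accountKeyConf`, and `logsPath` are hypothetical names introduced here for illustration:

```scala
// Hypothetical helper names; only the account placeholder and the path layout come from the article.
val storageAccount = "storageaccountname" // placeholder account name used in the article

// Spark configuration key that holds the access key for this storage account.
val accountKeyConf = s"fs.azure.account.key.$storageAccount.blob.core.windows.net"

// Glob over all blob-service log files for one month.
// "$$" inside an s-interpolated string produces the literal "$" of the $logs container.
def logsPath(year: Int, month: Int): String = {
  val mm = f"$month%02d" // zero-pad the month, e.g. 3 becomes "03"
  s"wasbs://$$logs@$storageAccount.blob.core.windows.net/blob/$year/$mm/*/*/*.log"
}
```

With these in place, the configuration call becomes `spark.conf.set(accountKeyConf, accbbstorekey)` and the reader can end with `.load(logsPath(2020, 12))`.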
  • Display the logs
display(logs)
  • Display the schema
logs.schema
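Because the files have no header row, the schema only shows positional names such as `_c0`, `_c1`, and so on. The sketch below renames the leading columns, assuming the files follow the field order of the Storage Analytics log format version 1.0; the snake_case names and the helper `withFieldNames` are my own labels, not part of the article, so verify the order against the official format description before relying on it:

```scala
import org.apache.spark.sql.DataFrame

// Assumed order of the first 16 fields in the Storage Analytics log format, version 1.0.
// The snake_case spellings are illustrative labels, not official column names.
val logFieldNames: Seq[String] = Seq(
  "version_number", "request_start_time", "operation_type", "request_status",
  "http_status_code", "end_to_end_latency_ms", "server_latency_ms", "authentication_type",
  "requester_account_name", "owner_account_name", "service_type", "request_url",
  "requested_object_key", "request_id_header", "operation_count", "requester_ip_address"
)

// Rename _c0, _c1, ... to the names above; any later columns keep their default names.
def withFieldNames(df: DataFrame): DataFrame =
  logFieldNames.zipWithIndex.foldLeft(df) { case (acc, (name, i)) =>
    acc.withColumnRenamed(s"_c$i", name)
  }
```

Calling `display(withFieldNames(logs))` in the notebook should then show readable headers for the first 16 columns. Under this assumed order, `_c2` is the operation type and `_c12` is the requested object key.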
  • Now let’s do some aggregation
  • Below I am grouping by operation and file and counting the rows, to see which operations are run most often against which files.
display(logs.select("_c2", "_c12").groupBy("_c2","_c12").count())
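The grouped counts come back in no particular order, so the busiest operation and file pairs can be hard to spot. One possible variation sorts by the count and keeps only the top rows; `topOperations` is a hypothetical helper name, and it uses the same positional columns `_c2` and `_c12` as the article:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.desc

// Count rows per (operation, object) pair and list the most frequent pairs first.
// _c2 and _c12 are the same positional columns used in the article.
def topOperations(logs: DataFrame, n: Int): DataFrame =
  logs.groupBy("_c2", "_c12")
    .count()
    .orderBy(desc("count")) // "count" is the column name produced by count()
    .limit(n)
```

In the notebook this could be used as `display(topOperations(logs, 20))`.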

From this point, imagination is your only limitation. We can research what each column means and do various analytics to find usage, logins, and other details.
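As one example of such a usage analysis, the sketch below counts requests per day and authentication type. It assumes that `_c1` holds the request start time as an ISO 8601 string beginning with `yyyy-MM-dd` and that `_c7` holds the authentication type, which matches my reading of the Storage Analytics log format version 1.0 but is not confirmed by the article; `requestsPerDay` is a hypothetical helper name:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, substring}

// Assumptions: _c1 is a timestamp string starting with yyyy-MM-dd,
// and _c7 is the authentication type of the request.
def requestsPerDay(logs: DataFrame): DataFrame =
  logs.withColumn("day", substring(col("_c1"), 1, 10)) // first 10 characters are the date part
    .groupBy("day", "_c7")
    .count()
    .orderBy("day", "_c7")
```

Running `display(requestsPerDay(logs))` would give one row per day and authentication type, which is a starting point for spotting unusual access patterns.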

Original article can be found here
