Configure Azure Databricks to send events to Application Insights — Simplified

Balamurugan Balakreshnan
Published in Analytics Vidhya · Mar 6, 2021 · 4 min read

Configure Log Analytics and Application Insights in Azure Databricks

Use case

  • Azure Databricks has native integration with Azure Monitor.
  • But the challenge is capturing runtime errors.
  • Application code should be able to send custom logs or events.
  • Log trace messages from runtime exceptions.
  • This helps troubleshoot errors that occur at runtime.

Prerequisites

  • Azure subscription
  • Azure Databricks workspace
  • Log Analytics workspace
  • Application Insights resource

Code

  • First, create another cluster with Databricks runtime version 7.5.
  • This cluster is used to write the init shell script from a notebook, instead of uploading it via the command line.
  • No libraries are needed.
  • Create a new Scala notebook to build the App Insights init script.
  • Check the STAGE_DIR directory.
  • To get the workspace ID and key:
  • Go to the Log Analytics workspace.
  • Select Advanced Settings -> Connected Sources -> Agents Management.
  • Copy the ID and key.
  • To get the Application Insights instrumentation key, go to the Application Insights resource.
  • Scroll down to Properties.
  • Copy the Instrumentation Key.
  • For the script, make sure the directories and format match exactly.
  • Even a small syntax issue might cause the cluster not to start.
  • Replace the KEY and ID placeholders in the script below. (One way to stage the JARs the script copies is shown after the script.)
%python
dbutils.fs.put("dbfs:/appinsights/appinsights_logging_init.sh","""
#!/bin/bash

STAGE_DIR="/dbfs/appinsights/"
APPINSIGHTS_INSTRUMENTATIONKEY="xxxxxx-xxxxxxx-xxxxxxx"
LOG_ANALYTICS_WORKSPACE_ID="xxxxxx-xxxxxxx-xxxxxxx"
LOG_ANALYTICS_PRIMARY_KEY="xxxxxx-xxxxxxx-xxxxxxx"

echo "BEGIN: Upload App Insights JARs"
cp -f $STAGE_DIR/applicationinsights-core-*.jar /mnt/driver-daemon/jars || { echo "Error copying AppInsights core library file"; exit 1;}
cp -f $STAGE_DIR/applicationinsights-logging-log4j1_2-*.jar /mnt/driver-daemon/jars || { echo "Error copying AppInsights Log4J library file"; exit 1;}
echo "END: Upload App Insights JARs"

echo "BEGIN: Upload Spark Listener JARs"
cp -f $STAGE_DIR/adbxmonitor_*.jar /mnt/driver-daemon/jars || { echo "Error copying Spark Listener library file"; exit 1;}
echo "END: Upload Spark Listener JARs"

echo "BEGIN: Setting Environment variables"
echo APPINSIGHTS_INSTRUMENTATIONKEY=$APPINSIGHTS_INSTRUMENTATIONKEY | sudo tee -a /etc/environment
echo "END: Setting Environment variables"

echo "BEGIN: Updating Executor log4j properties file"
sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=INFO, console, aiAppender/g' /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
echo "log4j.appender.aiAppender=com.microsoft.applicationinsights.log4j.v1_2.ApplicationInsightsAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
# echo "log4j.appender.aiAppender.DatePattern='.'yyyy-MM-dd" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
echo "log4j.appender.aiAppender.layout=org.apache.log4j.PatternLayout" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
echo "log4j.appender.aiAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
echo "END: Updating Executor log4j properties file"

echo "BEGIN: Updating Driver log4j properties file"
sed -i 's/log4j.rootCategory=INFO, publicFile/log4j.rootCategory=INFO, publicFile, aiAppender/g' /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.aiAppender=com.microsoft.applicationinsights.log4j.v1_2.ApplicationInsightsAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
# echo "log4j.appender.aiAppender.DatePattern='.'yyyy-MM-dd" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.aiAppender.layout=org.apache.log4j.PatternLayout" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.aiAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "END: Updating Driver log4j properties file"

echo "BEGIN: Updating Azure Log Analytics properties file"
sed -i "s/^exit 101$/exit 0/" /usr/sbin/policy-rc.d
wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh && sh onboard_agent.sh -w $LOG_ANALYTICS_WORKSPACE_ID -s $LOG_ANALYTICS_PRIMARY_KEY
sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/PerformRequiredConfigurationChecks.py'
/opt/microsoft/omsagent/bin/service_control restart $LOG_ANALYTICS_WORKSPACE_ID
echo "END: Updating Azure Log Analytics properties file"

echo "BEGIN: Modify Spark config settings"
cat << 'EOF' > /databricks/driver/conf/adbxmonitor-spark-driver-defaults.conf
[driver] {
"spark.extraListeners" = "com.microsoft.adbxmonitor.adbxlistener.AdbxListener"
}
EOF
echo "END: Modify Spark config settings"
""", True)
  • Verify the script was written correctly:
%sh cat /dbfs/appinsights/appinsights_logging_init.sh
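The init script copies JARs from the staging directory, so they must be uploaded before the cluster starts. As a minimal sketch, here is one way to pull the two Microsoft JARs from Maven Central into the staging directory from the same Scala notebook (the 2.6.3 version is an assumption; use the version you target, and note that the adbxmonitor_*.jar Spark listener comes from your own build and must be copied to the same directory as well):

import sys.process._

// Make sure the staging directory exists
dbutils.fs.mkdirs("dbfs:/appinsights/")

// Download the App Insights core and log4j 1.2 appender JARs into the staging dir.
// Version 2.6.3 is an assumption; adjust to the version you target.
Seq("wget", "-q", "-P", "/dbfs/appinsights/",
  "https://repo1.maven.org/maven2/com/microsoft/azure/applicationinsights-core/2.6.3/applicationinsights-core-2.6.3.jar").!
Seq("wget", "-q", "-P", "/dbfs/appinsights/",
  "https://repo1.maven.org/maven2/com/microsoft/azure/applicationinsights-logging-log4j1_2/2.6.3/applicationinsights-logging-log4j1_2-2.6.3.jar").!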

Create an Azure Databricks cluster

  • Create a new cluster.
  • Select Databricks runtime version 7.5.
  • Leave all the other settings as default.
  • Go to Advanced Options.
  • Select Init Scripts.
  • Add this as the location:
dbfs:/appinsights/appinsights_logging_init.sh
  • Start the cluster and wait for it to come up.
  • Once it starts successfully, the init script ran without errors.

Notebook code to test

  • Once the cluster has started,
  • Create a Scala notebook.
  • Check that the instrumentation key is available as an environment variable:
%sh echo $APPINSIGHTS_INSTRUMENTATIONKEY
  • Then log a few test messages through log4j:
import org.apache.log4j.LogManager

val log = LogManager.getRootLogger
// val log = org.apache.log4j.LogManager.getLogger("aiAppender")

log.warn("WARN: Hi from App Insights on Databricks 07")
log.info("INFO: Hi from App Insights on Databricks 07")
  • Now go to Application Insights.
  • Go to Logs.
  • Run the query below:
traces
| project
    message,
    severityLevel,
    LoggerName = customDimensions["LoggerName"],
    LoggingLevel = customDimensions["LoggingLevel"],
    SourceType = customDimensions["SourceType"],
    ThreadName = customDimensions["ThreadName"],
    SparkTimestamp = customDimensions["TimeStamp"],
    timestamp
| order by timestamp desc
  • To send custom events directly via the Application Insights SDK, run this in a Scala cell:
import com.microsoft.applicationinsights.TelemetryClient
import com.microsoft.applicationinsights.TelemetryConfiguration

val configuration = TelemetryConfiguration.createDefault()
configuration.setInstrumentationKey(System.getenv("APPINSIGHTS_INSTRUMENTATIONKEY"))

val telemetryClient = new TelemetryClient(configuration)
telemetryClient.trackEvent("Test App Insights Scala code via App Insights API 08")
telemetryClient.flush()
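The 2.x Java SDK can also attach context to an event via property and metric maps, and send exceptions with trackException. A hedged sketch building on the telemetryClient above (names and values are illustrative):

// Illustrative property and metric maps (java.util maps, as the SDK expects)
val props = new java.util.HashMap[String, String]()
props.put("notebook", "appinsights-test")
val metrics = new java.util.HashMap[String, java.lang.Double]()
metrics.put("rowsProcessed", 42.0)

telemetryClient.trackEvent("JobCompleted", props, metrics)

// Exceptions sent this way land in the exceptions table, not customEvents
try {
  throw new RuntimeException("simulated failure")
} catch {
  case e: Exception => telemetryClient.trackException(e)
}
telemetryClient.flush()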
  • Now go to Application Insights.
  • Go to Logs.
  • Run the query below to see the custom events:
customEvents

Originally published at https://github.com.
