Configure Azure Databricks to send events to Application Insights — Simplified
4 min read · Mar 6, 2021
Configure Log Analytics and Application Insights in Azure Databricks
Use case
- Azure Databricks has native integration with Azure Monitor
- But the challenge is capturing runtime errors
- Application code should be able to send custom logs or events
- Log trace messages from runtime exceptions
- This helps troubleshoot usage errors at runtime
Prerequisites
- Azure account
- Azure Databricks workspace
- Azure Application Insights resource
- Create a Log Analytics workspace
- Get the workspace ID and key
- Get the Application Insights instrumentation key
- Get the contents from this GitHub repo: https://github.com/AnalyticJeremy/Azure-Databricks-Monitoring
- I also downloaded the JAR files and scripts
- Files are available in the repo
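The staging step above can be sketched as a local dry run. DBFS is simulated here with a temp directory, and the JAR file names and versions are stand-ins; on a real workspace the target would be dbfs:/appinsights/ (for example via the Databricks CLI's `databricks fs cp`).

```shell
# Local dry run of staging the monitoring JARs.
# DBFS is simulated with a temp directory; on a real workspace the
# target would be dbfs:/appinsights/ (e.g. via `databricks fs cp`).
STAGE_DIR=/tmp/dbfs/appinsights
mkdir -p "$STAGE_DIR"

# Stand-ins for the JARs downloaded from the monitoring repo
# (file names and version numbers here are hypothetical)
touch applicationinsights-core-2.6.2.jar \
      applicationinsights-logging-log4j1_2-2.6.2.jar \
      adbxmonitor_2.12-0.1.jar

cp -f applicationinsights-*.jar adbxmonitor_*.jar "$STAGE_DIR"
ls "$STAGE_DIR"
```

The init script later copies these files out of the staging directory, so the names must match the wildcards it uses (`applicationinsights-core-*.jar`, `applicationinsights-logging-log4j1_2-*.jar`, `adbxmonitor_*.jar`).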
Code
- First, create another cluster with Databricks runtime version 7.5
- This cluster is used to create the init shell script, instead of uploading it from the command line.
- No libraries are needed
- Create a new Scala notebook to create the App Insights init script
- Check the STAGE_DIR directory
- To get the workspace ID and key
- Go to the Log Analytics workspace
- Select Advanced Settings -> Connected Sources -> Agents Management
- Copy the ID and key
- To get the Application Insights instrumentation key, go to the Application Insights resource
- Scroll down to Properties
- Copy the Instrumentation Key
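If you prefer scripting over clicking through the portal, the key can also be pulled out of a CLI response. The JSON below is a hypothetical stand-in shaped like an `az resource show` response for an App Insights resource; the field casing is an assumption, so check it against your CLI's actual output.

```shell
# Hypothetical JSON shaped like a CLI response for an App Insights
# resource (the field casing is an assumption; verify against your CLI)
cat > /tmp/appinsights.json <<'EOF'
{"properties": {"InstrumentationKey": "00000000-0000-0000-0000-000000000000"}}
EOF

# Extract the key without jq, using python3
python3 -c 'import json; print(json.load(open("/tmp/appinsights.json"))["properties"]["InstrumentationKey"])'
```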
- For the script, make sure the directories and format match
- A small syntax issue might cause the cluster not to start
- Replace the KEY and ID placeholders in the script below
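Since a leftover placeholder would make the cluster fail to start, it is worth checking that no "xxxxxx" values survive the edit before attaching the script. A minimal sketch, where the /tmp path and the sample line stand in for the real generated init script:

```shell
# Stand-in for the generated init script with one placeholder left in
cat > /tmp/appinsights_logging_init.sh <<'EOF'
APPINSIGHTS_INSTRUMENTATIONKEY="xxxxxx-xxxxxxx-xxxxxxx"
EOF

# Fail fast if any "xxxxxx" placeholder is still present
if grep -q 'xxxxxx' /tmp/appinsights_logging_init.sh; then
  echo "placeholders still present; replace the KEY and ID values"
fi
```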
%python
dbutils.fs.put("dbfs:/appinsights/appinsights_logging_init.sh","""
#!/bin/bash
STAGE_DIR="/dbfs/appinsights/"
APPINSIGHTS_INSTRUMENTATIONKEY="xxxxxx-xxxxxxx-xxxxxxx"
LOG_ANALYTICS_WORKSPACE_ID="xxxxxx-xxxxxxx-xxxxxxx"
LOG_ANALYTICS_PRIMARY_KEY="xxxxxx-xxxxxxx-xxxxxxx"
echo "BEGIN: Upload App Insights JARs"
cp -f $STAGE_DIR/applicationinsights-core-*.jar /mnt/driver-daemon/jars || { echo "Error copying AppInsights core library file"; exit 1;}
cp -f $STAGE_DIR/applicationinsights-logging-log4j1_2-*.jar /mnt/driver-daemon/jars || { echo "Error copying AppInsights Log4J library file"; exit 1;}
echo "END: Upload App Insights JARs"
echo "BEGIN: Upload Spark Listener JARs"
cp -f $STAGE_DIR/adbxmonitor_*.jar /mnt/driver-daemon/jars || { echo "Error copying Spark Listener library file"; exit 1;}
echo "END: Upload Spark Listener JARs"
echo "BEGIN: Setting Environment variables"
sudo echo APPINSIGHTS_INSTRUMENTATIONKEY=$APPINSIGHTS_INSTRUMENTATIONKEY >> /etc/environment
echo "END: Setting Environment variables"
echo "BEGIN: Updating Executor log4j properties file"
sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=INFO, console, aiAppender/g' /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
echo "log4j.appender.aiAppender=com.microsoft.applicationinsights.log4j.v1_2.ApplicationInsightsAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
# echo "log4j.appender.aiAppender.DatePattern='.'yyyy-MM-dd" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
echo "log4j.appender.aiAppender.layout=org.apache.log4j.PatternLayout" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
echo "log4j.appender.aiAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n" >> /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
echo "END: Updating Executor log4j properties file"
echo "BEGIN: Updating Driver log4j properties file"
sed -i 's/log4j.rootCategory=INFO, publicFile/log4j.rootCategory=INFO, publicFile, aiAppender/g' /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.aiAppender=com.microsoft.applicationinsights.log4j.v1_2.ApplicationInsightsAppender" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
# echo "log4j.appender.aiAppender.DatePattern='.'yyyy-MM-dd" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.aiAppender.layout=org.apache.log4j.PatternLayout" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "log4j.appender.aiAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n" >> /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
echo "END: Updating Driver log4j properties file"
echo "BEGIN: Updating Azure Log Analytics properties file"
sed -i "s/^exit 101$/exit 0/" /usr/sbin/policy-rc.d
wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh && sh onboard_agent.sh -w $LOG_ANALYTICS_WORKSPACE_ID -s $LOG_ANALYTICS_PRIMARY_KEY
sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/PerformRequiredConfigurationChecks.py'
/opt/microsoft/omsagent/bin/service_control restart $LOG_ANALYTICS_WORKSPACE_ID
echo "END: Updating Azure Log Analytics properties file"
echo "BEGIN: Modify Spark config settings"
cat << 'EOF' > /databricks/driver/conf/adbxmonitor-spark-driver-defaults.conf
[driver] {
"spark.extraListeners" = "com.microsoft.adbxmonitor.adbxlistener.AdbxListener"
}
EOF
echo "END: Modify Spark config settings"
""", True)%sh cat /dbfs/appinsights/appinsights_logging_init.sh
Create the Azure Databricks cluster
- Create a new cluster
- Select Databricks runtime 7.5
- Leave all the other settings as default
- Go to Advanced Options
- Select Init Scripts
- Add this as the location
dbfs:/appinsights/appinsights_logging_init.sh
- Start the cluster and wait for it to come up
- Once it has started, we are good.
Notebook Code to test
- Once the cluster is started
- Create a Scala notebook
- Check whether the instrumentation key is set
%sh
echo $APPINSIGHTS_INSTRUMENTATIONKEY

import org.apache.log4j.LogManager
val log = LogManager.getRootLogger
// val log = org.apache.log4j.LogManager.getLogger("aiAppender")
log.warn("WARN: Hi from App Insights on Databricks 07")
log.info("INFO: Hi from App Insights on Databricks 07")
- Now go to Application Insights
- Go to Logs
- Run the query below
traces
| project
message,
severityLevel,
LoggerName=customDimensions["LoggerName"],
LoggingLevel=customDimensions["LoggingLevel"],
SourceType=customDimensions["SourceType"],
ThreadName=customDimensions["LoggingLevel"],
SparkTimestamp=customDimensions["TimeStamp"],
timestamp
| order by timestamp desc
import com.microsoft.applicationinsights.TelemetryClient
import com.microsoft.applicationinsights.TelemetryConfiguration
val configuration = com.microsoft.applicationinsights.TelemetryConfiguration.createDefault()
configuration.setInstrumentationKey(System.getenv("APPINSIGHTS_INSTRUMENTATIONKEY"))
val telemetryClient = new TelemetryClient(configuration)
telemetryClient.trackEvent("Test App Insights Scala code via App Insights API 08")
telemetryClient.flush()
- Now go back to Application Insights
- Go to Logs
- Run the query below
customEvents
Originally published at https://github.com.