Splunk Integration and Monitoring with Zaloni’s DataOps Platform, Arena

Ajinkya Rasam
Zaloni Engineering
Apr 14, 2021

This technical blog discusses Splunk integration with Zaloni’s Arena platform. Note that these integration instructions will work with any application that runs on a Java Virtual Machine (JVM).


Overview

With the ever-increasing number of virtual machines and applications, it becomes difficult to monitor their performance and availability and to troubleshoot errors in one place. It’s important to herd the cats, both to keep day-to-day business activities running smoothly and to troubleshoot quickly enough to meet RTO (Recovery Time Objective) goals. This can be achieved using an APM (Application Performance Management) tool or a log aggregation tool such as Splunk. We will use a microservice-based application called “Arena” for our use case.

Arena was formerly known as ZDP. Therefore, these terms are interchangeable to an extent.

Splunk can be used in different ways to monitor Arena’s health, raise alerts when critical incidents occur, and monitor the aggregated logs produced by Arena’s microservices. This document covers the Arena health metrics, logs and ports that can be monitored using Splunk, and how to set that monitoring up. We will set up two applications:

  1. Splunk Server: used for monitoring, creating dashboards, querying logs and performing analytics.
  2. Universal forwarder: used to forward logs to the Splunk server for aggregation.

For detailed instructions on setup, please see: Setting up Splunk Enterprise server

Monitoring Metrics

This section lists the Arena metrics we recommend monitoring, whether you use Splunk or any other application monitoring software.

JVM Monitoring

Arena consists of 13 microservices. Most of them (all except ZDP-Kibana) spin up a Java (JVM) process on startup, to which a JVM monitoring agent can be attached. Steps to configure the JVM agent are documented in the JVM monitoring setup section.

Log Aggregation

The Arena application has 13 microservices that write logs to their respective directories. Below is a list of service names along with their default log directories. The Splunk universal forwarder can be configured to forward these logs to the Splunk server.

Port Monitoring

Additionally, one can monitor the ports these services listen on to check the liveness of each service. Below is a list of the default ports Arena services listen on.
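As a rough illustration of a port liveness check, the sketch below tries to open a TCP connection to each port and reports the result. It assumes bash (for its /dev/tcp feature) and the coreutils timeout command. HOST and PORTS are illustrative placeholders; 8090 is used only because it is the zdp-gateway port mentioned in the Troubleshooting section.

```shell
#!/usr/bin/env bash
# Liveness sketch: report whether each listed TCP port accepts connections.
# HOST and PORTS are placeholders, not a list of Arena's documented defaults.
HOST="${HOST:-127.0.0.1}"
PORTS="${PORTS:-8090}"

# Returns 0 if a TCP connection to host:port succeeds within 2 seconds.
port_is_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

for port in $PORTS; do
  if port_is_open "$HOST" "$port"; then
    echo "port $port: listening"
  else
    echo "port $port: NOT listening"
  fi
done
```

Run it with, for example, PORTS="8090 <other ports>" to check several services in one pass. The same check could feed a Splunk scripted input or an alert, but that wiring is outside this sketch.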

Setting up Splunk Enterprise server

  1. Download Splunk for your OS:
    https://www.splunk.com/en_us/download.html

  2. Untar the Splunk package:

sudo tar -xvzf splunk_package.tgz -C /opt/
cd /opt/splunk/bin
export SPLUNK_HOME=/opt/splunk/

  3. Start the Splunk server:

./splunk start --accept-license

  4. Create a username and password. These will be used to log in to the Splunk portal.

  5. The Splunk portal URL will be displayed at the end of the server start.

Setting up Universal Forwarder

The universal forwarder should be installed on all nodes that have Arena microservices running. It will monitor the Arena log directories and continuously send their output to the Splunk server for aggregation. Additionally, to monitor JVM metrics, we need to install a Splunk JMX add-on. You can follow the instructions in this video or follow the steps below:

Open a port on the Splunk server to listen for the forwarder:

  1. Go to the Splunk server URL
  2. Go to Settings > Forwarding and receiving
  3. Under Receive data, select Add new

  4. Specify which TCP port the server should listen on. Ensure the port is not already in use with the command:

sudo netstat -tunlp | grep <port num>

  5. Go to Settings > Server controls and restart the server

  6. Create a separate index in Splunk for Arena. You can do that by going to Settings > Indexes > Add new index on the Splunk server. All logs related to Arena will be forwarded to this index.
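If you prefer the command line to the UI, the receiving port and the index can also be created with the Splunk CLI. The sketch below is a dry run by default: it only prints the commands. SPLUNK_BIN, RECEIVE_PORT, INDEX_NAME and AUTH are assumptions for illustration (9997 is a commonly used receiving port and arena is a made-up index name), so substitute your own values.

```shell
#!/usr/bin/env bash
# Sketch: CLI equivalents of the receiving-port and index steps above.
# All values below are illustrative placeholders, not values from Arena.
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"
RECEIVE_PORT="${RECEIVE_PORT:-9997}"
INDEX_NAME="${INDEX_NAME:-arena}"
AUTH="${AUTH:-admin:changeme}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_receiver() {
  run "$SPLUNK_BIN" enable listen "$RECEIVE_PORT" -auth "$AUTH"
  run "$SPLUNK_BIN" add index "$INDEX_NAME" -auth "$AUTH"
}

configure_receiver
```

Set DRY_RUN=0 and run the script as a user allowed to manage Splunk to apply the changes for real.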

Download and install Universal forwarder

This needs to be performed on all nodes where Arena services are running.

  1. Download the Splunk forwarder (preferably the tar file): https://www.splunk.com/en_us/download/universal-forwarder.html
  2. Place the tar file on the Arena node
  3. Install and start the forwarder using the commands below:

sudo tar xzvf splunkforwarder.tgz -C /opt/
cd /opt/splunkforwarder/bin/
sudo ./splunk start --accept-license
sudo ./splunk enable boot-start

4. Change forwarder credentials:

sudo ./splunk edit user admin -password <pass> -role admin -auth admin:changeme

Configure forwarder to send data to Splunk server

  1. Point the forwarder at the Splunk server:

sudo ./splunk add forward-server <splunk-server-host>:<configured port> -auth <forwarder_username>:<forwarder_password>

  2. Add the log directories that need to be sent to the Splunk server:

sudo ./splunk add monitor /var/log/<service_name>/ -index <arena_index_name> -sourcetype <service_name> -auth user:pass

  3. From the Splunk server, you can now query using the hostname of the forwarder to see log data.
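Since the add monitor command has to be repeated for every service on a node, a small loop can help. The sketch below is a dry run by default and only prints the commands it would execute. SERVICES, INDEX_NAME and AUTH are placeholders: zdp-gateway is the only service name taken from this article, and arena is a made-up index name.

```shell
#!/usr/bin/env bash
# Sketch: add one monitor per Arena service log directory.
# SERVICES, INDEX_NAME and AUTH are placeholders to be replaced.
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunkforwarder/bin/splunk}"
SERVICES="${SERVICES:-zdp-gateway}"
INDEX_NAME="${INDEX_NAME:-arena}"
AUTH="${AUTH:-user:pass}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

add_monitors() {
  for svc in $SERVICES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "$SPLUNK_BIN add monitor /var/log/$svc/ -index $INDEX_NAME -sourcetype $svc -auth $AUTH"
    else
      "$SPLUNK_BIN" add monitor "/var/log/$svc/" -index "$INDEX_NAME" \
        -sourcetype "$svc" -auth "$AUTH"
    fi
  done
}

add_monitors
```

Using the service name as the sourcetype mirrors the single command above and keeps per-service filtering simple later on. Run with DRY_RUN=0 under sudo to apply the monitors.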

To list all configured forward servers:

sudo ./splunk list forward-server -auth user:pass

Using Addon: Monitoring of Java Virtual Machines with JMX

To monitor JVMs, we need to install a plugin called “JMX.” Click on “Find More Apps” and search “Monitoring of Java Virtual Machines with JMX.”

How to Configure an agent with ZDP-services

The example below shows how to configure zdp-gateway with the JMX monitoring agent. The same approach can be extended to the rest of the Arena services, except zdp-Kibana. The only difference would be in the way the properties are set.

  1. Stop the zdp-gateway service:

sudo systemctl stop zdp-gateway

  2. Open /etc/sysconfig/zdp-gateway

  3. Add the properties below to the conf file:

JAVA_OPTS="$JAVA_OPTS \
-Dcom.sun.management.jmxremote.port=<portNum> \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false"

portNum is the port number through which you want to enable JMX/RMI connections. Be sure to specify an unused port number.

For simplicity, authentication and SSL are disabled here. Both are enabled by default, and instructions to set them up are in this Oracle doc.

  4. On the Splunk server node, edit:

$SPLUNK_HOME/etc/apps/SPLUNK4JMX/bin/config/config.xml

  5. Set the property:

<jmxserver host="<JVM_host>" jvmDescription="zdp-gateway_node_2" jmxport="<port configured>">

For the rest of the Arena services, use the same properties. If JAVA_OPTS already has a value, append these properties to it. The config files are located at /etc/sysconfig/<service_name>
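To repeat the JAVA_OPTS change across several services, something like the helper below could be used. It is only a sketch: append_jmx_opts is a hypothetical function name, the port you pass is your own choice, and the script assumes each service's file under /etc/sysconfig follows the same JAVA_OPTS convention as zdp-gateway. It skips files that already set a JMX remote port, so it is safe to run twice.

```shell
#!/usr/bin/env bash
# Sketch: append the JMX options to a service's sysconfig file, once.
# Usage: append_jmx_opts <config_file> <jmx_port>
append_jmx_opts() {
  conf_file="$1"
  jmx_port="$2"

  # Do nothing if the file already sets a JMX remote port.
  if grep -q 'jmxremote.port' "$conf_file" 2>/dev/null; then
    echo "JMX options already present in $conf_file"
    return 0
  fi

  # \$ keeps $JAVA_OPTS literal; \\ writes a trailing backslash to the file.
  cat >> "$conf_file" <<EOF
JAVA_OPTS="\$JAVA_OPTS \\
-Dcom.sun.management.jmxremote.port=$jmx_port \\
-Dcom.sun.management.jmxremote.authenticate=false \\
-Dcom.sun.management.jmxremote.ssl=false"
EOF
}

# Example (run as root, with the service stopped; 9010 is an arbitrary port):
#   append_jmx_opts /etc/sysconfig/zdp-gateway 9010
```

Give each service on the same node its own unused port, and add a matching jmxserver entry to config.xml on the Splunk server for each one.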

Activating JMX Add on

  1. Go to Add data and choose the JVM add-on
  2. Get 7-day activation key from https://www.baboonbones.com/#activation
  3. Enable the data source status

Enjoy!

Arena log analysis configuration

Splunk provides different dashboard design options to keep an eye on the performance of Arena VMs. In this section we will walk you through some examples of setting up reports, performing analysis, and setting up a notification alert system.

This section assumes that you have set up the Splunk forwarder on all nodes where Arena services are running, using the steps in the section above: Setting up Universal Forwarder

  1. Open the Splunk Enterprise server URL and select Search & Reporting.

  2. In the query window, use index="<arena_index_name>". We have used replica2 as the index name for Arena logs.

  3. You can also add keywords such as `Error` OR `Exception` to search for related entries in the logs.

  4. To narrow the logs down to a specific zdp-service, use sourcetype=<service_name> in the query. This property is set when you run the add monitor command on the Splunk forwarder node.

  5. Click on Patterns to automatically detect frequently occurring error patterns. Further, you can create a trigger and specify an action when similar errors occur in the future.

There are multiple options for the action taken when an error occurs, such as running a script, sending an email, or even making a webhook call. So, play around!
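The query pieces from the steps above can be combined into a single search string. The sketch below builds that string with a hypothetical helper, build_query; replica2 and zdp-gateway are the example index and service names used in this article.

```shell
#!/usr/bin/env bash
# Sketch: build a search for error entries in one index, optionally
# narrowed to a single service via its sourcetype.
# Usage: build_query <index_name> [service_name]
build_query() {
  index_name="$1"
  service="$2"
  query="index=\"$index_name\""
  if [ -n "$service" ]; then
    query="$query sourcetype=\"$service\""
  fi
  echo "$query (Error OR Exception)"
}

build_query replica2 zdp-gateway
# prints: index="replica2" sourcetype="zdp-gateway" (Error OR Exception)
```

The resulting string can be pasted into the query window, or passed to the Splunk CLI on the server, for example: /opt/splunk/bin/splunk search "$(build_query replica2 zdp-gateway)" -auth user:pass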

Troubleshooting

Error: “Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.11.13.116”

Resolution:

ZDP-gateway was not coming up, so port 8090 was not available.

In /etc/zdp-gateway/zdp-gateway-5.0.2-FINAL.conf, we changed JAVA_HOME from #JAVA_HOME="/usr/java/jdk1.8.0_202-amd64/jre"

to JAVA_HOME="/usr/java/jdk1.8.0_202-amd64"

After this change, ZDP-gateway was up and running.
