Splunk: Creating Analytical Dashboards with Splunk Enterprise

A beginner’s guide to performing quick EDA and building elegant dashboards in Splunk using data from CSVs

Param Raval
CodeX
10 min read · May 4, 2022


Photo by Jess Bailey on Unsplash

This is the heyday of analytical dashboards in the business world, and every formal or informal presentation benefits greatly from pleasing visuals. However, handling huge amounts of data can be time-consuming and cumbersome, even just exploring it statistically, let alone creating visualisations from it. While more popular alternatives exist, Splunk offers a decent, simple tool for people who would rather be building beautiful dashboards and gaining impressive insights than spending time learning a new framework or programming language.

Splunk is not a tool many people are familiar with from regular use. After trying it, however, it is easy to see how its data connectivity, easy-going UI, and fundamental analytical utilities can be useful for some quick data analysis.

Why Splunk?

Python and Tableau are certainly more popular when it comes to creating formal reports or having more flexibility in data analysis and visualisation. Tableau provides rapid data handling and interactive UIs, while libraries like Seaborn or Plotly in Python give more flexibility in graphical utilities; the latter can also be integrated easily with a Python-based web application.

However, for many people, both of those tools require a lot of trial and error unless you have already used them across several projects. Moreover, Splunk is ideal for businesses that want more options for loading their datasets directly from TCP/UDP ports, cloud platforms, local databases, or APIs. In addition, it supports a large number of file formats for direct uploads. These functionalities help streamline the data management process without consuming expensive memory resources.

Splunk Enterprise

One of the simplest ways to start with Splunk is Splunk Enterprise, one of the products Splunk offers to load, search, analyze, and visualize your data. A major advantage of using Enterprise is that it supports loading data from a large variety of sources and is simple enough for quick amateur analysis and dashboard generation.

Install Splunk Enterprise from this link. On this page you’ll have to:

  1. Create a Splunk Account
  2. Download the package based on your OS

For this tutorial, we use Splunk 8.2.6 on Windows 10.

It is worth noting that Splunk Enterprise is a paid service. While paying for such services isn't recommended unless absolutely necessary for your organization, the trial period is worth using to get through short-term projects. The free trial is selected by default during installation, but you can switch to the free tier once Splunk has been installed.

Next, the default installation might ask you to create login credentials for the web server. For Splunk Enterprise (not Splunk Free), you will use these to log in to the local Splunk server each time you open Splunk Enterprise. As mentioned on the Splunk Free webpage, Splunk Free does not create separate users, so it won't need personalised credentials to access the local server.

Once the installation is complete, you can launch the Splunk web server directly. Alternatively, on Windows, you can open "Services" from the Start Menu, look for "Splunkd Service", and start it from there. The web interface runs at localhost:8000 by default.
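If the Services panel is inconvenient, Splunk also ships a command-line control program. As a quick sketch, assuming the default Windows install path (adjust the path if you installed elsewhere):

"C:\Program Files\Splunk\bin\splunk.exe" start
"C:\Program Files\Splunk\bin\splunk.exe" stop

The same splunk executable also accepts restart and status subcommands.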

Using Splunk Enterprise

The home page will look something like this:

Image by author

The first step in any data analysis project is always loading and exploring the data.

For the sake of simplicity, we’ll only explore the option to upload data files from your computer. Alternatively, to just get familiar with Splunk, you can also use one of the many default datasets that Splunk provides.

Using a default dataset

Head over to the “Search & Reporting” window from the landing page.

Image by author

Switch to the Datasets tab to see the available datasets. This version of Splunk has 33 datasets, from which we'll choose the Chicago Crime dataset (chicago-crime.csv). Selecting it will load a preview of the dataset, as shown below.

Image by author

From the top right, go to Explore->Investigate in Search to begin the analysis process.

Loading a dataset from your computer

You can access this option from the “Add Data” section of the landing page.

To keep things consistent for this blog, I downloaded the larger Chicago Crimes dataset from the official website: Crimes - 2001 to Present. You can also head over to data.cityofchicago.org and find the crime datasets under "Public Safety". The CSV file of this dataset will be around 39MB in size. Depending on the data you choose, I'd recommend removing spaces from the column headers to make querying easier (if you'd rather not edit the CSV, see the rename sketch after the steps below).

  1. Upload the file and click "Next".
  2. In the "Set Source Type" section, set the source type to "csv" if it isn't already, and under "Timestamp" set Extraction to "Current". Be sure to "Save As" with any arbitrary name.
  3. In the next section, create a new index with a simple name like "chicagocrime".
  4. Review and submit.
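
If you skipped the header cleanup, note that field names containing spaces must be quoted in every query. As a hedged alternative (the original column names below are assumptions based on the Chicago portal's headers), you can normalise them on the fly with the rename command:

index="chicagocrime" | rename "Primary Type" as primary_type, "Location Description" as location_description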

Querying the dataset

From this point on, we will work with the default chicago-crime dataset provided in Splunk. After selecting "Investigate in Search", we see the Search & Reporting screen shown below.

Image by author

On this screen, there is a search bar for writing queries, and the panes below display the results. For search queries like aggregation, selection, and filtering, the results are shown in the Statistics pane. For simplicity, we won't explore the Events or Patterns panes.

For a quick guide to the querying keywords, refer to the documentation page here. We'll explore some elementary queries now.

Displaying the entire data table

For the default dataset (note the pipe "|" at the beginning):

| from inputlookup:"chicago-crime.csv"

For the uploaded dataset, just mention the name of the index you set during the data loading stage:

index="chicagocrime"

Both of these queries will display the entire dataset as a table.

Selecting a subset

| from inputlookup:"chicago-crime.csv" | table description, primary_type | head 500

The table keyword takes a list of column names and selects those columns from the entire set. Similarly, head n selects the first n rows. Every new command or utility is appended after a pipe "|".
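
As a small sketch of this chaining, the following combines three commands, using only columns we have already seen:

| from inputlookup:"chicago-crime.csv" | table primary_type, description, location_description | sort primary_type | head 200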

Get Stats

Let’s see a few basic queries to explore the dataset and also get an idea of how to write them.

Before writing a query, we must be clear on the result we want to produce. For instance, if we want to see how many crimes have been reported in each unique location, we can simply count the occurrences of every unique value in the location_description column.

For aggregation operations like sum, count, mean, first, or range, we use the stats keyword in Splunk querying. stats, followed by the keyword for the operation (in our case, count by) and the column name, will create a column named "count" holding a value for each unique entry in the location_description column.

Moreover, we will sort the results in descending order of count and display the first 10 values using head. This is what the query looks like:

| from inputlookup:"chicago-crime.csv" | stats count by location_description | sort - count | head 10

And this is the resultant table:

Image by author

The location "STREET" has the highest number of reported crimes in this dataset.

Since we have performed an aggregation and now have two correlated columns (each location description corresponds to a numeric count), we can take a look at the Visualisation pane next to the Statistics pane. It will generate a quick chart based on the current results.

For example, a pie chart:

Image by author

You can also change the chart type to something else, like a horizontal bar chart or a line chart.

Image by author

Next, let's look deeper into the location "STREET" and see which types of crime are most common there. For that, we need to select the rows that have STREET under location_description and then count the occurrences of the unique values in the primary_type column.

Structurally, all we need to change from the previous query is to add a filtering step using the search keyword.

| from inputlookup:"chicago-crime.csv" | search location_description="STREET" | stats count by primary_type | sort - count | head 10
Image by author

THEFT is the most commonly occurring crime on the street.

Again, we can plot the results since we have correlated columns.

These will be the types of charts we include in our dashboard.

In this manner, you need to look at the dataset and explore its features during the table-view stage to figure out what kinds of information you can extract from its columns.

Another intuitive question to ask is: in what kinds of places does a particular crime mostly occur? Consider the crime description "DOMESTIC BATTERY SIMPLE". Based on the description, we should expect the most common locations to include households or indoor spaces.

| from inputlookup:"chicago-crime.csv" | search description="DOMESTIC BATTERY SIMPLE" | stats count by location_description | sort - count | head 10

Yes, our assumption was correct, and the data backs it up. You can answer similar questions and support or refute your assumptions using data analysis.
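
Filters can also be combined in a single search. As a hedged sketch, this counts how many simple domestic batteries were reported on the street:

| from inputlookup:"chicago-crime.csv" | search location_description="STREET" AND description="DOMESTIC BATTERY SIMPLE" | stats count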

Creating Maps

Two of the columns in our dataset hold the location coordinates of these crimes. As plain numbers, these values cannot readily be used for data analysis, but Splunk provides add-on apps to plot the coordinates on a map!

The Maps+ for Splunk app clusters the location coordinates and shows the number of instances in each cluster. In effect, we get to see the regions of Chicago where most crime takes place. As we zoom into the map, the clusters become more granular and split into smaller ones.

To install Maps+ for Splunk, head over to Apps->Find More Apps and search for the add-on. Enter the credentials you created for splunk.com (not the localhost login) and install it.

After installation, head back to Search and enter the following query. The table keyword creates a sub-table out of the current one.

| from inputlookup:"chicago-crime.csv" | table latitude, longitude

Maps+ only needs the table to contain two columns named latitude and longitude. Since our table has exactly these two columns, Maps+ will simply display the counts as clusters.

Go to the Visualisation pane, select the Maps+ option from the visualisation menu, and a map will show up!
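
If you would rather not install an add-on, Splunk's built-in geostats command feeds the standard Cluster Map visualisation instead. A minimal sketch using the same coordinate columns:

| from inputlookup:"chicago-crime.csv" | geostats latfield=latitude longfield=longitude count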

Creating the Dashboard

Once you have formed and tested all the queries and previewed their visualisations, go to the Dashboards tab in the menu bar above the search.

Select Create New Dashboard, set a title, and continue with the classic dashboard option.

Now you can add panels, select the charts each panel displays, and add the query the dashboard will execute to generate the required chart. For instance, entering the first query we created with the Pie Chart option:

Image by author

Click Add to Dashboard and you’ll see that the chart is added once the query execution is complete.

Similarly, you can add all the queries we discussed here, and more, to generate the respective charts.

You can also drag and shift the charts to put them side-by-side.

Image by author

Now you can rapidly make aesthetically pleasing dashboards with Splunk! Once the dashboard is saved, you can export it as a PDF to share with your colleagues.

However, to share this interactive dashboard as a Splunk source, you can switch from "UI" to "Source" on the dashboard screen. Here you will see the dashboard's Simple XML source code (classic dashboards use XML rather than HTML). Sharing this code along with the dataset (not required for the default one) will let other people reproduce the charts in their own Splunk Enterprise app.
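
The full exported source isn't reproduced here, but a minimal classic-dashboard source wrapping our first pie-chart query would look roughly like this (the label and title are placeholders):

<dashboard>
  <label>Chicago Crime</label>
  <row>
    <panel>
      <chart>
        <title>Crimes by location</title>
        <search>
          <query>| from inputlookup:"chicago-crime.csv" | stats count by location_description | sort - count | head 10</query>
        </search>
        <option name="charting.chart">pie</option>
      </chart>
    </panel>
  </row>
</dashboard>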

Conclusion

In this article, we looked at how Splunk can be used to perform rudimentary data analysis using a basic querying language and the stats Splunk generates automatically. We also learned how to quickly create dashboards from the results of these queries to draw insights from a large quantity of data.

Thank you so much for reading!
