Correlating Tor sources in ArcSight SIEM events using Python

Isaque Profeta

Published in

Analytics Vidhya

Dec 15, 2020

A Brazilian Portuguese version of this article is also available.

Introduction

Anonymization, despite being a valuable resource on the internet, is also abused as a tool for many exploits and security attacks. Thus, feeding information about Tor network usage (currently the main and largest anonymization network) into a SIEM, in order to classify access events to a Hosting/Web Services environment and assess the motivation behind these accesses, is an important tactic.

This article presents a Use Case, run in a Linux environment, that uses Python and the requests library to integrate the ArcSight ESM SIEM API with Active Lists in order to classify events coming from the Tor network.

Review of concepts

SIEM

A log is a record of an event in a computer system. For example, these are the last three log entries of the Ubuntu Linux package installation system (dpkg):

$ tail -3 /var/log/dpkg.log
2020-10-04 17:24:52 status unpacked gcc:amd64 4:7.4.0-1ubuntu2.3
2020-10-10 13:25:09 startup archives unpack
2020-10-10 13:25:13 status installed mime-support:all

A syslog tool is software that aggregates logs from different hardware and/or software within an environment. In the example below, the Linux syslog daemon aggregates logs from the CRON, systemd, and anacron applications of an Ubuntu Linux operating system:

$ tail -5 /var/log/syslog.1
Oct 10 23:59:01 ubuntu-desktop CRON[3344]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 60 2)
Oct 11 00:02:46 ubuntu-desktop systemd[1]: Started Run anacron jobs.
Oct 11 00:02:46 ubuntu-desktop anacron[3430]: Anacron 2.3 started on 2020-10-11
Oct 11 00:02:46 ubuntu-desktop anacron[3430]: Will run job `cron.daily' in 5 min.
Oct 11 00:02:46 ubuntu-desktop anacron[3430]: Jobs will be executed sequentially

Internet Providers and Hosting IT infrastructures have logs from assets across the entire computing infrastructure, from Routers and Switches, through Firewalls and IPSs/IDSs, up to Application Servers and Databases. A Syslog Server gathers this information into a central database of all events in that environment.

A Security Information and Event Management (SIEM) toolset does a job similar to a Syslog Server, but with security features: logs go through an intelligent classification that generates security alerts according to defined rules, and the SIEM also provides a search and report generation interface. It can also trigger automated actions on Firewalls and/or IPSs that support external integrations.

Alert example: a SIEM can have a rule that watches for a series of 404 errors on a Web Server; if it finds requests for several non-existent application directories within a short period, it can raise an alert for an attempted Application Scan (an attack performed with tools such as dirb).
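The sketch below reproduces the logic of such a rule in plain Python, only to make the idea concrete; it is not ArcSight rule syntax. The (timestamp, source IP, status) event layout, the 10-second window, and the threshold of 5 are hypothetical choices.

```python
from collections import defaultdict, deque


def detect_dir_scans(events, window_seconds=10, threshold=5):
    """Flag source IPs with `threshold` or more HTTP 404 responses
    inside a sliding window of `window_seconds` (hypothetical values).

    `events` is an iterable of (timestamp_seconds, source_ip, status)
    tuples, assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of recent 404s
    alerts = set()
    for ts, src, status in events:
        if status != 404:
            continue
        window = recent[src]
        window.append(ts)
        # Drop 404s that fell out of the sliding time window
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.add(src)
    return alerts


sample = [(t, '203.0.113.7', 404) for t in range(6)]
sample.append((7, '198.51.100.2', 200))
print(detect_dir_scans(sample))  # {'203.0.113.7'}
```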

In this article, the Micro Focus’s ArcSight ESM will be the SIEM tool.

Tor

In Tor, users can use the network and also act as nodes (relays), and each connection travels through a random, encrypted path of nodes chosen among the network participants. Thus, from the point of view of the destination of a connection, the originating machine is not the one accessing the resource.

In the official example below, Alice's computer takes a random path through the network's nodes until it reaches an "exit node", and only that node accesses the destination server, Bob. Alice's access is anonymized: Bob does not know about Alice, he only sees the exit node's request.

Image from the official documentation: https://2019.www.torproject.org/about/overview.html.en

The data source used in this article to classify traffic as coming from the Tor network is the list of all exit nodes that Tor publishes, that is, the nodes that make the final connection to the destinations. This list is frequently updated and available on Tor's website (https://check.torproject.org/torbulkexitlist).

For reference, there are two other lists with more information about Tor network nodes that were not used in the scope of this article:
- Exit nodes updated every 30 minutes
- All nodes (Entry / Guard, Middle and Exit), classified and updated every 30 minutes

Steps followed for this case implementation

To avoid managing many ArcSight connectors, the choice was to interact directly with the tool's Application Programming Interface (API), following these steps:

1: Fetch the IP list of Tor exit nodes using Python with the requests library.
2: Import these IPs into an ArcSight Active List and keep it updated on a schedule.
3: Create a Rule in ArcSight to tag the events whose source IP is in that Active List.

Prerequisites

Python + PIP

This Use Case uses Python 3 and pip, the package installer for that Python version:

sudo apt update && sudo apt install python3 python3-pip

Then, confirm in the terminal that the Python on the operating system is version 3:

~ $ python -V
Python 3.6.9
~ $ pip -V
pip 20.1.1 from /home/isaque/.local/lib/python3.6/site-packages/pip (python 3.6)

Requests

Both the download of the exit node list and the integration with the ArcSight API use the Python requests library, which can be installed with pip:

pip install requests

Active List

An Active List in ArcSight ESM is a data set, much like a database table or an Excel spreadsheet. It stores values that can be compared against other SIEM resources. In this article, it stores the list of exit node IPs.

For this, an Active List was created in the ArcSight ESM console with a single Fields-based column called "ip" of the IP Address type, with entries expiring after one hour and capacity for ten thousand entries. It is especially important to note the Active List's Resource ID, since the script requires it.

NOTE: A dedicated user was also created for this automation and given write privileges on this Active List.

Creating the ActiveList in ArcSight ESM 7.2

Fetch the IP list of Tor exit nodes

Fetching the list of networks

A folder called “tor” was created to store the project and, inside it, the file “tor.py” was created.

mkdir ./tor
touch ./tor/tor.py

The code of "tor.py" starts by importing the requests library and then sending a GET request to the source to retrieve the exit node list. The result can be viewed with the "print()" function:

import requests
tor_networks = requests.get(
    'https://check.torproject.org/torbulkexitlist'
)
print(tor_networks.text)

The exit node IP addresses come as the response body in text format. That is why "tor_networks.text" is used: "requests.get()" returns a Response object, whose Response.text attribute holds the body as text.

With this, the script can already be tested in the terminal, from inside the "tor" folder:

python3 ./tor.py

After this test, "print()" was replaced with a Python list variable, built by splitting the response body on the line break "\n" with the string "split()" method:

import requests
tor_networks = requests.get(
    'https://check.torproject.org/torbulkexitlist'
)
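# Split on line breaks, skipping empty lines (e.g. the one after the final "\n")
exit_nodes_data = [ip.strip() for ip in tor_networks.text.split('\n') if ip.strip()]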
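A plain split('\n') keeps blank lines, such as the empty string left after a trailing line break, and would pass along anything unexpected in the body. A slightly more defensive variant, sketched below, also validates each line with the standard ipaddress module so that only well-formed addresses reach ArcSight. The helper name parse_exit_list is hypothetical, and the sample text uses documentation addresses in place of the real response body.

```python
import ipaddress


def parse_exit_list(body):
    """Return the valid IP addresses from a bulk exit list body
    (one address per line); blank or malformed lines are skipped."""
    addresses = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            # Normalizes the address and rejects anything that is not an IP
            addresses.append(str(ipaddress.ip_address(line)))
        except ValueError:
            continue
    return addresses


sample_body = '192.0.2.10\n\nnot-an-ip\n2001:db8::1\n'
print(parse_exit_list(sample_body))  # ['192.0.2.10', '2001:db8::1']
```

In "tor.py", it could replace the split line as exit_nodes_data = parse_exit_list(tor_networks.text).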

With the data collection done, it is time to study ArcSight's API for the integration.

Import the IPs into an ArcSight Active List

Interacting with the ArcSight’s API

This was the most complex part, since it was hard to find good examples in the documentation of ArcSight's API (ArcSight ESM version 7.2 in this Use Case). The documentation shows a subtle preference for the Java model with XML/SOAP, which makes it harder to use the API's methods and attributes from other languages with JSON/REST.

After study and research, the following resources were found to be useful:

A PDF presentation about automation.
ESM Service Layer (Web Services) Developer’s Guide.
Discussions in the Micro Focus official forums (references: 1, 2, 3, and 4).

Even so, the official documentation leaves some gaps, and forum answers remain vague. Reverse engineering some projects on GitHub helped to arrive at the following procedure for using the API with JSON/REST:

1: Identify, in the listServices page of the ArcSight ESM installation, which API features are needed (Service and Operation). listServices is located at the URL below (except for administrative and login operations, for which manager-service in the URL is replaced by core-service).

https://url-arcsight-esm:8443/www/manager-service/services/listServices

2: Open the Service's WSDL (XML descriptor) through the link on that page with the text: "See Web Services Description Language (WSDL) here"

3: Search the XML descriptor for the chosen Operation and identify its parameters.

Element with Sequences that are the parameters used for search

For JSON, one thing missing from the documentation is the description of the "resource abbreviations" required for the HTTP call. A working list was found in the description of the pyaesm project (which also interacts with Active Lists), reproduced in the fourth step.

4: Select the resource abbreviations of Service according to the list below:

"act" = "resource.manager/activeListService/"
"arc" = "resource.manager/archiveReportService/"
"cas" = "resource.manager/caseService/"
"cap" = "resource.manager/conAppService/"
"con" = "resource.manager/connectorService/"
"das" = "resource.manager/dashboardService/"
"dmq" = "resource.manager/dataMonitorQoSService/"
"dat" = "resource.manager/dataMonitorService/"
"drl" = "resource.manager/drilldownListService/"
"dri" = "resource.manager/drilldownService/"
"fie" = "resource.manager/fieldSetService/"
"fil" = "resource.manager/fileResourceService/"
"gra" = "resource.manager/graphService/"
"gro" = "resource.manager/groupService/"
"int" = "resource.manager/internalService/"
"man" = "resource.manager/managerAuthenticationService/"
"net" = "resource.manager/networkService/"
"por" = "resource.manager/portletService/"
"que" = "resource.manager/queryService/"
"qvs" = "resource.manager/queryViewerService/"
"rep" = "resource.manager/reportService/"
"res" = "resource.manager/resourceService/"
"sei" = "resource.manager/securityEventIntrospectorService/"
"sev" = "resource.manager/securityEventService/"
"ser" = "resource.manager/serverConfigurationService/"
"use" = "resource.manager/userResourceService/"
"vie" = "resource.manager/viewerConfigurationService/"
"inf" = "manager/infoService/"
"mss" = "manager/managerSearchService/"

5: Build the endpoint URL in the format below, replacing Service and Operation with the ones chosen above:

https://url-arcsight-esm:8443/www/manager-service/rest/<Service>/<Operation>
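The five steps above boil down to two string conventions: an endpoint URL, and a JSON body whose keys carry the service abbreviation as a prefix. The sketch below captures both in hypothetical helpers (build_endpoint and build_payload are not part of ArcSight or pyaesm), with only a few abbreviations from the list included.

```python
# A subset of the Service -> abbreviation list above
ABBREVIATIONS = {
    'ActiveListService': 'act',
    'QueryService': 'que',
    'ResourceService': 'res',
}


def build_endpoint(server, service, operation, core=False):
    """REST URL for a Service/Operation; login and admin calls use core-service."""
    base = 'core-service' if core else 'manager-service'
    return f'{server}/www/{base}/rest/{service}/{operation}'


def build_payload(service, operation, **params):
    """JSON body where the operation and every parameter are prefixed
    with the service abbreviation, e.g. 'act.clearEntries'."""
    prefix = ABBREVIATIONS[service]
    return {
        f'{prefix}.{operation}': {
            f'{prefix}.{name}': value for name, value in params.items()
        }
    }


print(build_endpoint('https://url-arcsight-esm:8443', 'ActiveListService', 'clearEntries'))
# https://url-arcsight-esm:8443/www/manager-service/rest/ActiveListService/clearEntries
print(build_payload('ActiveListService', 'clearEntries',
                    authToken='TOKEN', resourceId='RESOURCE_ID'))
# {'act.clearEntries': {'act.authToken': 'TOKEN', 'act.resourceId': 'RESOURCE_ID'}}
```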

EXAMPLE:

Clearing an Active List using the API:
1: listServices was searched and the Service "ActiveListService" was found; after studying its Operations, "clearEntries" was chosen for testing.
2: The WSDL link was opened to check the required attributes.
3: The XML descriptor showed that the "clearEntries" Operation requires the attributes "authToken" and "resourceId".
4: The reference list gives "act" as the resource abbreviation for "ActiveListService".
Final JSON payload for the call:

{
  "act.clearEntries": {
    "act.authToken": AUTHENTICATION_TOKEN,
    "act.resourceId": RESOURCE_ID_OF_ACTIVE_LIST
  }
}

5: The formatted endpoint is: https://url-arcsight-esm:8443/www/manager-service/rest/ActiveListService/clearEntries

Knowing this structure makes it easier to navigate the API and to understand which calls exist and which parameters each one accepts.

Organization of script support code

First, a Python function using requests is needed to log in and obtain an authentication token; only then is it possible to call the API methods.

This function was placed in a separate file (a local library) to avoid duplicated code. And, to keep credentials out of the code files, a configuration file was created in the same folder.

touch __init__.py
touch config.ini
touch api_arcsight.py

The "__init__.py" file stays empty and eases the import process in the project. The "config.ini" file stores the credentials used to access ArcSight, which are read with Python's configparser module; it is written in the following format:

[arcsight]
user=arcsight_user
password=arcsight_password
server=https://url-of-arcsight-esm:8443

This file holds the credentials of a user dedicated to the automation. Data like this should never be stored in version control tools; at most, commit a template such as "config.ini.example".
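ConfigParser.read() silently skips files it cannot open and returns the list of files it did read, so a wrong path only shows up later as a NoSectionError. The sketch below, a hypothetical load_settings helper assuming the same [arcsight] section, fails early instead and, by default, resolves config.ini next to the module file rather than in the current working directory, which matters when the script runs from cron.

```python
import configparser
from pathlib import Path


def load_settings(path=None):
    """Return (user, password, server) from the [arcsight] section,
    raising FileNotFoundError if the configuration file is missing."""
    # Default: config.ini in the same folder as this module
    path = Path(path) if path else Path(__file__).with_name('config.ini')
    config = configparser.ConfigParser()
    if not config.read(path):  # read() returns the files it parsed
        raise FileNotFoundError(f'Configuration file not found: {path}')
    section = config['arcsight']
    return section['user'], section['password'], section['server']
```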

Now, in the “api_arcsight.py” file, the access to credentials is created with the “ConfigParser” object and using the “read()” and “get()” methods, which will read the “config.ini” data:

import requests
import configparser

config = configparser.ConfigParser()
config.read("config.ini")

USER = config.get('arcsight', 'user')
PASSWORD = config.get('arcsight', 'password')
SERVER = config.get('arcsight', 'server')

Login function

The "login" function is the first function in the "api_arcsight.py" file. It first defines the login endpoint in ArcSight's core-service, then sets the HTTP headers to JSON and, finally, builds a payload with the user and password in the format the login endpoint expects.

def login():
  """
  Connects to arcsight and returns an API token
  """
  login_endpoint = (
    SERVER + '/www/core-service/rest/LoginService/login'
  )  
  headers = {
    'accept': 'application/json',
    'content-type': 'application/json'
  }  
  payload = {
    'log.login': {
      'log.login': USER,
      'log.password': PASSWORD
    }
  }

Then a POST request is sent to the core-service endpoint with both the payload and the headers, the token is extracted from the "Response.json()" dictionary, and it is returned at the end of the function:

  try:
    resp = requests.post(
      login_endpoint,
      json=payload,
      headers=headers,
      verify=False
    )
    token = resp.json()['log.loginResponse']['log.return']
  except Exception as e:
    print(f'Login error: {e}')
    token = None  # avoid an UnboundLocalError when the login fails

  return token

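The "verify=False" option in "requests.post" is useful when the ArcSight installation uses a self-signed certificate; it works like the "Accept the risk and continue" option of a browser accessing the same site. Note that it turns certificate validation off entirely (an InsecureRequestWarning is emitted); "verify" also accepts the path to a CA bundle or certificate file, which keeps validation on.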

Now, it is possible to execute the login function using these two lines:

import api_arcsight
AUTHENTICATION_TOKEN = api_arcsight.login()
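As noted above, verify=False disables certificate validation, and the login function has no timeout. A hedged variant is sketched below: it assumes the Manager's certificate was exported to a PEM file (the path is hypothetical), reuses a requests.Session with that file as verify, adds a timeout so a hung Manager cannot block a scheduled run, and raises on HTTP errors instead of printing them. make_session and this login signature are illustrative, not part of api_arcsight.py.

```python
import requests

# Hypothetical path to the ArcSight Manager certificate exported as PEM
ARCSIGHT_CA_BUNDLE = '/path/to/arcsight-esm.pem'


def make_session(ca_bundle=ARCSIGHT_CA_BUNDLE):
    """Session that validates TLS against the given certificate file."""
    session = requests.Session()
    session.verify = ca_bundle
    session.headers.update({'accept': 'application/json'})
    return session


def login(session, server, user, password, timeout=30):
    """Return an auth token, raising on HTTP errors or unexpected bodies."""
    resp = session.post(
        f'{server}/www/core-service/rest/LoginService/login',
        json={'log.login': {'log.login': user, 'log.password': password}},
        timeout=timeout,
    )
    resp.raise_for_status()  # surface 4xx/5xx instead of a KeyError later
    return resp.json()['log.loginResponse']['log.return']
```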

Wrapper function to perform API searches

To avoid code repetition when performing queries in ArcSight’s API, a wrapper function called “execute” was set up.

In this function, the HTTP header is again set to JSON, and the endpoint uses manager-service, completed through a Python f-string with the second parameter the function receives.

def execute(payload, service_endpoint):
  """
  Call arcsight API for data from Operation Services
  """
  headers = {
    'accept': 'application/json'
  }

  endpoint = (
    SERVER + f'/www/manager-service/rest/{service_endpoint}'
  )

Using requests, a POST was set up with the defined endpoint and headers, together with the payload that is passed by the function’s first parameter. Finally, the Response object of the API response was returned.

  try:
    resp = requests.post(
      endpoint,
      json=payload,
      headers=headers,
      verify=False
    )
  except Exception as e:
    print(f'API Query error: {e}')
    resp = None  # avoid an UnboundLocalError when the request fails

  return resp

Thus, once logged in, searches can be performed with the execute function in the following format:

api_arcsight.execute(
  payload={
    "resource.function": {
      "resource.first_parameter": PARAMETER_VALUE_1,
      "resource.second_parameter": PARAMETER_VALUE_2
    }
  },
  service_endpoint='Resource/resourceFunction'
)

Logout function

Finally, a "logout" function was created in this file; it receives the token whose session should be closed.

It uses the core-service endpoint, the JSON headers, and the logout payload:

def logout(authtoken):
  """
  Revoke the token's API access
  """
  login_endpoint = (
    SERVER + '/www/core-service/rest/LoginService/logout'
  )
  headers = {
    'accept': 'application/json',
    'content-type': 'application/json'
  }
  payload = {
    'log.logout': {
      'log.authToken': authtoken
    }
  }

Finally, another POST was executed with requests with the defined data:

  try:
    requests.post(
      login_endpoint,
      json=payload,
      headers=headers,
      verify=False
    )
  except Exception as e:
    print(f'Logout error: {e}')

  return True

Thus, the logout function can be executed with the line:

api_arcsight.logout(AUTHENTICATION_TOKEN)

With that local library ready, you can go back to the main script and add the missing logic.

Importing the support library

At the top of the “tor.py” file, it is already possible to import the previously created module from “api_arcsight.py”:

import requests
import api_arcsight

Now, using the login function, the token was retrieved to execute the queries, and the Active List Resource ID was defined for manipulation.

AUTHENTICATION_TOKEN = api_arcsight.login()
ACTIVE_LIST = 'Ha9ZKBHUBABCAGTV5ykX2XQ=='

Clearing the Active List

To keep the Active List up to date, the chosen approach was to clear it completely and then import all the new records. So, first, the Active List is cleared with the execute function, passing the corresponding endpoint and payload:

api_arcsight.execute(
  payload={
    'act.clearEntries': {
      'act.authToken': AUTHENTICATION_TOKEN,
      'act.resourceId': ACTIVE_LIST
    }
  },
  service_endpoint='ActiveListService/clearEntries'
)
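Clearing and reloading leaves a short window in which the list is empty and Tor traffic would go untagged. If that matters, one alternative is to apply only the difference between the current and the new contents. The sketch below computes that difference in plain Python; reading the current entries and deleting stale ones would require further ActiveListService operations, whose names and payloads are assumptions to confirm in listServices and the WSDL before use. plan_sync is a hypothetical helper.

```python
def plan_sync(current_ips, new_ips):
    """Return (to_add, to_remove) so the Active List ends up matching
    new_ips without being emptied first."""
    current, new = set(current_ips), set(new_ips)
    to_add = sorted(new - current)      # exit nodes that appeared
    to_remove = sorted(current - new)   # exit nodes no longer listed
    return to_add, to_remove


to_add, to_remove = plan_sync(
    ['192.0.2.1', '192.0.2.2'],
    ['192.0.2.2', '192.0.2.3'],
)
print(to_add, to_remove)  # ['192.0.2.3'] ['192.0.2.1']
```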

Taking the data into the Active List

Since the Active List is empty, it is possible to insert the new data collected using the execute function again.

For this, the addEntries operation expects, besides the token and the resource ID, an entryList parameter with two fields: columns, the list of column names, and entryList, the list of entries, each formatted as a JSON object {'entry': [...]} holding the values in column order.

In this case, the Active List has only one “ip” column, making it easier to just use a Python list-comprehension to feed this entry:

api_arcsight.execute(
  payload={
    'act.addEntries': {
      'act.authToken': AUTHENTICATION_TOKEN,
      'act.resourceId': ACTIVE_LIST,
      'act.entryList': {
        'columns': ['ip'],
        # One {'entry': [value]} per IP, since the list has a single column
        'entryList': [{'entry': [reg]} for reg in exit_nodes_data]
      }
    }
  },
  service_endpoint='ActiveListService/addEntries'
)
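The whole list is sent in a single request here. If the list grows, or if the Manager rejects large request bodies, the entries can be sent in batches. The sketch below is hypothetical: batch_size is an arbitrary value, not a documented ArcSight limit, and each yielded dictionary has the same shape as the 'act.entryList' value above.

```python
def entry_list_batches(ips, batch_size=1000):
    """Yield 'act.entryList' bodies with at most batch_size entries each."""
    for start in range(0, len(ips), batch_size):
        chunk = ips[start:start + batch_size]
        yield {
            'columns': ['ip'],
            'entryList': [{'entry': [ip]} for ip in chunk],
        }


batches = list(entry_list_batches(['192.0.2.1', '192.0.2.2', '192.0.2.3'], batch_size=2))
print(len(batches))  # 2
```

Each batch would then go into its own addEntries call, as the value of 'act.entryList' in the payload passed to api_arcsight.execute.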

Logout

The “logout” function was used in order not to overload the database with orphan sessions:

api_arcsight.logout(AUTHENTICATION_TOKEN)
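If any call between login and logout raises an exception, the logout line is never reached and the session is left open. A small context manager, sketched below as a hypothetical addition, ties the two together with try/finally; it takes the login and logout functions as arguments, so it can wrap api_arcsight.login and api_arcsight.logout.

```python
from contextlib import contextmanager


@contextmanager
def arcsight_session(login, logout):
    """Log in, yield the token, and always log out, even on errors."""
    token = login()
    try:
        yield token
    finally:
        if token:  # login() may return None when it fails
            logout(token)


# Demonstration with stand-in functions instead of the real API
calls = []
try:
    with arcsight_session(lambda: 'TOKEN', calls.append) as token:
        raise RuntimeError('API call failed')
except RuntimeError:
    pass
print(calls)  # ['TOKEN']
```

In "tor.py", the clear and add calls would then run inside a block such as with arcsight_session(api_arcsight.login, api_arcsight.logout) as AUTHENTICATION_TOKEN:.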

Scheduling

Finally, the script can be scheduled with crontab to fetch and update the information every hour. The cd is needed because config.ini is read from the current working directory:

0 * * * * cd /path/script/script_tor && /usr/bin/python3 /path/script/script_tor/load_tor.py

Other options would be systemd timers, or the isaqueprofeta/pylineup project, which uses Python and Celery to manage scheduled Python scripts.

In the ArcSight ESM console, after the script runs, the results can be seen in the Active List:

ActiveList loaded via API with Tor network exit nodes

Integration of the Active List with a Rule to classify events

Creation of Rule

With the Active List loaded, a Pre-persistence Rule was created, that is, a rule that enriches events with extra information, based on conditions, before they are stored, so that they can be filtered later for analysis:

In the Conditions an AND rule was added with an InActiveList condition comparing the Attacker Address field (source IP) of the event with the “ip” column of the Active List.

After this correlation condition, an Action was added to the Rule so that every event matching the condition gets 3 Event Fields set: a Name and two Categories, one for the behavior (/Access) and one for the technique (/Access/Anonymized):
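To make the rule's behavior explicit, the Python sketch below mimics it: a membership test of the event's attacker (source) address against the exit node set, followed by setting a Name and the two Categories. It is only an illustration; the event keys, the Name text, and the sample addresses are hypothetical, and the real rule is configured in the ArcSight console.

```python
# Stand-in for the Active List contents (documentation addresses)
TOR_EXIT_NODES = {'192.0.2.10', '198.51.100.20'}


def tag_anonymized(event, exit_nodes=TOR_EXIT_NODES):
    """Return a copy of the event tagged as anonymized access when its
    source address is a known exit node; otherwise return it unchanged."""
    if event.get('attackerAddress') not in exit_nodes:
        return event
    tagged = dict(event)  # leave the input event untouched
    tagged['name'] = 'Access from Tor exit node'  # hypothetical Name
    tagged['categoryBehavior'] = '/Access'
    tagged['categoryTechnique'] = '/Access/Anonymized'
    return tagged
```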

Visualization of data in an Active Channel

Finally, an Active Channel (a visualization channel within ArcSight ESM) was created, filtering on one of the labels set by the Rule's Action. With it, it is possible to view the events and confirm that they are tagged:

Viewing events marked as Anonymous access

Conclusion

This Use Case makes it possible, for example, to analyze which types of traffic applications receive anonymously, providing important data for incident handling and prevention. It was implemented in a Linux environment with Python and the requests library, which fetch the list of Tor exit nodes and load it into Active Lists through the ArcSight ESM API for correlation.

When these accesses show no malicious activity and do not trigger other correlations in the SIEM, it is reasonable to assume that the source only wants to access the content anonymously, which is the legitimate scenario.

When applications receive malicious activity from Tor sources, consider blocking those accesses or redirecting them to challenge solutions such as CAPTCHA.

Forum abuse is another example involving anonymous traffic: with this information in hand, it is possible to block comments from anonymous sources to curb offensive messages and SPAM.