Enhancing Cybersecurity with Cloud Computing: RayZed GSoC ’23

Kushal Shah · SCoRe Lab · Jul 31, 2023

Up to this point, the first portion of GSoC ’23 has been fantastic. I was able to find a number of optimizations as well as address many roadblocks. Please fasten your seatbelts, because I’m about to talk about my adventure into cybersecurity, distributed computing, and cloud computing so far.

What is RayZed?

RayZed is a Ray-based, distributed web vulnerability scanner built on a Ray queue. It is also a cloud-native application built using the Terraform stack.

Motivation

Finding web vulnerabilities for specific targets (URLs) is a critical task. OWASP ZAP (Zed Attack Proxy) helps here with its various scan functionalities, such as the active, passive, and spider scans. These scans scrutinise the target URL for standard vulnerabilities, which makes this an extensive process. The targets (URLs) can be distributed among Ray cluster nodes deployed on a cloud-native platform, where ZAP runs as a daemon. This architecture scales the process of finding vulnerabilities across targets. The purpose of this project is to create a parallelised tool that can fetch the vulnerabilities of different websites efficiently, so that the process can be scaled and automated for cybersecurity research.

Journey Up Till Now

I mainly focused on optimizations that let spider and active scans of huge sites finish in the least amount of time, using both ZAP-level (application) optimizations and distributed, parallel Ray optimizations. Next, I will walk you through the scans, the optimizations I found, and how I linked the whole process together.

What is a spider scan?

A spider scan in ZAP is a web crawling process that explores a target website to identify all of its pages.

What is an active scan?

An active scan in ZAP is an automated testing process that actively probes a web application for security vulnerabilities by sending customized requests and analyzing the resulting responses.

[Image: Active Scan in progress]

The time a scan takes for a website is based on:

[Number pages] x [number parameters] x [number attacks] x [how long a request takes] / [number of threads]

Basically, as a heuristic, the spider scan’s sites tree can be used to estimate the number of pages.

For the niweera website, the sites tree shows 17 unique nodes (17 distinct pages).
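
To make the heuristic concrete, here is a rough, illustrative Python calculation; apart from the 17 pages, every value below is an assumption rather than a measurement:

# Rough scan-time estimate using the heuristic above.
# Only the 17 pages come from the niweera sites tree;
# the other values are assumptions for illustration.
pages = 17
params_per_page = 5        # assumed
attacks_per_param = 100    # assumed
seconds_per_request = 0.2  # assumed
threads = 10               # assumed

estimated_seconds = (pages * params_per_page * attacks_per_param * seconds_per_request) / threads
print(f"~{estimated_seconds / 60:.0f} minutes")  # ~3 minutes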

Approach 1: Implementation of multiple Zap daemons in the background

Initially, only one ZAP daemon runs per virtual machine on the cloud-native platform. If we assume multiple users SSH into the head node, each potentially submitting multiple targets, then suppose:

R = number of resources (VMs with a single ZAP daemon)

M = number of users

K1, K2, …, KM = number of target websites submitted by the M users respectively

Then R = K1 + K2 + … + KM (for zero wait time for every target)

To improve response time and converge toward zero wait time, one potential solution is to run multiple ZAP daemon instances on different ports on a single VM, up to its capacity. (Reference)

This could be achieved as follows:

Command:

./zap.sh -daemon -config api.key=234 -port 1234 -dir /home/kushal/Desktop/ray/temp1
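
For instance, here is a minimal Python sketch (not the project’s actual launcher) that spawns several daemons this way via subprocess; the second port, API key, and directory are made-up values:

import subprocess

# Launch several ZAP daemons on one VM, each with its own
# port, API key, and home directory (values are illustrative).
daemons = [
    {"port": 1234, "key": "234", "dir": "/home/kushal/Desktop/ray/temp1"},
    {"port": 1235, "key": "235", "dir": "/home/kushal/Desktop/ray/temp2"},
]
for d in daemons:
    subprocess.Popen([
        "./zap.sh", "-daemon",
        "-config", f"api.key={d['key']}",
        "-port", str(d["port"]),
        "-dir", d["dir"],
    ])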

[Image: Terminal output]

Parameters for multiple ZAP daemons:

  1. -config api.key (e.g. -config api.key=234)
  2. -port (e.g. -port 1234)
  3. -dir (e.g. -dir /home/kushal/Desktop/ray/temp1)

In the Python code, the proxies must point to the daemon’s port, and the API key must match the value passed via -config api.key.

Also, -dir specifies the daemon’s home directory (a different directory for each ZAP instance).

[Image: Defining proxies2 with the daemon’s port and the same API key]
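
For example, here is a minimal client setup assuming the python-owasp-zap-v2.4 (zapv2) library, with the port and API key matching the daemon command above:

from zapv2 import ZAPv2

# The proxy URL must point at the daemon's port, and the API
# key must match the one passed via -config api.key.
port = 1234
zap = ZAPv2(
    apikey="234",
    proxies={
        "http": f"http://127.0.0.1:{port}",
        "https": f"http://127.0.0.1:{port}",
    },
)
print(zap.core.version)  # sanity check that the daemon is reachable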

[Image: Multiple ZAP daemons]
[Image: Testing with multiple Python scan scripts]

Approach 2: Storing the state of the spider scan using sessions and then passing the session to the active scan

The motivation behind this: the spider scan and active scan for a particular website can run on different ZAP daemons.

Since the active scan requires the spider scan’s sites tree, it becomes necessary to pass the spider scan’s state to the active scan.

  1. I use session management for this.
  2. Once the spider scan is done, its session is stored (using the new_session function).
  3. The new active scan session can then pick up from the spider scan session.
  4. The prerequisite is that the session files are copied/included into the specific session directory, since different directories identify particular daemons.
  5. Once the files are copied into those directories, the active scan starts from the previous output using the load_session function (see the sketch after this list).
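
Here is a minimal sketch of that flow using the zapv2 client; the ports, API keys, target URL, and session name are all illustrative:

import time
from zapv2 import ZAPv2

def client(port, key):
    proxy = f"http://127.0.0.1:{port}"
    return ZAPv2(apikey=key, proxies={"http": proxy, "https": proxy})

zap1 = client(1234, "234")  # daemon running the spider scan
zap2 = client(1235, "235")  # daemon running the active scan

target = "https://example.com"  # illustrative target
session = "example_session"     # illustrative session name

zap1.core.new_session(name=session, overwrite="true")  # fresh session on daemon 1
scan_id = zap1.spider.scan(target)
while int(zap1.spider.status(scan_id)) < 100:  # wait for the crawl to finish
    time.sleep(2)

# ...copy the .session* files into daemon 2's session directory...

zap2.core.load_session(name=session)  # daemon 2 resumes from the crawl's sites tree
zap2.ascan.scan(target)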
[Image: Defining a new session]
[Image: Executing the spider scan first]
[Image: ZAP daemon 1]
[Image: Spider scan done]
[Image: Session data stored in the session directory]

  • .session
  • .session.properties
  • .session.log
  • .session.data
  • .session.script

The files above need to be copied into the session folder inside the other ZAP daemon’s -dir directory.
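
A minimal sketch of the copy step, reusing the daemon home directories from earlier and the illustrative session name from the sketch above:

import glob
import shutil

# Copy the spider scan's session files from daemon 1's session
# folder into daemon 2's (paths and session name illustrative).
src = "/home/kushal/Desktop/ray/temp1/session"
dst = "/home/kushal/Desktop/ray/temp2/session"

for path in glob.glob(f"{src}/example_session.session*"):
    shutil.copy(path, dst)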

[Image: Files copied to daemon 2’s directory]
[Image: Utilizing the same session in the active scan]
[Image: ZAP daemon 2]
[Image: Active scan for SQL completed]

Note: daemon 2’s port and directory differ from daemon 1’s, yet the active scan still ran smoothly.

Approach 3: Optimization for Active Scan (Divide and Conquer)

[Image: Different active scan mechanisms]

Suppose we split the SQL and XSS mechanisms into chunks handled by different daemons (this could be extended to other mechanisms as well; check the alert mappings below).

  • The active scan can be distributed into checks for multiple sub-attack mechanisms (a sketch follows this list).
  • We can also run only specific security checks by defining a scan policy name.
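
Here is a minimal sketch of this chunking with the zapv2 client; the scanner IDs (40018 for SQL injection, 40012 for reflected XSS) are examples taken from ZAP’s alert mappings linked below, and the ports, keys, and target are illustrative:

from zapv2 import ZAPv2

def client(port, key):
    proxy = f"http://127.0.0.1:{port}"
    return ZAPv2(apikey=key, proxies={"http": proxy, "https": proxy})

zap1 = client(1234, "234")  # daemon 1: SQL injection checks only
zap2 = client(1235, "235")  # daemon 2: XSS checks only

target = "https://example.com"  # illustrative target

zap1.ascan.disable_all_scanners()
zap1.ascan.enable_scanners("40018")  # SQL injection rule
sql_scan = zap1.ascan.scan(target)

zap2.ascan.disable_all_scanners()
zap2.ascan.enable_scanners("40012")  # reflected XSS rule
xss_scan = zap2.ascan.scan(target)
# The two scans now run in parallel on separate daemons.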

[Image: Parallel distributed execution]
[Image: Daemon 1]
[Image: Daemon 2]
[Image: Parallel execution]
[Image: XSS scan]
[Image: SQL scan completed]
[Image: Adding specific ascan IDs for the SQL mechanism on line 75]

Alert code mappings for other mechanisms can be found on the website below:

https://www.zaproxy.org/docs/alerts/

Approach 4: Increasing the number of threads

The number of scanning threads can be increased, which reduces scan time.

The ZAP ascan module provides the following function for this: set_option_thread_per_host(self, integer, apikey='').
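
A minimal sketch of its use, with an illustrative thread count and the client setup from earlier:

from zapv2 import ZAPv2

proxy = "http://127.0.0.1:1234"
zap = ZAPv2(apikey="234", proxies={"http": proxy, "https": proxy})

# Raise the scan threads per host before starting the active scan;
# 20 is an illustrative value, not a tuned recommendation.
zap.ascan.set_option_thread_per_host(20)
print(zap.ascan.option_thread_per_host)  # confirm the new setting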

Integrating the whole stack

[Image: Distributed architecture]
  1. Each worker VM runs multiple daemons, distinguished by port, dir, and API key.
  2. These daemons divide the work into spider scan and active scan chunks.
  3. Once a spider scan is done, its output is copied to the master machine.
  4. The active scan daemons check for the spider scan's session files; if they are not present, they pull them (scp/cp) from the master machine.
  5. Each new website's name is hashed, and the unique hash becomes the session file name, for easy handling of session files.
  6. Ray actors manage the state of the distributed computation on Ray (see the sketch below).
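
As a rough illustration of the last point, here is a minimal Ray actor that tracks per-target scan state; the actor’s name, methods, and fields are invented for this sketch and are not RayZed’s actual implementation:

import ray

ray.init()

# Illustrative actor that records which stage each target is in.
@ray.remote
class ScanStateActor:
    def __init__(self):
        self.state = {}  # target URL -> scan stage

    def set_stage(self, target, stage):
        self.state[target] = stage

    def get_stage(self, target):
        return self.state.get(target, "pending")

tracker = ScanStateActor.remote()
tracker.set_stage.remote("https://example.com", "spider_done")
print(ray.get(tracker.get_stage.remote("https://example.com")))  # spider_done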
[Image: Executing the combined approach]

Conclusion:

Overall, it’s been an interesting and fulfilling experience, and I’m thankful for the chance to learn so much. I am grateful to SCoRe Lab (c2siorg) and my mentor Ravindu for selecting me to contribute to this project, and for the opportunity to explore new technological fields such as distributed computing, cybersecurity, and cloud computing through this innovative initiative.
