Cribl Stream: The Key to Seamlessly Transmitting WAF Data to Amazon Security Lake

Danny Woo · Published in Cloud Villains
6 min read · Sep 18, 2023

Amazon Security Lake, introduced this year, enables security teams to store data from different services and solutions in one place and build comprehensive monitoring on top of it with well-known dashboard tools like QuickSight, OpenSearch, or any other tool you prefer.

Speaking of security data, AWS WAF plays a crucial role in protecting web applications from common web attacks and malicious traffic, so its logs are well worth capturing. Fortunately, Cribl Stream provides various AWS connectors that integrate seamlessly with AWS services, making it possible to transform your WAF data and send it to Amazon Security Lake.

Introduction to Cribl Stream

[ Visualized Architecture Diagram of Cribl Stream and Cribl Edge ]

Cribl Stream is a vendor-agnostic solution that offers flexibility: it can be installed on-premises or used as SaaS. It lets you capture data from various sources, including cloud services, third-party solutions, and even the internal logs of the hosts where Cribl Stream is deployed. You can then parse, reduce, mask, or otherwise transform the data before sending it to popular destinations like S3, Elasticsearch, Prometheus, or any other platform.

Here are the reasons why Cribl Stream was chosen as a data pipeline solution to send WAF data to Amazon Security Lake:

  1. Cribl is officially recognized as an AWS Partner on the AWS website
  2. Building on the first reason, Cribl Stream features an Amazon Security Lake connector for seamless integration with the service
  3. Cribl Stream can also map the data to the Open Cybersecurity Schema Framework (OCSF) and automatically send it as Parquet files
    (Storing data in Parquet format is essential when sending it to Amazon Security Lake)
  4. The GUI offered by Cribl Stream adds to its appeal, making the management of these tasks even more intuitive

Test Process

If I were to simplify the architecture for the test, it would look like this:

[ The Architecture of Cribl Stream for the test ]
  1. Prepare AWS services: Set up AWS services to store WAF data in S3, install Cribl Stream on an EC2 instance, and configure a custom source in Security Lake
  2. Use the S3 Source Connector: Utilize the S3 connector to pull the data
  3. Parse and reduce data: Parse the data into OCSF format and reduce unnecessary data as needed
  4. Route data: Finally, transmit the processed data to Security Lake

Test Environment
- EC2 instance type: t3.medium
- OS: Amazon Linux 2
- Data streaming solution: Cribl Stream v4.1.3

1. Install Cribl Stream

To install Cribl Stream in single-instance mode, you can either visit https://cribl.io/download/ to download the installation package, or you can grab a script from my GitHub repository that deploys Cribl Stream, configures it to start automatically, and sets up the necessary path variables.

Once the installation is complete, you can immediately access Cribl Stream at the default port 9000.

[ Cribl Stream login page ]

2. Set Up Source

After logging in to Cribl Stream, navigate to “Data” > “Sources” in the top menu, and select the “S3” connector in the “Pull” section.

[ Cribl Stream Source Connectors ]

Click “Add Source” in the upper right corner to create a new connector and configure its settings. In the “Queue” field, enter the name, URL, or ARN of the SQS queue to read events from. Your newly created connector will then appear in the list.

Note: Cribl Stream receives data from Amazon S3 buckets via event notifications delivered through SQS
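
Setting up those notifications happens outside Cribl itself. As a reference, here is a minimal sketch using the AWS SDK for JavaScript v3 that points “object created” events from the WAF log bucket at the SQS queue the S3 source polls; the bucket name, queue ARN, and region are placeholders for your own resources.

```javascript
// Sketch: send S3 "object created" notifications to the SQS queue that
// Cribl Stream's S3 source polls. Bucket, queue ARN, and region are placeholders.
import {
  S3Client,
  PutBucketNotificationConfigurationCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "ap-northeast-2" });

await s3.send(
  new PutBucketNotificationConfigurationCommand({
    Bucket: "my-waf-log-bucket", // bucket where AWS WAF delivers its logs
    NotificationConfiguration: {
      QueueConfigurations: [
        {
          Id: "waf-logs-to-cribl",
          QueueArn: "arn:aws:sqs:ap-northeast-2:111122223333:waf-log-events",
          Events: ["s3:ObjectCreated:*"], // notify on every new log object
        },
      ],
    },
  })
);
```

Keep in mind that this call replaces the bucket’s existing notification configuration, and that the queue’s access policy must allow S3 to send messages to it.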

[ Configuration of Cribl Stream Source Connector ]

You can check the connection health by navigating to “Status” or “Live Data” in the connector’s tab menu.

3. Set Up Destination

To transmit the data to the destination, navigate to “Data” > “Destinations” in the top menu and select the Amazon Security Lake connector.

[ Cribl Stream Destination Connectors ]

Click “Add Destination” in the upper right corner to create a new connector and configure the necessary settings. The key configuration options are the S3 bucket name, region, account ID, custom source name, and the role to assume. Afterward, the connector you created will appear in the list.

[ Configuration of Cribl Stream Destination Connector ]
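
Because the destination writes into the Security Lake bucket by assuming the IAM role tied to your custom source, it can save time to confirm that the role is assumable from the instance running Cribl Stream before testing the connector. A small sketch with the AWS SDK for JavaScript v3 (the role ARN and region are placeholders):

```javascript
// Sketch: verify the role Cribl will assume is reachable from this instance.
// The role ARN below is a placeholder for your custom source's provider role.
import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";

const sts = new STSClient({ region: "ap-northeast-2" });

const { Credentials } = await sts.send(
  new AssumeRoleCommand({
    RoleArn: "arn:aws:iam::111122223333:role/cribl-waf-provider-role",
    RoleSessionName: "cribl-asl-destination-check",
  })
);

console.log("Temporary credentials expire at:", Credentials.Expiration);
```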

Before being transmitted to Amazon Security Lake, the data needs to be converted to Parquet format. The schemas used for this conversion can be managed within Cribl.

[ OCSF Schema in the Cribl Knowledge menu, where all schema and lookup information are stored ]

Guide to OCSF Schema
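
To give a sense of the target format, WAF traffic generally maps to the OCSF HTTP Activity class (class_uid 4002). The sketch below shows roughly what such a record can look like; all values are illustrative, and the exact fields depend on how you build your mapping:

```javascript
// Illustrative shape of an OCSF HTTP Activity record built from a WAF log.
// Values are made up; the real mapping depends on your pipeline functions.
const ocsfEvent = {
  class_uid: 4002,     // HTTP Activity
  category_uid: 4,     // Network Activity
  activity_id: 3,      // derived from the HTTP method
  severity_id: 1,
  time: 1694995200000, // event time in epoch milliseconds
  metadata: {
    product: { name: "AWS WAF", vendor_name: "AWS" },
    version: "1.0.0",  // OCSF schema version used by the mapping
  },
  http_request: {
    http_method: "GET",
    url: { hostname: "example.com", path: "/index.html" },
    user_agent: "curl/8.0",
  },
  src_endpoint: { ip: "198.51.100.10" },
};
```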

Similar to the source connector, you can verify the connection health by going to “Status” or “Test” in the connector’s menu and sending test data to the destination.

4. Transform Data

Before the data is transmitted to Amazon Security Lake, it needs to be transformed to align with the required fields. This transformation can be debugged within Cribl before the conversion takes place.

[ Transform the data before sending it to the desired destination ]

The key functions below are essential before the data is forwarded to Amazon Security Lake, and within Cribl these tasks can be performed code-lessly.

Parser, Eval, Drop, Rename, Serialize

  • Code-less vs Code-yes
    To send the data to ASL (Amazon Security Lake), the data schema needs to be mapped to OCSF (Open Cybersecurity Schema Framework), which can be managed within Cribl. One challenge I encountered was handling fields like ‘httpRequest_headers,’ which contain an inconsistent number of array values, making consistent mapping difficult. Thankfully, Cribl allows you to create custom functions through its ‘Code’ function, so I wrote code that reads a variable number of array values and consolidates them into a single field (see the sketch after the figure below). Alternatively, the ‘Eval’ function can be used to specify the maximum number of array values and assign ‘null’ when no data is available. Cribl offers both code-less and code options for data transformation, allowing you to choose from built-in and customized functions.
[ Customize transformation function using Code ]
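
As a reference, here is a rough sketch of what such a Code function body can look like. In Cribl’s Code function the event is exposed as the __e variable; this sketch assumes the headers are still an array of { name, value } pairs as in the raw WAF log, and the target field name is just an example:

```javascript
// Sketch of a Cribl Code function body (the event is available as __e).
// WAF header arrays vary in length, so collapse them into a single object
// field that the rest of the pipeline can map consistently.
const headers = __e.httpRequest_headers;
if (Array.isArray(headers)) {
  const flattened = {};
  for (const h of headers) {
    // each entry looks like { name: 'User-Agent', value: 'curl/8.0' }
    if (h && h.name) {
      flattened[h.name.toLowerCase()] = h.value !== undefined ? h.value : null;
    }
  }
  __e.http_request_headers = flattened; // example target field name
}
```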

5. Route Data

After configuring your Source and Destination connectors, connect them into a data pipeline by going to “Routing” > “Data Routes”. Specify the data to consume from the Source connector in the ‘Filter’ section using a JS expression (an example follows below). Then, select your previously created functions in the ‘Pipeline’ option and choose the Destination connector in the ‘Output’ option.
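
The Filter is a plain JavaScript expression evaluated against each event. For example, assuming the WAF log’s webaclId field survives parsing under that name, an expression like the following would route only WAF events into this pipeline:

```javascript
webaclId != null && webaclId.startsWith('arn:aws:wafv2:')
```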

[ Route data and manage the pipeline in the list ]

After building the data pipeline, verify that the data is successfully stored.

[ Data is stored in S3 ]
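
If you prefer verifying from code rather than the console, you can list the custom source’s prefix in the Security Lake bucket. The sketch below assumes the bucket name, region, and the ext/<custom source name>/ prefix that Security Lake uses for custom sources; substitute your own values:

```javascript
// Sketch: list recently written objects for the custom source in the
// Security Lake bucket. Bucket name, prefix, and region are placeholders.
import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "ap-northeast-2" });

const { Contents = [] } = await s3.send(
  new ListObjectsV2Command({
    Bucket: "aws-security-data-lake-ap-northeast-2-example",
    Prefix: "ext/cribl-waf/", // custom source name used in the destination
    MaxKeys: 10,
  })
);

Contents.forEach((obj) => console.log(obj.Key, obj.LastModified));
```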

Conclusion

In this blog, I’ve demonstrated how to establish a data pipeline between AWS and Amazon Security Lake using Cribl Stream. Cribl Stream’s Amazon Security Lake connector and its handling of the OCSF format significantly simplify the process of building a data pipeline for Security Lake.

To learn more about sending data from third-party solutions to ASL, please visit the Cribl blog. Additionally, for insights into sending system logs to ASL with Cribl Edge, refer to my other blog post.

If you have any questions or need further information, don’t hesitate to contact us at cribl@megazone.com or leave your comments below.
Thank you for reading!

#MegazoneCloud #AWS #cloud #AmazonSecurityLake #Cribl #CriblStream #Cribl.Cloud
