Unlocking the Journey of Alert Similarity with Snowflake Cortex Search

Photo Credit : Unsplash

In today’s rapidly evolving digital landscape, organizations managing large data volumes face significant challenges in ensuring data security and tracking similar alerts efficiently. With cyberattacks becoming increasingly sophisticated and prevalent, the ability to effectively detect and respond to security incidents has never been more critical. Traditional cybersecurity approaches often rely on siloed alert systems that generate a multitude of alerts, making it challenging for security teams to identify and prioritize threats efficiently.

To address these challenges, forward-thinking companies are utilizing cutting-edge search techniques to enhance their security operations. These include securely storing and accessing data in a Security Data Lake, transforming data with ready-to-use features, extracting insights through Predictive Intelligence and Generative AI Content, and monitoring security alerts in real-time on a large scale. By leveraging advanced capabilities provided by platforms like Snowflake, organizations can achieve centralized governance and manageability.

Snowflake empowers its customers to build data-driven security programs that apply advanced analytics across activity logs, asset inventory, configuration data, and visibility metrics, aligning with threat detection standards. With Snowflake Cortex Search, organizations can ensure that their security analysts remain fully engaged and motivated, mitigating the risk of burnout while consistently delivering high-quality service.

In this blog, we will explore how the concept of alert similarity analysis can be rapidly and efficiently applied at scale using Snowflake’s search engine, Cortex Search. Join us on this journey as we uncover the power of alert similarity analysis in bolstering cybersecurity defenses and safeguarding digital assets against evolving threats.

Figure 1 : Conventional approaches adopting clustering and classification

Let’s imagine a scenario where a company wants to build a feature that helps its security analysts analyze error logs and group similar ones. The company has more than 1B rows and wants to perform this task efficiently and at scale. Because this is an analysis task, a latency of multiple seconds is acceptable.

A conventional approach to a similarity analysis solution would typically involve:

  • Sifting through a vast amount of threat intelligence to select reliable and comprehensive security incidents.
  • Extracting tactics, techniques, and procedural information from these incidents.
  • Using the TF-IDF algorithm to calculate the weight of keywords within attack descriptions.
  • Using the cosine similarity algorithm on these weights to compare attack events, constructing an n:n similarity matrix that compares all alerts.
  • Reviewing the results to identify similarities.

This method helps organizations uncover crucial technical and tactical information used by attackers, giving researchers a deeper understanding of the attackers’ behavior patterns.
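The conventional pipeline described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names are made up for this example, tokenization is a simple whitespace split, the first two alerts are taken from the sample data used later in this post, and the third alert is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs):
    """Build the n:n matrix that compares every alert with every other alert."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


alerts = [
    "User login from new location New York by user michael",
    "User login from new location New York by user david",
    "Disk usage above threshold on host db01",  # hypothetical alert
]
matrix = similarity_matrix(alerts)
```

Note that the n:n matrix grows quadratically with the number of alerts, which is one reason this approach becomes hard to run at the scale of a billion rows.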

A hypothetical company, SpiralData, uses Snowflake as its Security Data Lake and stores billions of logs there. SpiralData was considering a separate database that provides tokenization, vector generation, and indexing, along with Elasticsearch for distributed, RESTful search and analytics.

This results in the creation and maintenance of a siloed technology stack, cumbersome data movement, fragmented security postures, and increased maintenance overhead.

Thankfully, Snowflake has released Snowflake Cortex Search!

Snowflake Cortex Search is a hybrid search engine leveraging a fusion of vector embeddings for semantic similarity plus keyword search for lexical similarity, achieving state-of-the-art retrieval quality. It is fully managed, secure and governed within the customer’s Snowflake account and exposed via a REST API with no operational overhead.

Figure 2 : End to End Alert Similarity Analysis Pipeline leveraging Cortex Search Engine

Snowflake Cortex Search enables security analysts to derive alert similarity with:

  • A single SQL command and no data movement, blending vector embedding and keyword-based search.
  • A constantly updated index that manages vector embeddings and adapts to changes in the source data.
  • Traditional information retrieval (IR) via keyword indices, vector retrieval, and hybrids of the two (single-stage filtering).
  • State-of-the-art retrieval and ranking methods that are ready for use with only minor adjustments.
  • A declarative service definition that uses search columns and metadata filters from your existing data.
  • A Query API that accepts natural language queries for more intuitive semantic search, available via Python and REST.
  • Operation within the Snowflake perimeter, governed by current access control policies.

Implementation

Given below is a snapshot of logs from the SpiralData data store. First, let’s extract specific features such as the time of incident and the alert type. A sample of the normalized synthetic data is shown below:

Figure 3: Synthetic alerts

Tokenization, Vectorization, Indexing and Retrieval in one simple step using Snowflake Cortex Search

Cortex Search employs a combination of retrieval models to deliver superior search quality with minimal need for adjustment.

  • Vector search for semantic resemblance
  • Keyword search for lexical resemblance

The results are combined and reordered, ensuring only the most relevant documents from both searches are returned. This dual retrieval approach sustains high quality across a wide spectrum of datasets and queries.
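To make the idea of combining and reordering two result lists concrete, here is a minimal sketch using reciprocal rank fusion (RRF), a common technique for merging rankings. This post does not say which fusion method Cortex Search uses internally, so RRF is only an illustrative stand-in; the function name, the constant k=60, and the document ids are all assumptions made for this example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a3", "a1", "a7"]   # hypothetical ids ranked by semantic similarity
keyword_hits = ["a1", "a9", "a3"]  # hypothetical ids ranked by keyword match
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

In this toy input, the alerts found by both retrievers ("a1" and "a3") end up ahead of the alerts found by only one of them.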

As the source entities for this query change over time, the query results are updated and the changes propagate to the retrieval service automatically. If the query begins failing (e.g., one of the underlying tables was deleted), the last successfully built index continues to be served. The retrieval service is queried via a REST API and returns results in a documented schema encoded in JSON.

Hint : Explore Snowflake notebook with ML Runtime if you haven’t already! Pretty cool.

Creation of Search Service

Create a Cortex search service declaratively with a simple SQL query.

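from snowflake.snowpark.context import get_active_session

# Assumes a Snowflake notebook, where an active session already exists.
session = get_active_session()

# Triple quotes preserve the whitespace between the SQL clauses.
stmt = """CREATE OR REPLACE CORTEX SEARCH SERVICE alert_search_service
  ON alert_entry
  ATTRIBUTES TIMEOFINCIDENT
  WAREHOUSE = tasty_de_wh
  TARGET_LAG = '1 hour'
  AS (SELECT alert_entry, TIMEOFINCIDENT FROM raw_db.public.ALERTS_DATA)"""
session.sql(stmt).collect()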

[Row(status='Cortex search service ALERT_SEARCH_SERVICE successfully created.')]

The TARGET_LAG specifies the maximum amount of time that the Cortex Search service content should lag behind updates to the base tables specified in the source query.

Snowflake performs transformations on your source data to get it ready for low-latency serving when the search service is created. Depending on the dataset and the warehouse size, it can take time for Cortex to complete creating the service. Once the service and index are created, access to schema and the database can be granted to required roles.

Query the Search Service

Query the service via either the Python API or the REST API. The following code uses the Python API to retrieve the logs most relevant to a query about user logins, filtered to the given timestamps.

from snowflake.core import Root

# Root is the entry point of the Snowflake Python API; it wraps the session.
root = Root(session)

alert_search_service = (
    root.databases["raw_db"]
    .schemas["PUBLIC"]
    .cortex_search_services["alert_search_service"]
)

# Keep only alerts whose timestamp equals one of the two values.
resp_userlogin = alert_search_service.search(
    query="User Login",
    columns=["alert_entry", "timeofincident"],
    filter={
        "@or": [
            {"@eq": {"TIMEOFINCIDENT": "2021-12-13T00:45:00"}},
            {"@eq": {"TIMEOFINCIDENT": "2021-12-13T00:50:00"}},
        ]
    },
    limit=100,
)
print(resp_userlogin.to_json())
{"results": [{"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user michael"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user michael"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user michael"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user michael"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user david"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user ryan"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user ryan"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user ryan"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user ryan"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": 
"User login from new location New York by user ryan"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user john"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user john"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user john"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user john"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user john"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user lily"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user lily"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user lily"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user lily"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user sebastian"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user sarah"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user william"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user william"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user daniel"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user daniel"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user daniel"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user daniel"}, {"timeofincident": 
"2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user daniel"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user daniel"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user daniel"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user zoe"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user zoe"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user zoe"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user hunter"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user mason"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New 
York by user mason"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user mason"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user mason"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user mason"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user mason"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user stella"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user stella"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user stella"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user stella"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user stella"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user stella"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user stella"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user eli"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user eli"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user eli"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user eli"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user jacob"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user jacob"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": 
"User login from new location New York by user jacob"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user jacob"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user jacob"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user leo"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user leo"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user leo"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user leo"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user leo"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user leo"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user leo"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user benjamin"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user benjamin"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user benjamin"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user benjamin"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user benjamin"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user benjamin"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user benjamin"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user benjamin"}, 
{"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user luke"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user luke"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user luke"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user luke"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user luke"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user luke"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user luke"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user adam"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user adam"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user adam"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user gabriel"}, {"timeofincident": "2021-12-13T00:45:00", "alert_entry": "User login from new location New York by user gabriel"}], "request_id": "c4aabc8b-3e2c-411a-bbe8-030746a9c44a"}

The sample query response is shown above. Cortex Search also exposes a REST API endpoint in the suite of Snowflake REST APIs, which is in Private Preview at the time of this writing.
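When the list of timestamps grows, writing the filter by hand becomes tedious. The small helper below is a sketch of how the same `@or` / `@eq` structure used in the query above could be generated from a list; the function name `timestamps_filter` is made up for this example and is not part of the Snowflake API.

```python
def timestamps_filter(column, timestamps):
    """Build an @or filter that matches any of the given values for a column."""
    return {"@or": [{"@eq": {column: ts}} for ts in timestamps]}


flt = timestamps_filter(
    "TIMEOFINCIDENT",
    ["2021-12-13T00:45:00", "2021-12-13T00:50:00"],
)
```

The resulting dictionary can be passed as the `filter` argument of the `search` call shown earlier.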

The total number of alerts and the number of similar alerts matching the given query can now be found by inspecting the query response and the alerts data directly.

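results = resp_userlogin.results

# Total number of alerts (assumes sp_df is a Snowpark DataFrame over the alerts table)
sp_df = session.table("raw_db.public.ALERTS_DATA")
total_alerts = sp_df.count()

# Number of similar alerts returned for the query (capped by limit=100 above)
similar_alerts = len(results)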

The percentage of similar alerts is then calculated, which completes the alert similarity journey. A security analyst can take immediate action based on the observations and their criticality.

# Calculate percentage
percentage_similar_alerts = (similar_alerts / total_alerts) * 100

print(f"Percentage of similar alerts from total alerts: {percentage_similar_alerts:.2f}%")

Upon assessing the similarity between all alerts within a specific time window, we found that 40–50% of alerts resembled other alerts. These similarities were not merely superficial, but pointed to shared characteristics or patterns that could be indicative of related security events or threats.
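One simple way to see which alerts in a response share a pattern is to mask the part that varies and count what remains. The sketch below assumes the alert text ends with "by user <name>", as in the sample response shown earlier; the function name `summarize` and the `<USER>` placeholder are made up for this example, and the three sample rows are a small subset of that response.

```python
import json
import re
from collections import Counter

# Assumption: the user name is the last token, after "by user".
USER_SUFFIX = re.compile(r"by user (\S+)$")


def summarize(response_json):
    """Count alerts per masked template and per user in a search response."""
    results = json.loads(response_json)["results"]
    templates = Counter()
    users = Counter()
    for row in results:
        entry = row["alert_entry"]
        match = USER_SUFFIX.search(entry)
        if match:
            users[match.group(1)] += 1
            # Replace the user name so alerts with the same shape group together.
            entry = USER_SUFFIX.sub("by user <USER>", entry)
        templates[entry] += 1
    return templates, users


sample = json.dumps({"results": [
    {"timeofincident": "2021-12-13T00:45:00",
     "alert_entry": "User login from new location New York by user michael"},
    {"timeofincident": "2021-12-13T00:45:00",
     "alert_entry": "User login from new location New York by user michael"},
    {"timeofincident": "2021-12-13T00:45:00",
     "alert_entry": "User login from new location New York by user david"},
]})
templates, users = summarize(sample)
```

With a real response, `resp_userlogin.to_json()` could be passed in place of `sample`.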

Savings and Implications

The significance of these findings goes beyond just understanding the nature of the alerts. They also have practical implications for our security analysts. With the knowledge of which alerts are similar, our analysts can leverage these insights to streamline and expedite their response efforts. This understanding can be particularly useful in scenarios where there are large volumes of alerts to be processed, and time efficiency becomes crucial.

Moreover, we were also able to use these metrics to estimate potential time savings for our analysts. This aspect is particularly important as it directly relates to the operational efficiency of our security measures. By identifying similar alerts, analysts can potentially reduce the time spent on each alert, overall enhancing the efficiency of our security operations.
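A back-of-the-envelope estimate of such time savings might look like the sketch below. Every input here is an illustrative placeholder rather than a measured value: the alert volume, the minutes per alert, and the fraction of effort avoided when an alert is handled as part of a group are all hypothetical, and only the 45% share of similar alerts is chosen to fall within the 40–50% range mentioned above.

```python
def estimated_minutes_saved(total_alerts, similar_fraction,
                            minutes_per_alert, batch_discount):
    """Estimate analyst minutes saved when similar alerts are triaged as a group.

    batch_discount is the fraction of per-alert time avoided for an alert
    handled as part of a group (0 means no saving, 1 means no extra time).
    """
    similar_alerts = total_alerts * similar_fraction
    return similar_alerts * minutes_per_alert * batch_discount


# Hypothetical inputs: 1,000 alerts, 45% similar, 5 minutes each, half the effort avoided.
saved = estimated_minutes_saved(1000, 0.45, 5, 0.5)
```

The model is deliberately simple; a real estimate would need measured triage times from the security team.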

Figure 4 : Similar alerts grouped with neighbours

Now that the discovery and identification are complete, the security analyst may expand on this further and connect it to a RAG-based application built using LLM functions from Snowflake Cortex, expediting the troubleshooting process by reviewing how similar alerts were resolved previously.

Conclusion

Security lies at the core of every organization’s operations. It is an integral component that ensures the protection of sensitive data and maintains the trustworthiness of the organization’s systems. Without robust security measures in place, an organization’s reputation, operational efficiency, and overall success could be at risk, whether you collect highly sensitive data as part of your everyday business or manage it on behalf of your customers.

In this blog, we have delved into the benefits of adopting alert similarity analysis, discussed a use case, and provided practical insights for implementing this approach effectively using Cortex Search, a state-of-the-art search engine.

*Snowflake Cortex Search is in Public Preview at the time of this writing

Keep innovating!
