Learning from Log4j 2 Vulnerability — Part 2

How to Detect & Investigate threats with BigQuery

Roy Arsan

Published in Google Cloud - Community, Feb 9, 2022

Part 2 of this blog series. See Part 1 here.

Alright…we need to talk about what the Log4j 2 vulnerability teaches us about your cloud-native threat defenses.

In a previous article, I discussed how to protect your applications in Google Cloud with Load Balancer capabilities (yellow checkmark below) and how to detect & alert on future Log4j exploit attempts with Cloud Logging (green checkmark below). In this article, we’ll export the logs to BigQuery and analyze them there for advanced analytics, longer retention and cheaper storage.

*GCP Defense-in-Depth against exploits like Log4j: Prevent, Protect & Detect*

Make sure that you have enabled logging at a 100% sampling rate on all your internet-facing HTTP(S) Load Balancers. By now, you have presumably patched your applications and enabled critical protections like Cloud Armor and Identity-Aware Proxy (if applicable), both highlighted in the above diagram.

Using Cloud Logging and Cloud Monitoring, we were able to quickly detect threats and set up continuous alerting for future attempts. But what about more advanced use cases:

What if you need to do more in-depth analytics like filtering or correlating? For example, you’d want to filter out the Log4j scans that come from good actors like Google Cloud’s own Web Security Scanner (part of Security Command Center, highlighted in the above diagram), which flags vulnerable resources and applications. You may also want to geolocate the remaining requests to find out where attacks are actually coming from, and more.
What if you need to do historical analysis beyond the last 30 days, which is the default log bucket retention? Analyzing logs over a longer window of time might uncover an attack chain that started a while ago. It could also be required for compliance purposes.

That’s why, in this article, we’ll complement the ad-hoc analysis we did in Logs Explorer with BigQuery for deeper analytics.

First, follow these instructions to set up a native log sink from Cloud Logging to BigQuery. Make sure your log filter doesn’t exclude HTTPS Load Balancer logs, which reside in the requests log, that is, logName="projects/[PROJECT_ID]/logs/requests".

Note: See Log Scoping Tool for help generating a comprehensive log filter that includes not only HTTPS LB logs but also other security-relevant logs you may need like DNS queries, VPC flow logs and Firewall Rules logs. In this article, we’re going to analyze HTTPS LB logs only.

Search all Log4j exploit attempts

The BigQuery log sink automatically routes logs (including HTTP(S) LB request logs) from Cloud Logging to BigQuery tables following this log-to-table name mapping convention. Since we configured the log sink to use partitioned tables, HTTP(S) LB logs land in one consolidated requests table. In other words, the destination table for all HTTP(S) LB logs is [LOG_SINK_BQ_PROJECT].[LOG_SINK_BQ_DATASET].requests, where LOG_SINK_BQ_PROJECT and LOG_SINK_BQ_DATASET are the Google Cloud project and BigQuery dataset you specified when creating the log sink.

Let’s head to the SQL Workspace in the BigQuery console and run the following query, after replacing LOG_SINK_BQ_PROJECT and LOG_SINK_BQ_DATASET with your respective values:
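A minimal sketch of such a query, rendered from Python so the detection pattern can also be tried locally. The column names (httpRequest.requestUrl, httpRequest.userAgent and so on) are assumptions based on the usual Cloud Logging export schema for HTTP(S) LB logs. The regular expression and the helper names build_query and looks_like_log4j_exploit are hypothetical and are not the original query from this post.

```python
import re

# Assumed detection pattern: "${jndi:" with optional URL-encoding, allowing
# filler characters between the letters (a common obfuscation trick).
LOG4J_PATTERN = r"(?i)(\$|%24)(\{|%7b).*j.*n.*d.*i.*(:|%3a)"

# Column names are assumptions; check them against your own `requests` table.
QUERY_TEMPLATE = r"""
SELECT
  timestamp,
  httpRequest.remoteIp AS remoteIp,
  httpRequest.requestUrl AS requestUrl,
  httpRequest.userAgent AS userAgent,
  httpRequest.status AS status
FROM `[LOG_SINK_BQ_PROJECT].[LOG_SINK_BQ_DATASET].requests`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND (
    REGEXP_CONTAINS(httpRequest.requestUrl, r'[PATTERN]')
    OR REGEXP_CONTAINS(httpRequest.userAgent, r'[PATTERN]')
  )
ORDER BY timestamp DESC
"""


def build_query(project: str, dataset: str) -> str:
    """Fill the placeholders and return BigQuery SQL text."""
    return (
        QUERY_TEMPLATE
        .replace("[LOG_SINK_BQ_PROJECT]", project)
        .replace("[LOG_SINK_BQ_DATASET]", dataset)
        .replace("[PATTERN]", LOG4J_PATTERN)
    )


def looks_like_log4j_exploit(value: str) -> bool:
    """Apply the same pattern locally, e.g. to a sample URL or user agent."""
    return re.search(LOG4J_PATTERN, value) is not None


if __name__ == "__main__":
    print(build_query("my-project", "my_dataset"))
```

Printing the result gives SQL text you can paste into the console. Filtering on timestamp keeps the scan limited to the most recent partitions, assuming the table is partitioned on that column.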

This returns all Log4j exploit attempts in the last 30 days. You should get the same log entries as the query you ran in Logs Explorer (in Part 1), assuming you’re running over the same time window. In our case, we retrieve the same 33 log entries out of ~61.8k requests in about 0.5 sec.

Filter out good actors

As mentioned, these Log4j scans could come from either malicious or good actors. For example, you may already be using Web Security Scanner by Google Cloud. In that case, you want to filter out the corresponding documented static IP address ranges. To do that, we can create a BigQuery User-Defined Function (UDF) in JavaScript to do some bitwise math and determine if a given IPv4 address is within a given CIDR range.

Run the following query to create a new function called Ip4AddressInRanges(). It iterates through each CIDR range and returns TRUE if the IPv4 input string belongs to any of the passed CIDR ranges:
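The bitwise check can be prototyped outside BigQuery before committing it to a UDF. Below is a minimal Python sketch of the same logic, not the JavaScript UDF itself. The names ip4_to_int and ip4_address_in_ranges are hypothetical, and the addresses in the usage lines come from the reserved documentation ranges.

```python
from typing import Iterable


def ip4_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range in {ip!r}")
        value = (value << 8) | octet
    return value


def ip4_address_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    ip_int = ip4_to_int(ip)
    for cidr in cidr_ranges:
        network, prefix = cidr.split("/")
        prefix_len = int(prefix)
        if not 0 <= prefix_len <= 32:
            raise ValueError(f"bad prefix length in {cidr!r}")
        # Keep the top prefix_len bits, clear the host bits.
        mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
        if (ip_int & mask) == (ip4_to_int(network) & mask):
            return True
    return False


if __name__ == "__main__":
    print(ip4_address_in_ranges("192.0.2.17", ["198.51.100.0/24", "192.0.2.0/27"]))
    print(ip4_address_in_ranges("192.0.2.40", ["192.0.2.0/27"]))
```

One caveat when porting this to JavaScript: its bitwise operators work on 32-bit signed integers and shift counts wrap at 32, so a JavaScript version typically needs an unsigned conversion (`>>> 0`) and special handling for a /0 prefix.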

You now have a BigQuery function available in your project. It takes an input IPv4 string and an array of one or more CIDR range strings, and returns TRUE or FALSE depending on whether that IPv4 address is in any of the ranges.

Creating new BigQuery User-Defined Function (UDF) named Ip4AddressInRanges()

Now, let’s go back to the original query and add the following condition to its WHERE clause to filter out Web Security Scanner IP ranges, and/or any other legitimate IP ranges you may have:
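As a rough illustration, the condition can be generated from a list of ranges. This Python sketch assumes the UDF was created as a persistent function under a path like my-project.my_dataset.Ip4AddressInRanges and that the remote IP lives in httpRequest.remoteIp. Both are assumptions, as is the helper name exclusion_clause. The ranges shown are reserved documentation ranges used as placeholders, so substitute the ranges documented for Web Security Scanner.

```python
import ipaddress
from typing import Iterable


def exclusion_clause(udf_path: str, cidr_ranges: Iterable[str]) -> str:
    """Render an 'AND NOT udf(ip, [ranges])' condition for the WHERE clause."""
    # Validate and normalize each range so only well-formed CIDRs reach the SQL.
    networks = [str(ipaddress.IPv4Network(cidr)) for cidr in cidr_ranges]
    array_literal = ", ".join(f"'{network}'" for network in networks)
    return f"AND NOT `{udf_path}`(httpRequest.remoteIp, [{array_literal}])"


if __name__ == "__main__":
    # Placeholder ranges (RFC 5737 documentation blocks), not real scanner ranges.
    print(exclusion_clause(
        "my-project.my_dataset.Ip4AddressInRanges",
        ["192.0.2.0/24", "198.51.100.0/24"],
    ))
```

Validating with ipaddress.IPv4Network rejects malformed input (it raises ValueError) before anything is spliced into the query text.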

Group by offending IPs

To identify potential attackers, let’s start by grouping these requests by remote IP address, ordering by the most active ones:
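A minimal sketch of that grouping, again rendered from Python. It wraps any query that yields remoteIp and status columns in a common table expression. The CTE name attempts, the output column names, and the helper top_offenders_query are hypothetical choices for illustration and are not the original query from this post.

```python
def top_offenders_query(attempts_query: str, limit: int = 20) -> str:
    """Wrap a query returning remoteIp and status columns in a GROUP BY."""
    return f"""
WITH attempts AS (
{attempts_query.strip()}
)
SELECT
  remoteIp,
  COUNT(*) AS attempt_count,
  ARRAY_AGG(DISTINCT status IGNORE NULLS) AS statuses
FROM attempts
GROUP BY remoteIp
ORDER BY attempt_count DESC
LIMIT {int(limit)}
"""


if __name__ == "__main__":
    # Stand-in for the exploit-attempt search query from the previous section.
    sample = "SELECT '192.0.2.1' AS remoteIp, 403 AS status"
    print(top_offenders_query(sample, limit=5))
```

Collecting the distinct response statuses per IP makes it easy to see at a glance whether a given source was blocked or served.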

Query results should look like the following (IP addresses partially redacted):

Top offending IPs attempting Log4j exploits

Geolocate attacks

Now let’s geolocate where these attacks are actually coming from. Type in the following query to correlate the resulting list of offending IPs with location data found in GeoLite2 geolocation tables (details further described in this Google blog). Make sure to paste the query from the prior section as a subquery so that the log4j_exploit_attempts temporary table is populated with the top offending IP results.
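To make the correlation step concrete, here is a small local Python sketch of one common way to match an IP against network ranges: truncate the address to each prefix length and look up the (network, prefix length) pair. It assumes a lookup table keyed that way. The table contents, place names, and the helpers geolocate and enrich are invented for illustration and do not reflect the real GeoLite2 data or the original query.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, str]  # (country, city)

# Toy stand-in for a geolocation table: (network as int, prefix length) -> location.
GEO_TABLE: Dict[Tuple[int, int], Location] = {
    (int(ipaddress.IPv4Address("192.0.2.0")), 24): ("Exampleland", "Sample City"),
    (int(ipaddress.IPv4Address("198.51.100.0")), 25): ("Testonia", "Demo Town"),
}


def geolocate(ip: str, table: Dict[Tuple[int, int], Location] = GEO_TABLE) -> Optional[Location]:
    """Return the location of the most specific matching network, if any."""
    ip_int = int(ipaddress.IPv4Address(ip))
    for prefix_len in range(32, 0, -1):  # most specific prefix first
        mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
        hit = table.get((ip_int & mask, prefix_len))
        if hit is not None:
            return hit
    return None


def enrich(rows: List[Tuple[str, int]]) -> List[Tuple[str, int, Optional[str], Optional[str]]]:
    """Append country and city to (ip, attempt_count) rows, keeping unmatched IPs."""
    return [(ip, count, *(geolocate(ip) or (None, None))) for ip, count in rows]


if __name__ == "__main__":
    for row in enrich([("192.0.2.99", 12), ("203.0.113.7", 3)]):
        print(row)
```

Keeping unmatched rows, as a LEFT JOIN would, matters here: an IP that is missing from the geolocation data should still appear in the results with empty location fields.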

Query results should look as follows, where country and city names are appended for each row with a geolocated IP address:

Top offending IPs attempting Log4j exploits enriched with geolocation data

Geolocation information was found for every IP address except the first one. GeoLite2 databases are free but less accurate than MaxMind’s GeoIP2 databases, and the post-processed GeoLite2 BigQuery table used here is a historical snapshot, so occasional gaps like this are to be expected.

That said, as it turns out, the first IP address “[redacted].149” is located in Russia. Therefore, the majority of these exploit attempts are coming from either Russia or China, with some originating from the United States and Sweden. Fortunately, as you can see from the response status details, the vast majority of these are successfully blocked by our cloud defenses. Since we’ve configured our Cloud Armor security policy and our Identity-Aware Proxy access control, none of those attempts have reached our backend application.

Note: This BigQuery geocoding example is shown for illustrative purposes. You may want to use a more recent GeoLite2 or GeoIP2 post-processed BigQuery table, or other geocoding methods available in your own downstream visualization or BI tools like Looker or Tableau.

What’s Next?

You should now be more familiar with preventing, detecting and alerting on potential Log4j and other web vulnerability exploits using Google Cloud’s HTTPS Load Balancer logs. With the power of BigQuery, you’re able to quickly analyze and investigate these threats.

If you’re interested in this and other security or compliance questions, check out the community-supported Threat Detection As Code repo for Google Cloud, with more out-of-the-box threat detection queries you can use and extend. These foundational sample queries cover not only network activity (as shown in this blog) but also other categories like login patterns, key management, provisioning activity, and data or workload usage.

Happy hunting!