Using Google Cloud Platform to Analyze Cloudflare Logs
We’re excited to announce that we now offer deep insights into your domain’s web traffic, working with Google Cloud Platform (GCP). While Cloudflare Enterprise customers always have had access to their logs, they previously had to rely on their own tools to process them, adding extra complexity and cost.
Cloudflare logs provide real time insight into traffic, malicious activity, attack incidents, and infrastructure health checks. The output is used to help customers adjust their settings, manage costs and resources, and plan for expansion.
Working with Google, we created an end-to-end solution that allows customers to retrieve Cloudflare access logs, store and process data in a simple way. GCP components such as Google Storage, Cloud Function, BigQuery and Data Studio come together to make this possible.
One of the biggest challenges of data analysis is to store and process large volume of data within a short time period while avoiding high costs. GCP Storage and BigQuery easily address these challenges.
Cloudflare customers can decide if they wish to obtain and process data from Cloudflare access logs on demand or on a regular basis. The full solution is described in this Knowledge Base article. Initial setup takes no more than 30 minutes to an hour. Moreover, customers can still replace any part of the process with their own tool or solution.
Below is a simple visualization of the data flow:
The key elements are:
Cloudflare logs are obtained via a REST API. Usually this service can be run on your local workstation or Virtual Machine. The illustrated solution uses GCP Compute micro-instance.
Log storage and management
For storing and managing log files we used GCP Storage bucket. All logs are stored in JSON format. Google Cloud Storage allows you to adjust the storage capacity when needed and set the retention policy.
Analyzing large data sets can be challenging. Google BigQuery makes it straightforward. When there is a new log file uploaded to the GCP Storage bucket, GCP Cloud Function triggers the process to import data from the new log file into BigQuery. BigQuery allows you to access your data almost immediately by running a simple query. As illustrated below you can, for example, pull top requested URIs with status code 404.
Based on feedback from our customers about which data they are interested in, we used GCP Data Studio to create visual reports. The following reports can be created in Data Studio using BigQuery as an input: top client IP address requests, requests by URL, error types, cached or uncached URLs, top triggered WAF rules, traffic types by device or location and many more.
Data Studio “Edit” mode
Data Studio “View” mode
Google Cloud is offering a $500 credit towards a new Google Cloud account to help you get started. In order to receive a credit, please follow these instructions.
Costs depend on several factors including the number of requests, storage, retention policy and number of queries in BigQuery, among others. For more pricing details, please use the GCP Pricing Calculator.
Please reach out to your Cloudflare Enterprise Solution Engineer or Customer Success Manager for more information.
Originally published at blog.cloudflare.com on October 26, 2017.