Google Cloud Platform Security Operations Center Data Lake

Lions and tigers and bears, oh my

Google Cloud Platform brings a vast array of security monitoring facilities to its various platforms, and provides tools including Stackdriver and Cloud Security Command Center to monitor them. In addition, there’s a rich partner ecosystem, and many customers have already implemented their security monitoring policies in Splunk.

This solutions article provides a great overview of how to build a data lake on Google Cloud Platform. The following diagram shows the relationship between Google Cloud Platform security sources and sinks, to assist you in mapping the data lake guidance to a Security Operations Center (SOC) scenario.

Security Operations Center data lake architecture; related sources and sinks are the same color.

Data lake components

Google Stackdriver

Stackdriver aggregates metrics, logs, and events from infrastructure, giving developers and operators a rich set of observable signals that speed root-cause analysis and reduce mean time to resolution (MTTR).

It provides native integration with cloud data tools like BigQuery, Cloud Pub/Sub, Cloud Storage, and Cloud Datalab, as well as out-of-the-box integration with tools like Splunk Enterprise.

You can filter which logs to exclude by organization, folder, project, and billing account.

You can enable Data Access logs at the organization, folder, or project level (other logs are enabled by default).

  • You can specify the services whose audit logs you want to receive. For example, you might want audit logs from Compute Engine but not from Cloud SQL.
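As a rough illustration of the per-service selection described above, the sketch below builds the `auditConfigs` fragment of an IAM policy that turns on Data Access logs for Compute Engine only. The helper name `build_audit_configs` is made up for this example, and the fragment would still need to be merged into the policy of your organization, folder, or project; treat it as a starting point rather than a complete procedure.

```python
import json

# Log types that an IAM policy's auditConfigs can enable for Data Access logs.
DATA_ACCESS_LOG_TYPES = ("ADMIN_READ", "DATA_READ", "DATA_WRITE")


def build_audit_configs(services):
    """Hypothetical helper: one auditConfigs entry per service, all log types on."""
    return [
        {
            "service": service,
            "auditLogConfigs": [{"logType": t} for t in DATA_ACCESS_LOG_TYPES],
        }
        for service in services
    ]


# Compute Engine only; Cloud SQL is deliberately left out of the list.
audit_configs = build_audit_configs(["compute.googleapis.com"])
policy_fragment = {"auditConfigs": audit_configs}
print(json.dumps(policy_fragment, indent=2))
```

Listing services explicitly keeps log volume (and cost) under control, since Data Access logs can be large.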

Google Cloud Security Command Center

Cloud Security Command Center gives enterprises consolidated visibility into their cloud assets across App Engine, Compute Engine, Kubernetes Engine, Cloud Storage, Datastore, Spanner, Cloud DNS, service accounts, and Google Container Registry.

Cloud Security Command Center integrates with Google Cloud Platform security tools like Cloud Security Scanner and the Cloud Data Loss Prevention (DLP) API.

It also integrates with third-party security solutions such as Aqua, Cavirin, Cloudflare, CrowdStrike, Dome9, Palo Alto Networks RedLock, Qualys, and Twistlock, and provides an API and schema to integrate additional third-party tools.

Google Cloud Dataflow

Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness, so the same pipeline logic can serve both live and historical security data.

Use Cloud Dataflow as a convenient integration point to bring predictive analytics to security event management by adding TensorFlow-based Cloud Machine Learning models and APIs to your data processing pipelines.

Google BigQuery

BigQuery allows organizations to capture and analyze security data in real time using its streaming ingestion capability, so that your insights are always current. It gives you a full view of all your data by seamlessly querying data stored in BigQuery’s managed columnar storage, Cloud Storage, Cloud Bigtable, Sheets, and Drive.

It enables you to analyze all your security operations data, build and operationalize machine learning solutions with simple SQL, and easily and securely share insights within your organization and beyond as datasets, queries, spreadsheets, and reports. It…

  • Integrates with existing ETL tools like Informatica and Talend to enrich the data you already use.
  • Supports popular BI tools like Tableau, MicroStrategy, Looker, and Data Studio out of the box, so anyone can easily create reports and dashboards.

BigQuery ML (beta) enables users to create and execute machine learning models using standard SQL queries; it also increases development speed by eliminating the need to move data. It supports the following types of models:

  • Linear regression: These models can be used for predicting a numerical value.
  • Binary logistic regression: These models can be used for predicting one of two classes (such as identifying whether an event represents a security threat).
  • Multiclass logistic regression for classification: These models can be used to predict more than two classes, such as whether an input represents a low, medium, or high impact threat.
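To make the binary logistic regression case concrete, here is a minimal sketch that assembles a BigQuery ML `CREATE MODEL` statement for a threat classifier. The dataset, table, and column names (`soc.threat_classifier`, `soc.labeled_events`, `is_threat`, and the feature columns) are invented for illustration, and the helper `create_model_sql` is not part of any Google library; you would run the resulting SQL in the BigQuery console or a client of your choice.

```python
def create_model_sql(model, source_table, label, features):
    """Hypothetical helper: build a BigQuery ML CREATE MODEL statement.

    model_type 'logistic_reg' trains a logistic regression classifier;
    input_label_cols names the column holding the class to predict.
    """
    columns = ",\n  ".join(list(features) + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT\n  {columns}\n"
        f"FROM `{source_table}`"
    )


# All names below are placeholders for your own dataset and schema.
sql = create_model_sql(
    model="soc.threat_classifier",
    source_table="soc.labeled_events",
    label="is_threat",
    features=["source_ip_reputation", "failed_logins", "bytes_out"],
)
print(sql)
```

Swapping `'logistic_reg'` for `'linear_reg'` would give the linear regression variant for predicting a numerical value instead of a class.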

Google Cloud Storage

Google Cloud Storage allows worldwide storage and retrieval of any amount of data at any time.

Supported data sources include Cloud Pub/Sub, Stackdriver Logging, Dataflow, and BigQuery; BigQuery can also import from Google Cloud Storage.

Object Lifecycle Management provides the ability to set the object storage class (e.g., Nearline, Coldline) to a lower-cost class for less frequently accessed objects, as well as delete objects, based on:

  • Object age
  • Date
  • Number of versions
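The conditions above map onto a lifecycle configuration document. The sketch below generates one in the JSON shape accepted by `gsutil lifecycle set`, using the `age` and `numNewerVersions` conditions (a date-based rule would use `createdBefore` instead). The helper name and the retention periods are assumptions chosen for illustration, not recommendations.

```python
import json


def lifecycle_config(nearline_days, coldline_days, delete_days, keep_versions):
    """Hypothetical helper: build a Cloud Storage lifecycle configuration."""
    return {
        "rule": [
            # Move objects to cheaper storage classes as they age.
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
             "condition": {"age": nearline_days}},
            {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
             "condition": {"age": coldline_days}},
            # Delete objects once they pass the retention period.
            {"action": {"type": "Delete"},
             "condition": {"age": delete_days}},
            # Delete archived versions once enough newer versions exist.
            {"action": {"type": "Delete"},
             "condition": {"isLive": False, "numNewerVersions": keep_versions}},
        ]
    }


# Example retention values; adjust to your own log retention policy.
config = lifecycle_config(nearline_days=30, coldline_days=90,
                          delete_days=365, keep_versions=3)
print(json.dumps(config, indent=2))
```

Saving the output to a file and applying it to a bucket is left out here, since the bucket name and retention policy depend on your environment.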


As a side note, organizations that aren’t cloud native have some of their networking infrastructure on premises, and may want to take a look at Alphabet spin-off Chronicle, which is building a cybersecurity intelligence platform that can help organizations better manage and understand their own data. Chronicle is aiming to unlock valuable hidden insights by making it faster and easier to analyze data, and to look for patterns across sources and over time.


