Simplifying Log Management

How to Export and Analyze Logs in Looker Studio for Better Insights

Leo Rocca
Globant
7 min read · Dec 20, 2023



This article explores how to route logs to BigQuery and analyze them in Looker Studio. With this approach, you can simplify log management and gain better insights.

BigQuery is Google Cloud Platform’s (GCP) data warehouse solution. BigQuery’s serverless architecture lets you use SQL queries with zero infrastructure management. BigQuery’s scalable, distributed analysis engine enables you to query terabytes in seconds and petabytes in minutes.

Looker Studio, formerly Google Data Studio, is an online tool for visualizing and analyzing data from different sources. It can connect to several databases, such as BigQuery or Cloud SQL.
With Looker Studio, we can create fully customized, easily shared dashboards.

Monitoring vs. Observability

There is a lot of talk about monitoring and observability, and the two concepts are sometimes used as if they were the same thing. But what is the difference between them? DevOps Research and Assessment (DORA) defines them as follows:

  • Monitoring is tooling or a technical solution that allows teams to watch and understand the state of their systems. Monitoring is based on gathering predefined sets of metrics or logs.
  • Observability is tooling or a technical solution that allows teams to actively debug their system. Observability is based on exploring properties and patterns not defined in advance.

Solution Architecture

We will focus on log monitoring and how to get better insights. To achieve this, the architecture has two host projects, one per environment, and three service projects. Each service project has a Compute Engine instance deployed with the Ops Agent installed. The Ops Agent collects telemetry data (logs, metrics, and traces) from Compute Engine instances. Its logging features include standard system log collection and logs received over the TCP and Forward protocols. Its monitoring features include system metrics such as CPU, disk, memory, and network, as well as metrics from third-party applications, Prometheus, the OpenTelemetry Protocol, and NVIDIA GPUs.

We can install the Ops Agent manually, one VM at a time, automatically during VM creation, or with automation tools such as Terraform, Ansible, Chef, or Puppet.
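For reference, the following is a minimal sketch of a manual installation on a single Debian or Ubuntu VM, run over SSH on the instance, using the installation script from the Ops Agent documentation:

curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh # Download the official installation script
sudo bash add-google-cloud-ops-agent-repo.sh --also-install                    # Add the agent repository and install the agent
sudo systemctl status "google-cloud-ops-agent*"                                # Verify that the agent services are running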

The non-production logging project gathers the logs sent by the non-production service projects, while the production logging project prj-log-prod gathers only the production logs. This gives us a clean separation of duties, so that only the right team members can access the collected data. The solution follows the resource hierarchy described in Figure 1, which has one folder per environment and another folder for all the logging and monitoring resources.

Note: To keep the exercise simple, we will create one folder for networking resources and another one for logging and monitoring resources.

Figure 1 — Resource hierarchy

Log Routing

In GCP, logs can be routed from billing accounts, organizations, folders, and projects to different destinations. By default, each project has two sinks:

The _Required log bucket stores:

  • Admin activity audit logs.
  • System event audit logs.
  • Access transparency logs.

The _Default log bucket stores:

  • Data access audit logs.
  • Denied policies audit logs.

To learn more about GCP logs, you can check out the logging documentation.
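We can also confirm these default sinks and buckets from the command line; this is a quick sketch, with the project ID as a placeholder:

gcloud logging sinks list --project=[PROJECT_ID]                      # Lists the _Required and _Default sinks plus any custom sinks
gcloud logging buckets list --location=global --project=[PROJECT_ID]  # Lists the _Required and _Default log buckets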

You can also create a custom log sink to send logs to different destinations (see Figure 2). A log sink defines which logs are routed by means of inclusion and exclusion filters. According to our architecture, we will create three sinks at the folder level. Each sink will route logs to one of the logging projects, prj-log-nprod or prj-log-prod. This solution routes all the logs and stores them in two separate BigQuery datasets.

Figure 2 — How Cloud Logging routes and stores log entries

Cloud Logging routes logs to the default buckets. Creating a custom log sink allows us to choose other destinations, such as Cloud Storage, BigQuery, or Pub/Sub. We will use BigQuery as a destination to store logs. Once the logs have been stored in BigQuery, we can connect Looker Studio to analyze the data.

Network Architecture

As shown in Figure 3, the network architecture has three Shared VPCs, one per environment. Each Shared VPC has a host project and one or more attached service projects. This gives us a good separation of duties: the network resources (VPCs and subnets) are deployed in the host projects, while the workloads (Compute Engine VM instances) are deployed in the service projects.
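As a rough sketch, the Shared VPC relationship can be set up with gcloud as follows; the project IDs prj-host-dev and prj-svc-dev are hypothetical names for a development host project and service project:

gcloud compute shared-vpc enable prj-host-dev        # Turn the host project into a Shared VPC host
gcloud compute shared-vpc associated-projects add prj-svc-dev \
    --host-project=prj-host-dev                      # Attach a service project to the host project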

As mentioned in the previous section, the log sinks route the logs to the logging projects. The monitoring projects have the service projects attached as monitored projects, so metrics can be collected from them.

Figure 3 — Network architecture

Let’s Get Started

We need to create two BigQuery datasets to store logs from the non-production and production environments. Create the datasets for each project:

Non-production dataset:

gcloud config set project prj-log-nprod # Switch to the non-production logging project
bq mk --location=US -d ds_nonprod_logs  # Create the non-production dataset

Production dataset:

gcloud config set project prj-log-prod # Switch to the production logging project
bq mk --location=US -d ds_prod_logs    # Create the production dataset

Note: We will keep the default values to keep the exercise simple.

Now, we will create the log sinks. We can start from Cloud Logging -> Logs Explorer (see Figure 4). The Logs Explorer helps us write a log filter and preview the results before creating the sink:

Figure 4 — Cloud Logging — Logs Explorer
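If we prefer the command line, the same filter can be previewed with gcloud logging read before any sink exists; this is a sketch, with the folder ID as a placeholder and the filter as an arbitrary example:

gcloud logging read 'resource.type="gce_instance" AND severity>=WARNING' \
    --folder=[FOLDER_ID] --limit=10    # Preview the entries a filter would match before using it in a sink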

From Cloud Logging -> Logs Router, we can then create the sink itself, setting up the sink name, destination, and the inclusion and exclusion filters.

Step-by-Step Log Sink Setup

To simplify the exercise, we will create a sink only for the development environment. First, go to Cloud Logging -> Logs Router and select the DEV folder in the project selector. Click the “Create Sink” button and follow the step-by-step wizard:

Figure 5 — Sink configuration summary

After setting up the sink, we will see the configuration summary shown in Figure 5. When we choose BigQuery as the destination, we select the destination dataset. Because we are creating the sink at the folder level, we have to select Other Resource and enter the dataset link, which has the following pattern:

bigquery.googleapis.com/projects/[PROJECT_ID]/datasets/[DATASET_ID]

Now, we have to decide whether to route only the logs generated by the folder itself or also those generated by its child resources. To route the latter, select Include child resources. Repeat these steps in the QA and PROD folders; the production environment logs must be routed to the production logging project prj-log-prod.
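The same folder-level sink can also be created with gcloud instead of the console. The following sketch reuses the names from this article; the folder ID is a placeholder and the sink name is just an example:

gcloud logging sinks create sk-dev-logs \
    bigquery.googleapis.com/projects/prj-log-nprod/datasets/ds_nonprod_logs \
    --folder=[DEV_FOLDER_ID] --include-children \
    --log-filter='severity>=DEFAULT'   # Route all entries; narrow the filter as needed
# The command prints the sink's writer identity; grant it the BigQuery Data Editor role
# on the destination dataset so the sink can write to it.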

Once log routing starts, we will see a new table (see Figure 6) in the dataset created at the beginning. Log ingestion can take a few minutes to start, so if you don't see any data, wait a few minutes and refresh the view:

Figure 6 — Logs ingested in BigQuery
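We can also verify from the command line that the export tables exist; note that the table names depend on the log names being routed:

bq ls prj-log-nprod:ds_nonprod_logs    # List the tables created by the log sink in the destination dataset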

Exploring The Data

Now, we can explore and analyze the stored data. To do so, we will use Looker Studio without creating new tables or views in BigQuery: all raw data stays in BigQuery, and we explore it in Looker Studio, which can connect to many different data sources.

To open Looker Studio, click Explore Data and select Explore with Looker Studio. We will see a default dashboard with some data taken from BigQuery. On the right panel, we can see all the columns of the BigQuery table used as the data source. To add or update data sources, go to the Resource -> Manage added data sources menu (see Figure 7):

Figure 7 — Manage data sources

As shown in Figure 8, we can create a custom query by selecting the source project, dataset, and table:

Figure 8 — Looker Studio exploring data
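Before building charts, it can help to sanity-check the exported data with a query. The same SQL can be pasted into the Looker Studio custom query editor; this is only a sketch, because the exported table names depend on the routed log names (syslog_* is an assumption here):

bq query --use_legacy_sql=false '
SELECT
  TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,      -- bucket log entries per hour
  severity,
  COUNT(*) AS entries
FROM `prj-log-nprod.ds_nonprod_logs.syslog_*`    -- table name depends on the routed log name
GROUP BY hour, severity
ORDER BY hour DESC
LIMIT 100'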

At this point, we can add different types of charts, write new queries, or even filter the data by date; the dashboard updates automatically as we do (see Figure 9):

Figure 9 — Non-production environment example dashboard.

Summary

The solution presented in this article gives us better log management by filtering logs not only by their content but also by their environment, which in turn gives us a better separation of duties.

By combining BigQuery and Looker Studio, we gain better analysis capabilities across our environments and projects.

Looker Studio also lets us share dashboards with users who do not necessarily have a GCP account. This is a differentiator from Cloud Monitoring dashboards, which can also be shared, but only with users who already have access to GCP and the monitoring projects.

