Looker — self-hosted installation on GCP

Domenica Siviglia
Google Cloud - Community
5 min read · Feb 21, 2022

Looker is Google Cloud’s cloud-native Enterprise BI Platform enabling access to near real-time data when and where you need it.

You can choose between a Looker-hosted solution, where Looker manages all the components, and a customer-hosted solution, where you own your environment. The choice depends on your requirements. For example, your security requirements may not allow you to use the Looker-hosted option. With the Looker-hosted solution you do not have to worry about installation, upgrades, or scaling, and issues are resolved faster. With a customer-hosted solution, you have complete control over your infrastructure.

In this article I will describe the steps necessary to install Looker on GCP.

It is possible to install Looker on a single VM, but this is acceptable only for small workloads. A better solution for a production environment is a cluster of VMs, as shown in the picture below.

Cluster of VMs

Running Looker as a cluster of instances across multiple VMs is a flexible pattern that benefits from service failover and redundancy. Horizontal scalability affords increased throughput without running into excessive garbage collection costs.

Cluster Considerations

  • OS and Distribution

Looker runs on the most common Linux distributions: RedHat, SUSE, and Debian/Ubuntu. The Linux distributions available on GCP are compatible with Looker. Debian/Ubuntu is the most heavily used Linux variety. Looker runs in the Java virtual machine (JVM), so when choosing a distribution, check that its OpenJDK 8 packages are up to date.

  • CPU and Memory

For production, use 16x64 nodes (16 CPUs and 64 GB of RAM), a good balance between price and performance. Configuring more than 64 GB of RAM can hurt performance, because garbage collection events are single-threaded and halt all other threads. For configurations with up to 50 users, Looker recommends running a single server.

  • Disk storage

100 GB of disk space is typically sufficient for a production system.

  • Capacity

When more capacity is required you should add 16x64 nodes to the cluster rather than increase the size of the nodes beyond 16x64.

  • File System

Looker nodes need to share certain parts of the file system (LookML models, developers' copies of the models, and Git server connectivity settings). The file system must be POSIX compliant.

  • Database

Looker’s metadata needs to be centralized, so its internal database must be migrated to Cloud SQL (MySQL).

Looker needs a Git service to provide version management of the LookML files. GitHub, GitLab, BitBucket and others are supported.
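The capacity guidance above (add 16x64 nodes rather than enlarging a node) comes down to simple arithmetic. The sketch below is only illustrative: the function name and the idea of sizing from a total CPU and RAM demand are assumptions made for this example, not Looker guidance. Only the 16x64 node shape comes from the text.

```python
import math

# Node shape recommended in this article: 16 CPUs and 64 GB of RAM.
NODE_CPUS = 16
NODE_RAM_GB = 64


def nodes_needed(required_cpus: int, required_ram_gb: int) -> int:
    """Return how many 16x64 nodes cover the demand (hypothetical helper).

    The node size stays fixed; only the node count grows.
    """
    if required_cpus <= 0 or required_ram_gb <= 0:
        raise ValueError("demand must be positive")
    by_cpu = math.ceil(required_cpus / NODE_CPUS)
    by_ram = math.ceil(required_ram_gb / NODE_RAM_GB)
    # Whichever resource runs out first decides the node count.
    return max(by_cpu, by_ram)
```

For example, a demand of 40 CPUs and 100 GB of RAM would map to three nodes, because the CPU requirement is the tighter constraint.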

Scalability

You can specify the number of nodes in the cluster in the instance group configuration. For the moment, there is no clean way for Looker to terminate a node gracefully.

You can autoscale up, but scaling down should be done manually and very carefully.

- Remove the node from the Load Balancer directory

- Wait until users have started new sessions on other nodes (typically 15 to 20 minutes)

- Take the node offline
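The order of these three steps matters, so it can help to encode it in a small script. This is a minimal sketch, not a Looker or GCP API: `remove_from_lb` and `take_offline` are hypothetical callables that you would implement with your own tooling, and the 20 minute default is taken from the upper end of the waiting time mentioned above.

```python
import time
from typing import Callable


def drain_and_remove(
    node: str,
    remove_from_lb: Callable[[str], None],   # hypothetical: detach node from the load balancer
    take_offline: Callable[[str], None],     # hypothetical: stop the VM
    drain_seconds: float = 20 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Remove one node by following the three manual steps in order."""
    remove_from_lb(node)    # 1. stop routing new sessions to the node
    sleep(drain_seconds)    # 2. give users time to start new sessions elsewhere
    take_offline(node)      # 3. only now take the node offline
```

Passing `sleep` as a parameter keeps the function easy to dry-run: a test can substitute a function that records the call instead of waiting.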

Upgrading

  • Create a new Looker image
  • Always keep in mind the most important rule of upgrading VMs: you can never have two versions of VM-based Looker connected to the same database. This will corrupt the database and render it unusable.
  • The quickest way to safely proceed is to “scale” our instance group to 0 nodes. Make the change in the Edit modal for your instance group.
  • Back up your database
  • Recreate the instance group with the new image
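The upgrade steps above can be sketched as a guarded sequence. All four callables below are hypothetical placeholders for your own automation (they are not part of any Looker or GCP library); the point of the sketch is the guard that refuses to continue while any old node is still running, so that two Looker versions never reach the same database.

```python
from typing import Callable


def upgrade_cluster(
    running_nodes: Callable[[], int],            # hypothetical: count of live nodes
    scale_to: Callable[[int], None],             # hypothetical: resize the instance group
    backup_database: Callable[[], None],         # hypothetical: back up the Looker database
    recreate_group: Callable[[str, int], None],  # hypothetical: rebuild the group from an image
    new_image: str,
    target_size: int,
) -> None:
    """Run the upgrade in the order described above."""
    scale_to(0)
    if running_nodes() != 0:
        # Old and new versions must never be connected at the same time.
        raise RuntimeError("old nodes still running; refusing to continue")
    backup_database()
    recreate_group(new_image, target_size)
```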

Network

Looker listens for HTTPS requests on port 9999. Looker uses a self-signed certificate with a common name of self-signed.looker.com.

The Looker API listens on port 19999.

Internal database connection

Private service access must be enabled in order to connect to Cloud SQL from a Compute Engine instance using private IP. Your VM instance must be in the same region as your Cloud SQL instance.

External services

Looker’s telemetry and license servers are available on the public internet via HTTPS. Traffic from a Looker node to ping.looker.com:443 and license.looker.com:443 must be allowed.

SMTP services

By default, Looker sends outgoing mail via SendGrid. That may require adding smtp.sendgrid.net:587 to an allowlist.

The cluster nodes will communicate with each other through a message broker service, which uses ports 1551 and 61616. Ports 1551 and 61616 must be opened between cluster nodes.
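A quick way to verify the firewall rules described in this section is to attempt a TCP connection to each endpoint from a Looker node. The following is a minimal sketch using only the Python standard library; the host names and ports are the ones listed in this article, while the helper names `can_connect` and `report` are made up for this example. A successful TCP connection only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket

# Endpoints named in this article; adjust them to your own allowlist.
EXTERNAL_ENDPOINTS = [
    ("ping.looker.com", 443),
    ("license.looker.com", 443),
    ("smtp.sendgrid.net", 587),
]
CLUSTER_PORTS = [1551, 61616]  # node-to-node communication
LOOKER_PORTS = [9999, 19999]   # web UI and API


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def report(endpoints) -> dict:
    """Map "host:port" to reachability for each (host, port) pair."""
    return {f"{host}:{port}": can_connect(host, port) for host, port in endpoints}
```

On a node you could call `report(EXTERNAL_ENDPOINTS)` for the outbound rules, and `report([(peer, p) for p in CLUSTER_PORTS])` for the cluster ports, where `peer` is the address of another node in your cluster.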

Database

The recommendation is to use a remote MySQL database (in Google Cloud, use Cloud SQL).

Monitoring

It is possible to use Cloud Monitoring or JMX.

JMX

JMX is not enabled by default. To enable it, the startup script needs to be modified, as described in the Looker documentation:

https://docs.looker.com/setup-and-management/on-prem-install/monitoring-instance

Cloud Monitoring

It is suggested to collect, graph and alert on at least the following performance metrics:

  • CPU Utilization: load and percent CPU utilized
  • Memory Utilization: total used and swap used
  • Disk Usage
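To get a feel for these three metrics before wiring up Cloud Monitoring, they can be read directly on a node. This is a minimal sketch for a Linux host using only the Python standard library; it does not use the Cloud Monitoring API, and the function names are assumptions made for this example.

```python
import os
import shutil


def cpu_load() -> dict:
    """1-minute load average, also normalized per CPU (Unix only)."""
    load1, _, _ = os.getloadavg()
    return {"load_1m": load1, "load_per_cpu": load1 / (os.cpu_count() or 1)}


def disk_used_pct(path: str = "/") -> float:
    """Percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def parse_meminfo(text: str) -> dict:
    """Parse the content of /proc/meminfo into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_usage(meminfo: dict) -> dict:
    """Total memory used and swap used, in kB."""
    return {
        "mem_used_kb": meminfo["MemTotal"] - meminfo["MemAvailable"],
        "swap_used_kb": meminfo["SwapTotal"] - meminfo["SwapFree"],
    }
```

On a Linux node, `memory_usage(parse_meminfo(open("/proc/meminfo").read()))` returns the two memory figures listed above.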

Logging

In the startup script you can configure logging options such as where the log files are stored, the log level, and the log format.

Security

  • Looker uses a secure connection to query Cloud SQL and Cloud Filestore.
  • Administrators can set granular permissions by user or group and can restrict access.
  • All data is encrypted at rest.
  • Allow your MySQL user account to be used only from the IP address of your Looker server.
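In MySQL, restricting an account to one client address is done by including the host in the account name. The sketch below only builds the statements as strings. The user name, database name, and `<password>` placeholder are assumptions for illustration, and with Cloud SQL you may prefer to create the user through the console or its API instead.

```python
import ipaddress


def restricted_user_statements(user: str, looker_ip: str, database: str = "looker") -> list:
    """Build MySQL statements for an account usable only from `looker_ip`.

    `user` and `database` are illustrative names; '<password>' is a
    placeholder that must be replaced before running the statements.
    """
    ip = ipaddress.ip_address(looker_ip)  # raises ValueError if not a valid IP
    if not user.isidentifier() or not database.isidentifier():
        raise ValueError("user and database must be simple identifiers")
    account = f"'{user}'@'{ip}'"
    return [
        f"CREATE USER {account} IDENTIFIED BY '<password>';",
        f"GRANT ALL ON `{database}`.* TO {account};",
    ]
```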

Infrastructure as Code (IaC)

You can use Terraform to install Looker on GCP. The following modules are needed:

  • Database module
  • Filestore module
  • DNS module
  • Hosted Zone module
  • Instance Group module
  • Load Balancer module
  • Secrets module
  • SSL certificate module
