How to Enable Self Service on Google Cloud
Navigating Agility vs. Governance with Google Cloud’s Serverless Innovations
As a data consultant, I often find myself mediating a delicate balance: enabling business users and data teams to innovate and experiment with data, while addressing the very real concerns of central IT teams who are tasked with maintaining control and security. It’s a dance between agility and governance, and getting it right is critical for any organization serious about becoming truly data-driven.
IT Directors and CIOs are understandably hesitant to “hand over the keys.” They see the potential for uncontrolled data sprawl, security vulnerabilities, and inconsistent reporting if access is too broad. It’s a valid concern, rooted in a responsibility to protect the organization’s most valuable asset.
However, the flip side is equally impactful: when data teams and business users are stifled, unable to quickly test hypotheses or explore new data avenues, innovation grinds to a halt. This can lead to missed opportunities, slower decision-making, and a general disengagement from data initiatives.
So, how do we empower those who live and breathe data to experiment, while reassuring IT that the guardrails are still firmly in place? It’s about creating an environment that supports exploration without introducing undue risk, leveraging intelligent, serverless capabilities.
The Foundation: Secure Sandboxes and IT Enablement
The core of enabling safe experimentation lies in providing isolated environments, and Google Cloud projects are the natural unit for this. By creating dedicated, isolated projects, IT can give data teams development environments that are cleanly separated from production, significantly reducing risk. These projects can then leverage serverless services like BigQuery for robust data warehousing and processing, minimizing IT overhead for provisioning. For highly sensitive or collaborative experiments, BigQuery Data Clean Rooms offer a managed, privacy-centric sandbox.
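To make this tangible, here is a minimal sketch of how IT might carve out a time-boxed BigQuery sandbox inside such a project. The project, dataset, and group names are placeholders, and the 30-day expiration is an illustrative assumption, not a recommendation:

-- Create a sandbox dataset whose tables auto-expire after 30 days,
-- keeping experiments from turning into unmanaged data sprawl.
CREATE SCHEMA IF NOT EXISTS `your-sandbox-project`.experiments
OPTIONS (
  default_table_expiration_days = 30,
  description = 'Self-service sandbox; tables expire automatically'
);

-- Grant the data team write access to the sandbox only,
-- leaving production datasets untouched.
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `your-sandbox-project`.experiments
TO "group:data-team@example.com";

Because the expiration is set at the dataset level, every scratch table the team creates cleans itself up automatically, which is exactly the kind of built-in guardrail that reassures IT without slowing anyone down.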
From an IT perspective, this shift means moving from a gatekeeper role to an enabler and auditor. This involves:
1. Partnering Through Serverless Enablement
Proactively engage with data teams to understand their needs for exploration. Provide access to these isolated Google Cloud projects and the serverless capabilities within them.
2. Automating Governance, Lineage & Data Quality
This is where tools like Google Cloud’s Dataplex Universal Catalog shine. As a serverless data fabric, Dataplex automatically discovers and catalogs new data sources, eliminating manual IT effort for catalog updates. Its data lineage feature tracks transformations directly from BigQuery jobs, regardless of whether they are run with tools like dbt or orchestrated via Cloud Composer, Google Cloud’s managed Apache Airflow. Dataplex also offers data profiling and quality capabilities that automatically suggest data quality rules, which IT can simply approve or reject. This massively accelerates making data trustworthy for experimentation.
3. Focusing on Guardrails, Not Gates
Define clear, actionable policies and standards that guide experimentation. Think of these as guardrails that keep everyone on the right track.
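As a concrete example of a guardrail rather than a gate, BigQuery’s row-level access policies let IT scope what experimenters can see without blocking access outright. A minimal sketch, assuming a hypothetical orders table and group:

-- Guardrail, not gate: the data team can still query the shared table,
-- but only sees rows for the EU region while experimenting.
CREATE ROW ACCESS POLICY eu_only_for_experiments
ON `your-prod-project.sales.orders`
GRANT TO ("group:data-team@example.com")
FILTER USING (region = "EU");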
Tools for Intelligent Exploration and Governed Self-Service
With the foundation of secure sandboxes and automated governance in place, data teams can leverage powerful tools for intelligent exploration and self-service analytics:
Looker for Governed Self-Service, Semantic Modeling & Version Control
Looker, with its LookML (Looker Modeling Language), is central to this. LookML is a domain-specific language used within Looker to define KPIs and create a governed semantic layer. It defines core objects like models, Explores, views, and fields, dictating how data is interpreted and presented. Crucially, Looker’s LookML development is built with version control in mind, allowing for Git integration. This means teams can track changes, revert to previous versions of KPIs and models, and collaborate effectively on the semantic layer.
Here is an example of LookML, demonstrating a simple model with an explore and two views.
# model_file.model
connection: "your_database_connection" # Replace with your database connection name
include: "*.view" # Includes all view files in the project
explore: users {
  join: orders {
    type: left_outer
    sql_on: ${users.id} = ${orders.user_id} ;;
    relationship: one_to_many
  }
}

# users.view
view: users {
  sql_table_name: public.users ;; # Replace with your actual table name

  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  dimension: first_name {
    type: string
    sql: ${TABLE}.first_name ;;
  }

  dimension: last_name {
    type: string
    sql: ${TABLE}.last_name ;;
  }

  dimension: email {
    type: string
    sql: ${TABLE}.email ;;
  }

  dimension: full_name {
    type: string
    sql: CONCAT(${first_name}, ' ', ${last_name}) ;;
  }

  measure: user_count {
    type: count
    drill_fields: [id, first_name, last_name, email]
  }
}

# orders.view
view: orders {
  sql_table_name: public.orders ;; # Replace with your actual table name

  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  dimension: user_id {
    type: number
    sql: ${TABLE}.user_id ;;
  }

  dimension: order_date {
    type: date
    sql: ${TABLE}.order_date ;;
  }

  measure: total_orders {
    type: count
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.order_total ;;
  }
}

model_file.model:
- This file defines the explore for users, which serves as a starting point for querying data. It also includes a join to the orders view, establishing a one-to-many relationship between users and their orders.
users.view:
- This file defines the users view, mapping to a public.users table in the database. It includes various dimensions like id, first_name, last_name, and email, plus a derived dimension full_name created by concatenating first_name and last_name. It also includes a user_count measure.
orders.view:
- This file defines the orders view, mapping to a public.orders table. It includes dimensions like id, user_id, and order_date, and measures for total_orders and total_revenue.
This example demonstrates how LookML defines the structure of your data for analysis within Looker, including database connections, table mappings, dimension and measure definitions, and relationships between data entities.
The significant advantage here is that the compute for these models is handled by BigQuery. This means Looker acts as a smart pass-through, leveraging BigQuery’s speed and scale without replicating or moving data, thereby maintaining governance and efficiency.
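To see what that pass-through looks like in practice, consider an analyst using the users explore above to count users by order date. Looker compiles the request into ordinary SQL and sends it to the database. The query below is a rough sketch of that shape, not the exact SQL Looker generates (Looker, for instance, uses symmetric aggregates to handle join fan-out):

-- Approximate shape of the query Looker hands to the database for the
-- users explore: user count by order date. COUNT(DISTINCT users.id)
-- avoids double-counting users across the one-to-many join to orders.
SELECT
  orders.order_date AS orders_order_date,
  COUNT(DISTINCT users.id) AS users_user_count
FROM public.users AS users
LEFT OUTER JOIN public.orders AS orders
  ON users.id = orders.user_id
GROUP BY orders_order_date
ORDER BY orders_order_date;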
This empowers data teams to run self-service analysis and experimentation directly on governed data, reducing reliance on IT for every new report or metric definition.
BigQuery ML for Seamless Advanced Analytics
For more advanced use cases, BigQuery ML (BQML) offers a powerful way to experiment with advanced analytics and ML models directly within the data warehouse. It allows data teams to build and deploy models using familiar SQL syntax, right within their BigQuery sandbox. This means they can rapidly iterate on hypotheses and explore model performance without sizing virtual machines or managing separate notebook environments. It keeps the experimentation close to the data, accelerating the innovation cycle.
Imagine this: Instead of weeks spent setting up infrastructure, requesting compute resources, and wrestling with compatibility issues, your data analysts can now write a few lines of SQL to predict customer churn, forecast sales, or detect anomalies, all before their coffee gets cold. BigQuery ML transforms the ML journey from a complex, resource-intensive project into an agile, iterative process that can be tackled by anyone fluent in SQL, unlocking predictive power across the entire organization.
Here is an example of a linear regression model built with BigQuery ML, predicting penguin body mass from the public penguins dataset:
CREATE OR REPLACE MODEL `your_project_id.your_dataset.penguin_weight_prediction_model`
OPTIONS (model_type = 'LINEAR_REG', input_label_cols = ['body_mass_g']) AS
SELECT
  species,
  island,
  bill_length_mm,
  bill_depth_mm,
  flipper_length_mm,
  sex,
  body_mass_g
FROM
  `bigquery-public-data.ml_datasets.penguins`;
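Once the model is trained, evaluation and prediction are just as SQL-native. The statements below reuse the model and dataset names from the example above:

-- Inspect the model's quality metrics (mean absolute error, R², etc.).
SELECT *
FROM ML.EVALUATE(
  MODEL `your_project_id.your_dataset.penguin_weight_prediction_model`);

-- Generate predictions; BQML appends a predicted_body_mass_g column.
SELECT *
FROM ML.PREDICT(
  MODEL `your_project_id.your_dataset.penguin_weight_prediction_model`,
  (
    SELECT species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, sex
    FROM `bigquery-public-data.ml_datasets.penguins`
    LIMIT 10
  ));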
The Path Forward: Collaboration and Trust
The fear of decentralization is understandable, but it often stems from a lack of clear processes and communication. By establishing these “safe sandboxes” within Google Cloud projects that leverage serverless BigQuery and Dataplex, and collaborating on clear guidelines, IT can empower its data teams and business users to innovate freely, while still maintaining the essential oversight needed to protect the organization.
It’s about shifting from a model of IT-as-gatekeeper to IT-as-enabler and trusted partner in innovation, all powered by intelligent, serverless technologies.
In a coming article, I’ll dive deeper into the next wave of empowerment: a suite of Gemini-powered features in BigQuery Studio, including Data Insights, the Data Canvas, and Data Preparations. These tools are set to further revolutionize the data-to-insight journey by solving the ‘cold-start’ problem and intelligently automating data transformation.
What are your experiences with enabling data exploration and experimentation? How have you navigated the balance between IT control and organizational agility, particularly with serverless and automated tools? Let me know in the comments!