A day in the life of a Google Cloud data user

Peter Billen
Google Cloud - Community
8 min read · Sep 4, 2023

In this article, I take a closer look at a ‘day in the life’ of a Google Cloud data user. Organizations are becoming more data-driven than ever before, but not all are equipped with a modern data platform. Looking to get started or to grow their first use cases, many are figuring out what strategy to take, which investments to make and how to best prepare for the future. How a data user (engineer, analyst or business user) interacts with such a platform is a question that comes up often.

Disclaimer: I work at Google in the cloud team. Opinions are my own and not the views of my current employer.

No matter the use case, everything starts with data. Organizations that are looking to kick off or reinvent their approach to getting value from data need to look at breaking down silos. Data is typically scattered across applications, locations and teams (often for historical reasons); making it available in one unified platform makes it possible to:

  • Use data without moving it
  • Create a common understanding and avoid duplication of transformations
  • Share data and insights easily across the organization (and even externally with customers, partners and industry stakeholders)

But making data available in a unified platform also means that it can be used in different ways, enabling actionable insights:

  • Both batch and realtime
  • From traditional reporting to advanced machine learning
  • Structured as well as unstructured data

To deliver innovation and faster time to value with data, different stakeholders require access using the most appropriate tool. While business users explore data using graphical Business Intelligence tools (either pre-defined reports or self-service exploration), data analysts love to use SQL to find interesting insights, and engineers typically prefer their coding language of choice.

Google Cloud’s unified data and AI/ML platform is a flexible, open, and secure data analytics platform that provides an easy path to becoming an intelligence-driven organization. You can start a data journey at your pace, knowing that while you grow it is possible to expand when needed, whether you require scaling the platform as its usage grows or adding new capabilities to unlock additional value (for instance realtime predictions or machine learning driven assistance).

There is a wide range of services for running these data and analytics workloads, and this can mean going through a lot of information. A decision tree can simplify the decision-making process with respect to these services, yet figuring out what it would mean for your data organization remains overwhelming. Concretely, I often get asked how data engineers and analysts will use the platform to get their work done. The perspectives below focus on two situations and aim to illustrate what this could look like:

  • Analyzing new data sources
  • Analyzing processed data

Note that these perspectives do not provide an exhaustive analysis covering all possible scenarios; rather, they act as a basis for inspiration and support in getting started. Once underway, teams can leverage their experiences, collaborate, document best practices and grow over time. It is important to aim for progression rather than perfection: start small and continuously improve, but make sure to establish the necessary foundations early and automate tasks wherever possible. Besides saving time and effort, automation allows these foundations to be extended easily over time.

A day in the life analyzing new data sources

While building or growing a data platform, new data sources need to be analyzed efficiently. What data is available? How is it structured? What does the data quality look like? What transformations will be required? Etc.

Depending on the type of data (structured or unstructured), the amount and the purpose, the approach and tools might differ.

Analyzing new data sources

Let’s consider a new data source to be onboarded. First, a dataset (or a sample of it) can be transferred to GCP, where Cloud Storage can be used for unstructured data and BigQuery for structured data. To prepare the analysis, this can be as simple as copying or loading the data quickly via the Cloud Console. For larger initiatives or repetitive work, the team can automate this or use more elaborate data transfer techniques (for instance an ETL or a Change Data Capture tool).
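
As a minimal sketch of that loading step, here is how a sample CSV landed in Cloud Storage could be loaded into BigQuery with the Python client library. The bucket, dataset and table names are hypothetical placeholders:

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

# Hypothetical names: replace with your own bucket, dataset and table.
source_uri = "gs://my-landing-bucket/samples/orders.csv"
table_id = "my_project.staging.orders_sample"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row
    autodetect=True,      # let BigQuery infer the schema for a first look
)

load_job = client.load_table_from_uri(source_uri, table_id, job_config=job_config)
load_job.result()  # wait for the load to complete

table = client.get_table(table_id)
print(f"Loaded {table.num_rows} rows into {table_id}")
```

For a one-off analysis, schema autodetection is usually good enough; for a production pipeline you would pin an explicit schema instead.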

Once the data is on GCP, it can be analyzed, transformed, shared, and more:

  • Dataprep and Data Fusion can be used for data wrangling. Both offer a code-free user interface allowing less technical users to get up and running fast.
  • BigQuery can be used for analyzing data using SQL, from simple to complex queries (a short profiling sketch follows this list).
  • More technical users can make use of the Vertex AI platform to run Python notebooks, or AutoML (no-code) / BigQuery ML (few-code) machine learning.
  • At Next ’23, BigQuery Studio was announced: a single interface for data engineering, analytics, and predictive analysis that simplifies end-to-end data workflows and improves collaboration by removing the need to switch between tools.
  • Results can easily be stored and visualized with Looker Studio using built-in integrations or in a spreadsheet using Connected Sheets. It is also possible to continue using your current visualization tools with the connectivity options available.
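
To make the SQL option above concrete, a first-pass profiling query for a newly onboarded table could look like this; all table and column names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# A first look at a newly landed table: row count, distinct keys,
# missing values in a critical column, and the date range covered.
sql = """
SELECT
  COUNT(*)                     AS row_count,
  COUNT(DISTINCT order_id)     AS distinct_orders,
  COUNTIF(customer_id IS NULL) AS missing_customer_ids,
  MIN(order_date)              AS earliest_order,
  MAX(order_date)              AS latest_order
FROM `my_project.staging.orders_sample`
"""

for row in client.query(sql).result():
    print(dict(row))
```

A handful of queries like this quickly answers the questions raised earlier: how the data is structured, what the quality looks like, and which transformations will be needed.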

Insights and analyses can now be shared, and teams are able to collaborate to finalize requirements and specifications. Next, engineering teams can develop pipelines and deliver data in a consistent and up-to-date manner to the users of the data platform.

A day in the life analyzing processed data

With existing and new data available in the data platform, it can now be used by different stakeholders to analyze, consume, make decisions and learn. Business users will start with a question that they want to answer using data. But getting access to the data they need is only the first part; it also requires understanding what data is available and where it can be found. Moreover, it is important that the data can be trusted to be correct, complete and of sufficient quality.

Answering questions using data analysis is a powerful way to make decisions that can improve your business. Examples of such questions might be (the first is worked out in a sample query after the list):

  • What are the top 5 selling products per store?
  • Which inventory items are low on stock? Which ones are forecasted to be out of stock in the coming days?
  • Which marketing channels are most effective at generating leads?
  • Etc.
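
As an illustration, the first question above could be answered with a query along these lines, assuming a hypothetical sales table with store, product, quantity and date columns:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Top 5 selling products per store over the last 30 days
# (table and column names are hypothetical).
sql = """
SELECT
  store_id,
  product_id,
  SUM(quantity) AS units_sold
FROM `my_project.retail.sales`
WHERE sale_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY store_id, product_id
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY store_id
  ORDER BY units_sold DESC
) <= 5
ORDER BY store_id, units_sold DESC
"""

for row in client.query(sql).result():
    print(row.store_id, row.product_id, row.units_sold)
```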

Next, business users can leverage the appropriate methods and tools that they are familiar with to derive the insights and answer the question. The users might turn to their team to collaborate, discuss or even get help. A data analyst or engineer can help using more technical tools without moving the data.

Finally, the results of the analysis can be shared with the team or other stakeholders in the organization. Decisions can be made and, if needed, new ideas to grow data visualization or processing can be identified for future use. For instance, a new pipeline with a pre-defined report can be implemented so that insights can be consulted on a daily basis. This way, decision makers can look at the insights whenever needed and do not have to wait for someone to manually perform the analysis.
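
A minimal sketch of such a pipeline step, reusing the hypothetical sales table from above: materialize yesterday’s results into a reporting table that the pre-defined report reads, and trigger the job daily (for example with BigQuery scheduled queries or Cloud Scheduler):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Rewrite the reporting table on every run so the dashboard always
# shows yesterday's top sellers (all names are hypothetical).
job_config = bigquery.QueryJobConfig(
    destination="my_project.reporting.top_products_daily",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

sql = """
SELECT store_id, product_id, SUM(quantity) AS units_sold
FROM `my_project.retail.sales`
WHERE sale_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY store_id, product_id
"""

client.query(sql, job_config=job_config).result()
```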

Analyzing processed data

Let’s consider a data platform that receives, transforms and stores data using multiple pipelines, ready for use. Cloud Storage can be used as data lake storage for various data, including structured, semi-structured and unstructured data, while BigQuery can be used for structured or semi-structured data (native JSON type, nested fields). As your data may be stored across BigQuery, Cloud Storage and even other clouds, it is important to unify it and make it accessible using BigLake. BigLake is a data access engine that enables you to unify, manage, and analyze data across your data lakes and data warehouses. It provides increased performance and allows extra levels of governance and security (column- and row-level).
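
As a sketch of how this could look (the connection, bucket and table names are hypothetical), a BigLake table can be defined over Parquet files in Cloud Storage and then queried like any other BigQuery table:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Define a BigLake table over Parquet files in a data lake bucket, so the
# files can be queried and governed without moving the data. The dataset
# and the Cloud resource connection are assumed to exist already.
client.query("""
CREATE EXTERNAL TABLE `my_project.lake.events`
WITH CONNECTION `my_project.us.lake_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-data-lake/events/*.parquet']
)
""").result()

# The table now behaves like any other BigQuery table:
for row in client.query(
    "SELECT COUNT(*) AS n FROM `my_project.lake.events`"
).result():
    print(row.n)
```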

Once the data is on GCP, data users across the organization can start to access it:

  • To stay in control of the data platform, it is important to establish guidelines and best practices that ensure data is accurate, consistent, protected, and compliant with regulations. Data governance includes activities such as data cataloging, data lineage, data quality management, PII identification and data access control. Dataplex helps with these tasks and includes a fully managed data catalog to help you discover, understand and enrich your data. If required, integration with third-party data catalogs is also possible.
  • BigQuery can be used for analyzing data using SQL code, from simple to complex queries.
  • More technical users can make use of the Vertex AI platform to run Python notebooks, or AutoML (no-code) / BigQuery ML (few-code) machine learning (a BigQuery ML sketch follows this list).
  • At Next ’23, BigQuery Studio was announced: a single interface for data engineering, analytics, and predictive analysis that simplifies end-to-end data workflows and improves collaboration by removing the need to switch between tools.
  • Results can easily be stored and visualized with Looker Studio using built-in integrations or in a spreadsheet using Connected Sheets. It is also possible to continue using your current visualization tools with the connectivity options available.
  • Any internal application can now access these insights to embed them or use them in its own processing. Here, make sure to consider the core principles of system design: they describe how to achieve a robust solution, introduce changes atomically, minimize potential risks and improve operational efficiency. By focusing on use cases, key capabilities get implemented progressively, allowing the vision and strategy to become reality while respecting the pace of organizational and technical change.
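
As a sketch of the few-code BigQuery ML option mentioned in the list, the out-of-stock question from earlier could be approached with a time-series model. The ARIMA_PLUS model type is one plausible choice here, and all dataset, table and column names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a time-series model on daily sales per product
# (dataset, table and column names are hypothetical).
client.query("""
CREATE OR REPLACE MODEL `my_project.ml.item_demand_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'sale_date',
  time_series_data_col = 'units_sold',
  time_series_id_col = 'product_id'
) AS
SELECT sale_date, product_id, SUM(quantity) AS units_sold
FROM `my_project.retail.sales`
GROUP BY sale_date, product_id
""").result()

# Forecast demand for the next 7 days per product; comparing this
# against current stock levels flags likely out-of-stock items.
for row in client.query("""
SELECT product_id, forecast_timestamp, forecast_value
FROM ML.FORECAST(
  MODEL `my_project.ml.item_demand_forecast`,
  STRUCT(7 AS horizon)
)
""").result():
    print(row.product_id, row.forecast_timestamp, row.forecast_value)
```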

Data analysts and engineers should work together to ensure that the data platform remains operational and that new capabilities are introduced in an efficient and effective manner. There are many factors to consider while doing so:

  • Ensure that data processing is up-to-date
  • Enable business users to interact with data processing in an easy way when needed, for instance to address data quality issues, add reference data or follow up on data delivery
  • Monitor and optimize the cost of the platform using Cloud FinOps (get introduced to FinOps on Google Cloud in one of my previous blog posts)

Conclusion

Data analysis and visualization are key for any organization that wants to make better decisions using data. Making sure your teams have access to this data and to the tools they need is essential to make it work. Nevertheless, every organization will start this journey at its own pace, and Google Cloud’s unified data platform will be there to support it: whether you require scaling the platform as usage grows or adding new capabilities to unlock additional value.

So, how will your team interact with this data platform? Different data users (business, governance, engineer, etc.) have different skills and needs when it comes to working with data. For each, a set of tools is available to get the most out of the platform.

Peter Billen
Google Cloud - Community

Peter is a Principal Architect at Google Cloud. He helps companies get the most out of their digital transformation while moving to the cloud.