STUDY GUIDE

Google Cloud Security & Operations

Google Cloud Digital Leader Synopsis Section 4

Sep 12, 2022

Welcome to the final article in my Google Cloud Digital Leader Synopsis series! All the key terms that you will need to be familiar with are highlighted in bold and defined at the bottom of the article. 📌

In this article we will cover how the financial side of cloud technology is managed, as well as cloud security and the monitoring tools used to manage cloud resources.

Note: Please use this article as a guide to supplement your learning. I recommend reading all the Google Cloud documentation provided in full for complete exam preparation.

Financial Governance

Managing cloud costs requires vigilance and real-time monitoring. In a small organisation, one person may be responsible for finance budgeting and optimisation. In a larger organisation, however, many people across different departments use cloud resources. A central finance team could manage the cloud costs, but it may still struggle to fully understand the nature of each team’s spending and requirements.

✅ Solution: Consider the People, Process & Technology

People
🤝 Establish a partnership between finance, technology and business functions
🤝 Ensure best practices across the organisation are implemented and that there is visibility into cloud spend.

Process
🗒 Understand how resources are being used and who is using them.
🗒 What cloud resources are being used and by whom?
🗒 What are the associated resource costs?
🗒 How do these costs measure against the broader business strategy?

Technology
👩‍💻 Understand what tools can be used to help manage cloud costs.
👩‍💻 Gain greater visibility and drive a culture of accountability for cloud spending to control the risk of overspend.

Total Cost of Ownership (TCO)

When an organisation migrates to the cloud, its expenditure and cost ownership shift from Capex to Opex. However, some organisations choose to keep part of their business on-premises while running the rest in the cloud. Calculating the total cost of ownership for this hybrid scenario is more complicated, because the on-premises Capex and the ongoing cloud Opex both have to be accounted for.
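To make the hybrid calculation more concrete, here is a minimal Python sketch under heavily simplified assumptions: on-premises hardware Capex is spread evenly over its useful life (straight-line amortisation), cloud Opex is a flat monthly bill, and every figure in the example is made up for illustration. A real TCO assessment covers many more cost categories than this.

```python
from dataclasses import dataclass


@dataclass
class OnPremCosts:
    hardware_capex: float        # up-front purchase of servers and storage (Capex)
    useful_life_years: int       # years over which that Capex is spread
    annual_running_costs: float  # power, cooling, maintenance, support staff


@dataclass
class CloudCosts:
    monthly_bill: float          # pay-as-you-go cloud spend (Opex)
    annual_other_costs: float    # e.g. training, migration work, support plans


def annual_tco(on_prem: OnPremCosts, cloud: CloudCosts) -> float:
    """Simplified yearly TCO for a hybrid (on-premises + cloud) setup."""
    # Straight-line amortisation: spread the Capex evenly across its useful life.
    amortised_capex = on_prem.hardware_capex / on_prem.useful_life_years
    on_prem_total = amortised_capex + on_prem.annual_running_costs
    # Cloud Opex is paid as it is consumed, so annualise the monthly bill.
    cloud_total = cloud.monthly_bill * 12 + cloud.annual_other_costs
    return on_prem_total + cloud_total


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    on_prem = OnPremCosts(hardware_capex=300_000, useful_life_years=5,
                          annual_running_costs=40_000)
    cloud = CloudCosts(monthly_bill=8_000, annual_other_costs=10_000)
    print(annual_tco(on_prem, cloud))  # 60,000 + 40,000 + 96,000 + 10,000 -> 206000.0
```

Even this toy version shows why the hybrid case is harder: the two halves are measured differently (amortised Capex versus metered Opex) and have to be put on the same yearly basis before they can be added together.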

Google Cloud Best Practices & Cost Management Concepts

Google Cloud has identified three best practices for ongoing cost management:

1. Identify the individuals or teams that will manage costs.
2. Learn the difference between invoices and cost management tools.
3. Use cost management tools for accountability.

The overall aim of cost management is to provide visibility, accountability, control and intelligence into an organisation’s cloud costs.

Visibility
👀 What are we spending money on? What are our forecasted costs?
👀 Costs and trends can be monitored to identify any areas of waste.
👀 E.g. Built-in reporting tools, custom dashboards, Google Cloud Pricing Calculator

Accountability
📊 Defining areas of clear ownership for projects.
📊 What does each department cost? Which teams are using which resources?
📊 E.g. Resource labels, Resource Hierarchy
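Resource labels are key-value pairs attached to resources, which is what makes per-team reporting possible. Here is a simplified Python model, assuming hypothetical billing line items that already carry each resource’s labels; the service names and costs are made up. It totals spend for each value of a chosen label key and groups anything missing that label as "unlabelled", which is a quick way to spot gaps in accountability.

```python
from collections import defaultdict

# Hypothetical billing line items: a cost plus the labels on the resource
# that generated it (here, a "team" label).
billing_rows = [
    {"service": "Compute Engine", "cost": 120.0, "labels": {"team": "marketing"}},
    {"service": "Cloud Storage", "cost": 30.0, "labels": {"team": "marketing"}},
    {"service": "BigQuery", "cost": 200.0, "labels": {"team": "analytics"}},
    {"service": "Compute Engine", "cost": 50.0, "labels": {}},  # nobody owns this
]


def cost_by_label(rows: list[dict], key: str) -> dict[str, float]:
    """Sum costs per value of the label `key`; unlabelled costs are grouped together."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        owner = row["labels"].get(key, "unlabelled")
        totals[owner] += row["cost"]
    return dict(totals)


if __name__ == "__main__":
    print(cost_by_label(billing_rows, "team"))
    # {'marketing': 150.0, 'analytics': 200.0, 'unlabelled': 50.0}
```

The "unlabelled" bucket is the interesting one: any spend that lands there has no clear owner, which is exactly the accountability gap that labels and a well-designed resource hierarchy are meant to close.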

Control
🔒 Organisations need to ensure that the correct permissions are allocated to individuals, so they only have access to the cloud resources that they need.
🔒 Budgets and alerts can be put in place to highlight when spending deviates.
🔒 E.g. Cost Management Controls, Cloud IAM
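Budgets and alerts boil down to comparing spend against thresholds. Here is a minimal Python sketch of that idea, assuming a monthly budget with alert thresholds at 50%, 90% and 100% and a naive straight-line forecast of month-end spend. The function names and the forecasting method are illustrative assumptions, not how Cloud Billing actually computes its forecasts.

```python
def month_end_forecast(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Naive linear forecast: assume the average daily spend so far continues."""
    return spend_to_date / day_of_month * days_in_month


def triggered_alerts(budget: float, spend_to_date: float, day_of_month: int,
                     days_in_month: int, thresholds=(0.5, 0.9, 1.0)) -> list[str]:
    """Return one message per threshold crossed by actual or forecasted spend."""
    forecast = month_end_forecast(spend_to_date, day_of_month, days_in_month)
    alerts = []
    for t in thresholds:
        limit = budget * t
        if spend_to_date >= limit:
            # Spending has already gone past this threshold.
            alerts.append(f"actual spend has reached {t:.0%} of budget")
        elif forecast >= limit:
            # Not there yet, but on track to get there by month end.
            alerts.append(f"forecast spend will reach {t:.0%} of budget")
    return alerts


if __name__ == "__main__":
    # Hypothetical month: a budget of 1,000 with 600 already spent by day 15 of 30.
    for message in triggered_alerts(budget=1_000, spend_to_date=600,
                                    day_of_month=15, days_in_month=30):
        print(message)
```

With 600 spent halfway through the month, the forecast is 1,200, so the sketch reports the 50% threshold as already reached and warns that the 90% and 100% thresholds are on course to be crossed before the month ends.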

Intelligence
🧠 Use intelligent recommendations by Google Cloud.
🧠 Tailored recommendations that can be easily applied to optimise resource usage, save time and minimise costs.
🧠 E.g. Recommender suggestions, such as rightsizing virtual machines or removing idle resources.

Security in the Cloud

The fundamental concepts of cloud security to understand are privacy, availability, security and control. Google applies these concepts to how it stores and treats customer data. Google Cloud’s commitments are:

1. You own your data, not Google.
2. Google doesn’t sell customer data to third parties.
3. All customer data is encrypted by default.
4. Google Cloud guards against insider access to your data.
5. Google never gives any government entity backdoor access to your data.
6. Google’s privacy practices are audited against international standards.

Resource Hierarchy & IAM Policy

Organisations control who can access which resources in the cloud. This is configured using a combination of IAM policies and the resource hierarchy. An IAM (Identity and Access Management) policy specifies access controls for Google Cloud resources and consists of three parts:

1️⃣ WHO | 2️⃣ CAN DO WHAT | 3️⃣ ON WHAT RESOURCE

“Who” refers to the identity being granted access, such as a Google Account, a service account or a Cloud Identity domain. “Can do what” refers to the permissions granted to the “who”. For example, one person can be given permission to be a viewer of a resource, whereas another person can be given permission to be an editor and make changes to it. It is recommended to follow the principle of least privilege: give a user the minimal amount of privilege needed to do the job. “On what resource” refers to the cloud resource that the permissions apply to.
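The three parts of an IAM policy map neatly onto a data structure. This Python sketch is a simplified model, not the IAM API: the policy dictionary mirrors the general shape of an IAM allow policy (a list of bindings, each pairing a role with its members), but ROLE_PERMISSIONS is a hand-written subset of what those predefined roles grant, and the bucket, user and service account names are hypothetical.

```python
# Simplified subset of the permissions each predefined role grants
# (the real roles contain more permissions than listed here).
ROLE_PERMISSIONS = {
    "roles/storage.objectViewer": {"storage.objects.get", "storage.objects.list"},
    "roles/storage.objectCreator": {"storage.objects.create"},
}

# Hypothetical identities ("who").
ALICE = "user:alice@example.com"
UPLOADER = "serviceAccount:uploader@example-project.iam.gserviceaccount.com"

# Allow policy attached to one hypothetical bucket ("on what resource").
bucket_policy = {
    "resource": "reports-bucket",
    "bindings": [
        # Each binding pairs a role ("can do what") with its members ("who").
        {"role": "roles/storage.objectViewer", "members": [ALICE]},
        {"role": "roles/storage.objectCreator", "members": [UPLOADER]},
    ],
}


def is_allowed(policy: dict, member: str, permission: str) -> bool:
    """True if any binding grants `member` a role containing `permission`."""
    for binding in policy["bindings"]:
        granted = ROLE_PERMISSIONS.get(binding["role"], set())
        if member in binding["members"] and permission in granted:
            return True
    return False


if __name__ == "__main__":
    print(is_allowed(bucket_policy, ALICE, "storage.objects.get"))     # True: viewer can read
    print(is_allowed(bucket_policy, ALICE, "storage.objects.create"))  # False: least privilege
    print(is_allowed(bucket_policy, UPLOADER, "storage.objects.get"))  # False: upload-only
```

Least privilege shows up directly in the bindings: Alice only needs to read reports, so she gets a viewer role rather than an editor role, and the uploader service account can add objects but not read them.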

The resource hierarchy provides a structure for how IT teams manage and control teams’ access to cloud resources across an organisation.

Domain
☁️ Resides above the organisation level.
☁️ Handled through Cloud Identity and helps manage user profiles.

Organisations
🏠 Managed through the Cloud Console.
🏠 Lets administrators see and control Google Cloud resources and permissions.

Projects
📓 Belong to an organisation.
📓 Used for grouping Google Cloud resources like Cloud Storage buckets.
📓 Can inherit permissions from any folders above it as well as from the organisation at the top.
📓 The basis for enabling APIs, billing and managing collaborators.

Folders
🗂 Allows logical grouping of multiple projects that share common IAM permissions.
🗂 Used to isolate projects for different departments or for different environments.
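Permission inheritance can be sketched in a few lines of Python. This is a simplified model, assuming each node in the hierarchy holds its own allow bindings and that a member’s effective access on a resource is the union of the bindings on that resource and on every ancestor above it, which is how Google Cloud describes IAM policy inheritance. The readable resource names are hypothetical stand-ins (real organisation and folder IDs are numeric), and group membership is not resolved.

```python
# Hypothetical hierarchy: each node records its parent and its own allow
# bindings as {member: set of roles}.
HIERARCHY = {
    "organizations/example-org": {
        "parent": None,
        "bindings": {"group:admins@example.com": {"roles/resourcemanager.organizationAdmin"}},
    },
    "folders/finance": {
        "parent": "organizations/example-org",
        "bindings": {"group:finance@example.com": {"roles/viewer"}},
    },
    "projects/finance-prod": {
        "parent": "folders/finance",
        "bindings": {"user:bob@example.com": {"roles/editor"}},
    },
}


def effective_roles(resource: str, member: str) -> set[str]:
    """Union of the member's roles on `resource` and on every ancestor above it."""
    roles: set[str] = set()
    node = resource
    while node is not None:
        roles |= HIERARCHY[node]["bindings"].get(member, set())
        node = HIERARCHY[node]["parent"]  # walk up towards the organisation
    return roles


if __name__ == "__main__":
    # The finance group's folder-level viewer role is inherited by the project.
    print(effective_roles("projects/finance-prod", "group:finance@example.com"))
    # Bob's project-level editor role does not flow up to the folder.
    print(effective_roles("folders/finance", "user:bob@example.com"))
```

Because inheritance only flows downwards, granting a role at the folder level is a convenient way to give a whole department the same access to all of its projects, which is the use case for folders described above.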

Cybersecurity Challenges

When an organisation manages its data on-premises, it is responsible for all the security measures that keep that data safe. By migrating to the cloud, however, organisations can take advantage of the security features that cloud technology offers. Google Cloud, for example, takes a multilayered approach to security, providing defence at every layer from hardware to operations. The diagram below shows the most common cybersecurity threats to an organisation’s data:

Physical Damage: E.g. data lost due to damage to the physical hard disk.
Malware Attacks: Data can become damaged or destroyed by malware.
Third Parties: Third-party systems may not have adequate security measures.
Lack of Knowledge: Security knowledge is essential for maintaining security plans and staying ahead of potential data security threats.
Criminal Attacks: E.g. a DDoS (distributed denial-of-service) attack is a malicious attempt to disrupt the normal traffic of a targeted server.

Monitoring Operations

Let’s assume that an organisation has successfully migrated its data, applications or operations to the cloud, has configured access and security, and everything is up and running. How can the organisation be sure that the service it is receiving from the cloud is optimal? What if a bug in the code causes prolonged downtime? How are they going to find this bug!?🤯

Measuring Service

Sometimes disruptions occur unexpectedly. They could be caused by an organisational problem (an update to application code introduces a bug) or by a cloud provider issue, such as exceedingly long response times. For cloud provider issues, providers use standard practices to define and measure service availability for customers:

Service Level Agreement (SLA)
📝 A contractual agreement between the cloud provider and the customer
📝 Baseline for quality, availability and reliability of services
📝 The cloud provider incurs costs (typically financial credits to the customer) if the baseline is not met.

Service Level Objectives (SLO)
🎯 A goal for cloud service performance, agreed between the cloud provider and the customer.
🎯 Meeting or exceeding the SLO results in a happy customer.

Service Level Indicator (SLI)
🔍 A measure of the level of service provided.
🔍 E.g. reliability, latency, errors.
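To see how these terms fit together, here is a minimal Python sketch, assuming availability is measured as the fraction of successful requests (one common way to define an SLI, not the only one). The request counts and the 99.9% objective are made-up values. The SLI is the measurement, the SLO is the target it is compared against, and the gap between them shows how much more failure could be absorbed before the objective is missed.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    return good_requests / total_requests


def meets_slo(sli: float, slo: float) -> bool:
    """True if the measured indicator is at or above the agreed objective."""
    return sli >= slo


def remaining_failures(total_requests: int, failed_requests: int, slo: float) -> float:
    """How many more failed requests can be tolerated before the SLO is missed
    (the idea SRE teams call an error budget)."""
    allowed_failures = total_requests * (1 - slo)
    return allowed_failures - failed_requests


if __name__ == "__main__":
    # Hypothetical month of traffic measured against a 99.9% availability SLO.
    total, failed = 1_000_000, 500
    sli = availability_sli(total - failed, total)
    print(f"SLI = {sli:.4%}")
    print("SLO met:", meets_slo(sli, slo=0.999))
    print(f"{remaining_failures(total, failed, slo=0.999):.0f} more failures allowed")
```

An SLA would then sit on top of this: it is the contractual version of the promise, with consequences for the provider if the agreed level is not reached.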

DevOps and Site Reliability Engineering

DevOps and Site Reliability Engineers (SREs) play a vital role in reducing service disruptions. DevOps brings together developers, who are responsible for writing code for systems, and operators, who are responsible for ensuring that these systems are reliable. Its main objectives are to:

1. Reduce Silos
2. Accept failure as normal
3. Implement gradual changes to reduce the cost of failure
4. Leverage tooling and automation
5. Measure everything

SRE is similar to DevOps, but its goal is to create ultra-scalable and highly reliable software systems while keeping the end user’s perspective in mind when working on a system.

Google Cloud Resource Monitoring Tools

There are two categories of tools for monitoring Google Cloud resources: operations-focused tools and performance management tools. The main tools to know are summarised in the table below.

📌 Key terms

Capital Expenditure (Capex): Money an organisation spends to buy, maintain, or improve its fixed assets, such as buildings or equipment.
Operational Expenditure (Opex): Ongoing cost for running a product, business, or system.
Total Cost of Ownership (TCO): A comprehensive assessment of all layers within the infrastructure and other associated costs across the business over time. Includes acquiring hardware and software, management and support, communications, and user expenses, and the cost of service downtime, training and other productivity losses.
Privacy: The data an organisation or an individual has access to, and who they can share that data with.
Security: The policies, procedures and controls put in place to keep data and infrastructure safe.
Availability: The duration for which the cloud service provider guarantees that client’s data and services are up and running or accessible.
Compliance: Meeting standards set by a third party. These could be regulatory or international standards and are common in highly regulated industries such as finance and pharmaceuticals.
Least privilege: Giving a user the minimal amount of privilege needed to complete the job.
Resource Hierarchy: How an IT team can organise a business’s Google Cloud environment and how that service structure maps to the organisation’s actual structure. It determines what resources users can access.
Site Reliability Engineering (SRE): A discipline that applies aspects of software engineering to operations. The goals of SRE are to create ultra-scalable and highly reliable software systems.
Monitoring: Gathering predefined sets of metrics or logs. Monitoring is the foundation for SRE because it provides visibility into the performance, uptime and overall health of cloud-powered applications.
Log file: A text file where applications (including the operating system) write events. Log files make it easier for developers, DevOps and system administrators to get insights and identify the root cause of issues within applications and the infrastructure.
Logging: A process that allows IT teams to analyse selected logs and accelerate application troubleshooting.

Ready for the next steps?

Click here to move on to the other sections in the series:

📓Intro: How I studied for the Google Cloud Digital Leader Exam
📓Section 1: Digital Transformation
📓Section 2: Innovation with Data
📓Section 3: Infrastructure & Application Modernisation