Google Cloud Architect Professional Certification Notes

I am preparing for the Google Cloud Architect Professional certification.

I prepared these notes for a final revision. I hope they are useful for you as well.

Stackdriver Debugger

  • Supported on GCE, App Engine standard/flexible, and GKE
  • Languages: Java, Python, Go, PHP, Node.js, Ruby, .NET
  • App Engine standard has the debugger enabled by default

Debug Snapshot

  • Lets you inspect application state without stopping or restarting the app
  • Captures local variables and the call stack
  • You can specify the source location, and optionally a condition, for the snapshot (see the sketch below)
  • Note: `A snapshot is only taken once during app runtime.` You can manually retake the snapshot to capture a new set of data.
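A hedged sketch of creating a snapshot from the command line, assuming the Cloud Debugger agent is running in the target service (the file/line, condition, and target name are all illustrative):

 gcloud debug snapshots create MyClass.java:100 \
     --condition="user == 'test'" \
     --target=my-debug-target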

Debug Logpoint

  • Logpoints remain active for 24 hours after creation, or until they are deleted or the service is redeployed.
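A similar hedged sketch for creating a logpoint, which injects a log statement without redeploying (location, format string, and target are illustrative):

 gcloud debug logpoints create MyClass.java:120 "x has value {x}" \
     --target=my-debug-target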

Logs Panel

  • The Debug page includes quick access to log messages through an in-page logs panel, allowing you to see log messages next to your source code.

Access Control

  • Debugger Agent
roles/clouddebugger.agent
Can register the debug target, read active breakpoints, and report breakpoint results.
This role is normally assigned to the service account running with the debugger agent.
  • Debugger User
roles/clouddebugger.user
Can create, view, list, and delete breakpoints (snapshots & logpoints) as well as list debug targets (debuggees).
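A hedged example of granting the agent role to the service account that runs the debugger agent (project and service account names are placeholders):

 gcloud projects add-iam-policy-binding my-project \
     --member="serviceAccount:my-app@my-project.iam.gserviceaccount.com" \
     --role="roles/clouddebugger.agent"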

Stackdriver Error Reporting

  • Supports Java, Python, Node.js, PHP, Ruby, Go, and C#
  • Automatic, real-time detection of errors in production logs
  • Counts, analyzes, and aggregates errors and crashes
  • Error details: time chart, occurrences, affected user count, first and last seen dates, and a cleaned exception stack trace
  • Opt in to receive email and mobile alerts on new errors

Data model

  • error event
  • error group
  • error stats
  • error group stats

Stackdriver Trace

Stackdriver Trace is a distributed tracing system that collects latency data from your applications.

  • Near-real-time performance analysis
  • Can capture traces from VMs, containers, and GAE
  • Helps find performance bottlenecks
  • Supports Java, Node.js, Ruby, and Go
  • Enabled by default for GAE standard

FAQ

  • 30-day retention
  • Requests are sampled (on the order of 0.1 requests per second by default); a custom sampling rate can be set
  • To force a specific request to be traced, add an X-Cloud-Trace-Context header to the request (see the example below)
  • You can filter traces
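A hedged example of forcing a trace with curl (the trace ID is any 32-character hex string you generate; o=1 requests tracing; the URL is a placeholder):

 curl -H "X-Cloud-Trace-Context: 105445aa7843bc8bf206b12000100000/1;o=1" \
     https://my-service.example.com/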

Data Model

  • Traces
A trace describes the amount of time it takes an application to complete a single operation,
e.g. the time from a user's request to the response.
Each trace consists of one or more spans.
  • Spans
A span describes the amount of time it takes an application to complete a sub-operation within a trace,
e.g. the round trip of an internal RPC call.

Stackdriver Logging

  • Accepts logs from any source
  • Real-time, fully managed service
  • Accepts custom logs
  • Analyze in real time by exporting to BigQuery (see the sink example below)
  • Log retention: 30 days
  • Allows you to create metrics from log data
  • Alerting on events
  • The logging agent is based on fluentd
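A hedged example of the BigQuery export: create a logging sink with a filter (project, dataset, and filter are placeholders):

 gcloud logging sinks create my-sink \
     bigquery.googleapis.com/projects/my-project/datasets/my_dataset \
     --log-filter='resource.type="gce_instance"'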

Log Retention

Admin Activity audit logs: 400 days
Data Access audit logs: 30 days
Access Transparency logs: 30 days
Logs other than audit logs or Access Transparency logs: 30 days 
API page token age: 24 hrs

Google App Engine

  • App Engine has its own firewall settings; the default rule allows all traffic
  • You can deny specific IPs or ranges to protect against DDoS (see the example below)
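A hedged example of denying a source range with the App Engine firewall (the priority and range are illustrative):

 gcloud app firewall-rules create 100 \
     --action=DENY \
     --source-range=192.0.2.0/24 \
     --description="block abusive range"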

Standard Environment

  • Python 2.7, Java 7/8, PHP 5.5, Go 1.6/1.8
  • Only specific language versions are supported
  • Scales down to 0
  • Very quick scaling
  • Scaling types: manual, basic, automatic
  • Suited to applications with unpredictable, spiky, and extreme load
  • No SSH, no background processes, no writes to local files, limited network access, no third-party binaries
  • Pricing is per instance hour
  • Max request timeout: 60 s
  • Instance startup time: seconds

Flexible Environment

  • Docker containers on VMs
  • Useful for applications with a consistent, steady workload and gradual scale-up/scale-down requirements
  • Supports Python, Java, Node.js, Go, Ruby, PHP, and .NET
  • Docker containers with custom runtimes
  • Allows third-party binaries
  • Suits apps that use or depend on frameworks that include native code
  • Instance startup time: minutes
  • Max request timeout: 60 s
  • Allows SSH, network access, writes to local disk, and modifying the runtime through a Dockerfile
  • Scaling: manual, automatic
  • Pricing based on usage of vCPU, memory, and persistent disks
  • VMs are restarted on a weekly basis
  • SSH access is disabled by default, but you can enable it
  • Instances are health-checked, healed as necessary, and co-located with other services within the project
  • Critical, backwards-compatible updates are automatically applied to the underlying operating system

It is possible to migrate from the standard environment to the flexible environment.

Comparison between standard and flexible

What is GQL?
GQL is a SQL-like query language used with Cloud Datastore. It is not supported by the Java API.

IAM

Member

A member can be a person or a service account, specifically one of:

Google account
Service account
Google group
G Suite domain
Cloud Identity domain

Roles

  • A collection of permissions on a given resource
  • Permissions can only be granted via roles
  • You can't assign permissions to users or service accounts directly

Types

  • Primitive
Owner
Editor
Viewer
  • Predefined (curated)
more granular control,
e.g. roles/compute.instanceAdmin
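A hedged example of inspecting which permissions a predefined role contains:

 gcloud iam roles describe roles/compute.instanceAdmin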

Billing roles

Billing Account Admin: full access to billing accounts
Billing Account Creator: can create new billing accounts
Billing Account User: can link projects to billing accounts
Billing Account Viewer: view-only access to billing
Project Billing Manager: can attach projects to, or disable them on, a billing account

Cloud Bigtable and other storage options

  • Cloud Bigtable is not a relational database; it does not support SQL queries or joins, nor does it support multi-row transactions.
  • it is not a good solution for storing less than 1 TB of data.
  • If you need full SQL support for an online transaction processing (OLTP) system, consider Cloud Spanner or Cloud SQL.
  • If you need interactive querying in an online analytical processing (OLAP) system, consider BigQuery.
  • If you need to store immutable blobs larger than 10 MB, such as large images or movies, consider Cloud Storage.
  • If you need to store highly structured objects in a document database, with support for ACID transactions and SQL-like queries, consider Cloud Datastore.

Persistent Disk:
Fully-managed, price-performant block storage that is suitable for virtual machines and containers.
good for:

  • Block storage for Google Compute Engine and Google Kubernetes Engine
  • Snapshots for data backup

Google Cloud Storage
A scalable, fully-managed, highly reliable, and cost-efficient object / blob store.
Good for:

  • Images, pictures, and videos
  • Objects and blobs
  • Unstructured data

Google Cloud Bigtable
A scalable, fully-managed NoSQL wide-column database that is suitable for both real-time access and analytics workloads.
Good for:

  • Low-latency read/write access
  • High-throughput analytics
  • Native time series support

Google BigQuery
A scalable, fully-managed Enterprise Data Warehouse (EDW) with SQL and fast response times.
Good For:

  • OLAP workloads up to petabyte-scale
  • Big Data exploration and processing
  • Reporting via Business Intelligence (BI) tools

Google Cloud Datastore
A scalable, fully-managed NoSQL document database for your web and mobile applications.
Good For:

  • Semi-structured application data
  • Hierarchical data
  • Durable key-value data

Google Cloud Spanner
Mission-critical, relational database service with transactional consistency, global scale and high availability.
Good For:

  • Mission-critical applications
  • High transactions
  • Scale + Consistency requirements

Google Cloud SQL
A fully-managed MySQL and PostgreSQL database service that is built on the strength and reliability of Google’s infrastructure.
Good For:

  • Web frameworks
  • Structured data
  • OLTP workloads

Preemptible VM

  • Short-lived: 24 hours max
  • 30-second notice before termination
  • Good for batch workloads
  • Rendering, media transcoding, big data analytics
  • Prefer many small VMs over a few large ones
  • Preserve the disk with --no-auto-delete
  • If instances are failing frequently, check health checks and firewall rules
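A hedged example of creating a preemptible instance whose boot disk survives deletion (name and zone are placeholders):

 gcloud compute instances create batch-worker-1 \
     --preemptible \
     --no-boot-disk-auto-delete \
     --zone=us-central1-a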

Backup & Recovery

  • Disk snapshots
Incremental; disk writes should be frozen during the snapshot,
so stop or flush disk writes before taking one.
  • Database backups
Do not rely on disk snapshots for databases, since they require freezing writes;
take a database-level backup to disk or GCS instead.

Cloud Storage

  • Enable versioning to protect objects from deletion and overwriting
  • gsutil commands
Enable versioning:

 gsutil versioning set on gs://bucket

Delete an object:

 gsutil rm gs://bucket/<object>

Delete a specific version:

 gsutil rm gs://<bucket_name>/<object>#<generation_number>

Useful gsutil rm flags:

 -f Continues silently (without printing error messages) despite errors.
 -I Reads the list of objects to remove from stdin. This allows you to
    run a program that generates the list of objects to remove.
 -R, -r Synonymous options. Remove bucket or bucket subdirectory contents
    (all objects and subdirectories it contains) recursively. If used with
    a bucket-only URL (like gs://bucket), gsutil deletes the bucket itself
    after deleting its objects and subdirectories. This option implies -a
    and will delete all object versions.
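To find the generation numbers needed for version-specific deletes, list all versions of the objects (the bucket name is a placeholder):

 gsutil ls -a gs://my-bucket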

Managed Instance Group

GCE with managed instance group

  • Use instance templates
  • Rolling updates
  • Canary deployments with multiple templates (see the sketch below)
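A hedged sketch of canarying a new template onto 10% of a MIG (the rolling-action command was in beta at the time; names, percentage, and zone are illustrative):

 gcloud beta compute instance-groups managed rolling-action start-update my-mig \
     --version=template=current-template \
     --canary-version=template=new-template,target-size=10% \
     --zone=us-central1-a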

Security

Separation of duties

  • Use different projects for different users and applications
  • Follow the principle of least privilege
  • For a security team, assign broad view-only roles like Org Viewer / Project Viewer
  • Signed URLs do not require a GCP account
  • Use a single bucket for user assets
  • Share bucket access with signed URLs
  • Penetration tests should be run from the public network
  • Each network has its own RFC 1918 internal address space spanning regions
Firewall rules restrict traffic based on:
- Port
- IP
- Subnet ranges
- Tags
  • Load balancer health checks to instances come from dedicated IP ranges:
     130.211.0.0/22 and 35.191.0.0/16
  • External SSH disabled? Use Cloud Shell
  • To analyze credit card / sensitive / PII data: use a Squid proxy plus Stackdriver Logging and Monitoring, then export to BigQuery
  • To import PII from on-premises/GCS to Datastore:
     use OAuth 2.0 → service account → Datastore or GCS
  • Export simply using gsutil

Application Code Error

  • Java: digest errors
     Solution: re-sign the JAR file
  • Python: a mobile app shows an older cached version. Options:

1) Override the Datastore entry
2) Set up the app to work from a single instance
3) Modify the API to prevent caching
4) Set the HTTP cache expiry to -1

Correct answer: Set the HTTP cache expiry to -1

Network

VPN Requirement

Gateway IPs on both sides
A shared secret
IKE v1 or v2 and ESP protocol support
Non-conflicting IP address ranges

VPN

  • 1.5 Gbps per VPN tunnel
  • You can set up multiple tunnels to increase throughput
  • Static routes, or dynamic routes via Cloud Router
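A hedged sketch of creating a classic VPN tunnel with a shared secret, assuming the target VPN gateway and its forwarding rules already exist (all names and addresses are placeholders; depending on the routing type you may also need traffic selectors or a Cloud Router):

 gcloud compute vpn-tunnels create tunnel-1 \
     --peer-address=203.0.113.1 \
     --shared-secret=MY_SHARED_SECRET \
     --ike-version=2 \
     --target-vpn-gateway=gw-1 \
     --region=us-central1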

Cloud Router

ASN: a private ASN, unique per router (64512–65534)
BGP peering addresses: a /30 from the link-local range 169.254.0.0/16
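A hedged example of creating a Cloud Router with a private ASN (name, network, region, and ASN are placeholders):

 gcloud compute routers create my-router \
     --network=default \
     --asn=64512 \
     --region=us-central1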

Define service

- rough design
- structure
- measure

Define 3 tier

- Presentation (n/w)
- Business (compute)
- Data (storage)

Terms

SLI: service level indicator, a measured metric (e.g. current CPU load)
SLO: service level objective, a target for an SLI (e.g. CPU below 80%)
RTO: recovery time objective
RPO: recovery point objective

Design Process
Begin Simple → Iterate → Plan for failure → Measure stuff

Requirement gathering

- Qualitative
- Quantitative: time, data, users
- Scale
- Size: dimensions, rate of change, replication

Business logic layer design

- Include microservices
- 12 factor
- Compute option

Cloud Functions

  • Not a low-latency service
  • Language support is limited to Node.js

12-factor mapping to GCP services

  • Version control: Cloud Shell, Cloud Source Repositories
  • Strictly separate build and run stages: build the GAE app in Cloud Shell, then upload to GAE
  • Keep dev, staging, and prod as similar as possible: Deployment Manager templates
  • Explicit and isolated dependencies: custom images
  • Store config in the environment: metadata server, GCS
  • Maximize robustness with fast startup and graceful shutdown: MIGs, templates, autoscaling

Failure due to loss

Single Point of failure

replicate everything
divide into microservices
load balance
use multiple machines and racks
N+2: one spare for upgrades, one for an unexpected failure

Correlated Failure

if one server fails, all requests/services that depend on it fail
a zone or region can fail
a group of related items that fail at the same time is called a **failure domain**
divide into microservices
use separate, isolated designs
keep services independent

Failure due to overload

Cascading failures

if one frontend fails, the others get overloaded and start failing, beginning a chain of failure events
prevention is the best strategy:
monitor the safe serving capacity
provision extra capacity to absorb failures

Fan-in failure

a single request/response aggregation point can be overloaded;
BigQuery avoids this with a tree-based architecture

Queries of death overload failure

a query that overloads or crashes the backend keeps getting retried, so the backend never recovers

Positive feedback cycle overload failure

retries
crash looping
solution: increase the retry timeout after each failed retry (exponential backoff)
and detect overload early; a minimal sketch follows
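A minimal bash sketch of exponential backoff between retries (the endpoint is a placeholder):

 delay=1
 for attempt in 1 2 3 4 5; do
   if curl -fsS https://backend.example.com/api; then
     break  # success, stop retrying
   fi
   echo "attempt $attempt failed; retrying in ${delay}s" >&2
   sleep "$delay"
   delay=$((delay * 2))  # double the wait after each failure
 done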

## Storage Buckets

- IAM roles work at the bucket level, not at the object level
- ACLs work at both the bucket and object level
- Object ACLs inherit from the bucket ACL
- Object versioning and lifecycle rules are set at the bucket level, not per object

Change Storage class:

- Regional standard → nearline/coldline
- Multi-regional standard → nearline/coldline
- Regional → multi-regional is NOT POSSIBLE, and vice versa
- gsutil rewrite -s NEARLINE gs://<bucket_name>
- Changing the bucket's default storage class only affects new objects; existing objects must be rewritten as above

Best Practices:
- Choose IAM over ACLs
- IAM has audit trails for access
- When in doubt, use IAM
- For more fine-grained access, use ACLs
- Use signed URLs for time-limited access

To create a signed URL, you need a service account private key (JSON or P12).

e.g. gsutil signurl -d 10m <service_account_key>.json gs://<bucket>/<object>
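A hedged end-to-end example: create a key for a service account, then sign (names are placeholders; gsutil signurl needs the pyopenssl library installed):

 gcloud iam service-accounts keys create key.json \
     --iam-account=my-sa@my-project.iam.gserviceaccount.com
 gsutil signurl -d 10m key.json gs://my-bucket/my-object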
 
 
## Compute Engine
 
- Persistent Disk
  - RAID/disk partitioning NOT RECOMMENDED
  - Either as a boot disk or an extra disk
  - SSD/HDD
  - More reliable
  - 64 TB max size
- Local SSD
  - Cannot be attached after instance creation
  - Can't use your own encryption keys
  - Data is lost after instance deletion
  - Max size 375 GB per disk, up to 8 disks = 3 TB
- Images
  - Used for new instance creation / templates / MIGs
  - Can be shared across projects
  - Custom images:
    - Can be exported to a Cloud Storage bucket in .tar.gz format
    - Can be shared across projects using the **Compute Image User role**
    - Export is available only for Linux, not Windows
  - Deprecated images:
    - DEPRECATED: shows a warning
    - OBSOLETE: can't be used by new users; existing links continue to work
    - DELETED: no one can use it
    - Moving an image from OBSOLETE back to ACTIVE is possible via the command line only
  - Image families are useful for image versioning
  - A family always points to the latest non-deprecated version
  - Recommended to stop the instance before creating an image
- Snapshots (see the example commands after this list)
  - Used for backup/archival purposes
  - Incremental
  - Can't be shared across projects
  - For Windows, use VSS snapshots for non-boot disks
  - Use ext4 for Linux
  - Unmount the disk if possible
  - Run during off-peak hours
  - Stop writes / flush disk buffers first
  - Use multiple disks for large volumes
  - Run fstrim to free up space
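Hedged examples of taking a snapshot and creating a custom image from a disk (names and zone are placeholders):

 gcloud compute disks snapshot my-disk \
     --zone=us-central1-a \
     --snapshot-names=my-disk-backup
 gcloud compute images create my-image \
     --source-disk=my-disk \
     --source-disk-zone=us-central1-a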

## Network
- VM hard limit: 7000 per network
- VPC networks do not support IPv6; however, GLB/GAE support it
- Firewall rules (see the example after this list)
  - Support both deny and allow
  - Match on source/destination, tag, port, protocol, and priority
  - In the default network, ICMP, RDP, and SSH are allowed
- Routes
  - A NAT/proxy server uses MANY-TO-ONE routing
  - Needs `can_ip_forward` set to enabled
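A hedged firewall-rule example: allow SSH from one range to instances tagged bastion (all values are placeholders):

 gcloud compute firewall-rules create allow-ssh-bastion \
     --network=default \
     --direction=INGRESS \
     --action=ALLOW \
     --rules=tcp:22 \
     --source-ranges=203.0.113.0/24 \
     --target-tags=bastion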

### Shared VPC
- Host project → service projects
- A static IP belongs only to the project that reserved it
- Resources tied to the shared VPC: instances, templates, MIGs, forwarding rules for internal load balancers
- Use cases: a two-tier architecture shared across projects, hybrid networking
- Not enabled by default
- IAM roles:
  - Shared VPC Admin: compute.xpnAdmin
    - Org-level role
    - Can configure the shared VPC
    - Associates host and service projects
    - Grants the Network User role
  - Network User role: compute.networkUser
    - Project-level role
    - Can create resources in the shared VPC
    - Can discover shared VPC assets
    - Also requires additional access such as project admin, compute admin, or editor

### Load Balancers

- GLB has native support for websockets

- NLB
  - Regional external LB
  - For non-HTTP(S) protocols
  - Balances by IP protocol data (address, port, protocol)
  - How it works:
    - Forwarding rules: match criteria = address, port range, protocol
    - Target pool: a group of VMs

### MIG
- Templates are global
- Canary/rolling-update deployments are possible using multiple templates

Kubernetes Engine cluster limits:

  • 50 clusters per zone
  • 5000 nodes per cluster
  • 100 pods per node
  • 300,000 containers

### GKE

- Resize a cluster:

 gcloud container clusters resize <name> --size 5

- Change machine type:
  - Create a new node pool with the new machine type:

 gcloud container node-pools create larger-pool --cluster=migration-tutorial --machine-type=n1-highmem-2 --num-nodes=5

  - Cordon the default pool's nodes to remove them from scheduling:

 for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do kubectl cordon "$node"; done

  - Drain the pods:

 for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node"; done

  - Delete the old node pool:

 gcloud container node-pools delete default-pool --cluster migration-tutorial

- Enable autoscaling:

 gcloud container clusters update <name> --enable-autoscaling --min-nodes 3 --max-nodes 5
 
 
 
Data ingestion: which services feed BigQuery?

 GCS, Pub/Sub, Dataflow → BigQuery
 
### PUB/SUB
- Message retention period: 7 days
- Subscriptions with no activity for 31 days are automatically deleted
- Supports one-to-many, many-to-one, and many-to-many messaging (see the walk-through below)
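A hedged walk-through of the basic workflow (topic and subscription names are placeholders; flags may vary slightly by gcloud version):

 gcloud pubsub topics create my-topic
 gcloud pubsub subscriptions create my-sub --topic=my-topic
 gcloud pubsub topics publish my-topic --message="hello"
 gcloud pubsub subscriptions pull my-sub --auto-ack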

Common Scenario

- Balancing workloads in network clusters
- Implementing asynchronous workflows
- Refreshing distributed caches
- Logging to multiple systems
- Distributing event notifications
- Data streaming from various processes or devices. 
- Reliability improvement.

Benefits and features

- Unified messaging
- Global presence
- Flexible delivery options — pull / push
- Data reliability
- End-to-end reliability
- Data security and protection
- Flow control
- Simplicity

GCS, APIs, Dataflow, GCE, Cloud Logs → Pub/Sub → networks, Dataflow, GAE, Stackdriver, GCE

### DATAFLOW
- Unified development model for both batch and streaming pipelines

Use cases:

- Clickstream, point-of-sale, and segmentation analysis in retail
- Fraud detection in financial services
- Personalized user experiences in gaming
- IoT analytics in manufacturing, healthcare, and logistics

Pub/Sub, Datastore, GCS, Apache Avro, Kafka → Dataflow → BigQuery, ML, Bigtable