GCP Crash Course: Storage and Databases

Antonella Blasetti

Published in

GDG Google Developer Group & WTM Rome

May 15, 2019

Basic Concepts

Storage

Disks? Not so simple…

Cloud disks are usually Block Storage: slices of several physical disks, “virtually” seen as one and replicated behind the scenes.

You choose the type (SSD or HDD), the speed and the location (which affects latency). Disks can be zonal or regional.

Each choice has a different cost and performance.

You may also have RAM disks and distributed file systems.

Key-value storage for files (object storage, similar to Drive or Dropbox) → Cloud Storage.
Cloud Storage (similar to S3) may only contain files (you cannot run applications on it, for example).
It is inexpensive and full of nice functions (versioning, lifecycle, archiving...).

Databases

We are talking about services managed by GCP, that is, prêt-à-porter (ready-to-use) databases.
They usually come with automated backups and failover capabilities, with minimal or no effort.

Amazing! But pay attention: there is no free lunch. Databases are usually the most expensive services.

Main Features/Products:

Relational

Cloud SQL (MySQL and PostgreSQL): small (MySQL) to medium (PostgreSQL) databases, inside a single region.

Spanner: a big, multi-region and highly scalable RDBMS.

NoSQL

Cloud Datastore: an economical document NoSQL database (like MongoDB)
with a SQL-like query language. It is scalable, but it is not always strongly consistent.

Bigtable: a database and a Big Data tool at the same time. Unique in its kind.
A key-value NoSQL DB whose values are structured in hierarchical columns (column families).
Used by Google for Gmail, Maps, etc.

Petabytes of data, with millisecond latency.

Ask yourself

What do RPO and RTO mean? Why are they so important?

A WordPress site, a gaming startup and a big corporation or institution:
what are their requirements for database and security? Which products are suitable? Why?

How can I build a static website with Cloud Storage? What are the benefits?

Cheatsheet

No big explanations but only the relevant info with links. More info than you actually need for certification. Just pick what you need.

DISKs — Block Storage

Zonal standard persistent disk and zonal SSD persistent disk: max 64 TB. IOPS: 3,000/60,000

Regional persistent disk and regional SSD persistent disk: max 64 TB, replicated in two zones. IOPS: 3,000/60,000, with greater latency

Local SSD: more expensive, ephemeral, max 3 TB per instance. IOPS: 280,000/680,000

Create a file server or distributed file system on Compute Engine to use as a network file system with NFSv3 and SMB3 capabilities.

Mount a RAM disk within instance memory to create a block storage volume with high throughput and low latency.

Snapshots: incremental backups, encrypted with system-defined keys or with customer-supplied keys.
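To make "incremental" concrete, here is a minimal sketch in plain Python. It does not call any GCP API: the disk is modeled as a dictionary of block numbers, and the names `take_snapshot` and `restore` are hypothetical. The idea it illustrates is that, after the first snapshot, only the blocks that changed need to be stored.

```python
from typing import Dict, List

Disk = Dict[int, bytes]  # toy model: block number -> block content


def take_snapshot(disk: Disk, previous_state: Disk) -> Disk:
    """Return only the blocks that differ from the previous snapshot state."""
    return {
        block: data
        for block, data in disk.items()
        if previous_state.get(block) != data
    }


def restore(snapshots: List[Disk]) -> Disk:
    """Rebuild the disk by replaying the snapshot chain, oldest first."""
    disk: Disk = {}
    for snapshot in snapshots:
        disk.update(snapshot)
    return disk


disk = {0: b"boot", 1: b"data-v1", 2: b"logs"}
full = take_snapshot(disk, previous_state={})  # first snapshot holds every block
state_at_full = dict(disk)

disk[1] = b"data-v2"  # only block 1 changes afterwards
incremental = take_snapshot(disk, previous_state=state_at_full)

print(len(full), len(incremental))
print(restore([full, incremental]) == disk)
```

The toy model ignores deleted blocks and encryption; it is only meant to show why the second snapshot is much smaller than the first.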

Cloud Storage

Objects and buckets. 99.999999999% durability

gsutil tool

encryption at rest

Object versioning, object notification, access logging, lifecycle management, per-object storage classes, composite objects and parallel uploads.
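Lifecycle management is configured with a small JSON document. The sketch below builds one in Python, in the format that `gsutil lifecycle set` accepts. The thresholds (30 and 365 days) are illustrative assumptions, not recommendations.

```python
import json

# Hypothetical policy: move objects to Nearline after 30 days,
# then delete them after one year.
lifecycle_config = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30},
        },
        {
            "action": {"type": "Delete"},
            "condition": {"age": 365},
        },
    ]
}

print(json.dumps(lifecycle_config, indent=2))
```

Saved to a file such as `lifecycle.json`, the output could then be applied to a bucket with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`.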

Storage Transfer Service transfers data from an online data source to a data sink. Your data source can be an Amazon Simple Storage Service (Amazon S3) bucket, an HTTP/HTTPS location, or a Cloud Storage bucket. Your data sink (the destination) is always a Cloud Storage bucket.

Classes (with availability):

Multi-Regional Storage 99.99%

Regional Storage 99.99%

Nearline Storage 99.9%

Coldline Storage 99.9%, $0.007 per GB per month
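A quick way to read these numbers: the sketch below turns the percentages in the class list (read here as monthly availability figures) into minutes of possible unavailability per 30-day month, and computes a monthly bill from the Coldline price quoted above. The function names are hypothetical, and the price is the 2019 figure from this cheatsheet, so check current pricing before relying on it.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def max_downtime_minutes(availability_percent: float) -> float:
    """Minutes per 30-day month that may be unavailable at a given availability."""
    return round((100 - availability_percent) / 100 * MINUTES_PER_30_DAYS, 2)


def monthly_storage_cost(gigabytes: float, price_per_gb_month: float) -> float:
    """Storage cost only; retrieval and network charges are not modeled."""
    return round(gigabytes * price_per_gb_month, 2)


classes = [
    ("Multi-Regional", 99.99),
    ("Regional", 99.99),
    ("Nearline", 99.9),
    ("Coldline", 99.9),
]
for name, availability in classes:
    print(f"{name}: up to {max_downtime_minutes(availability)} min/month unavailable")

# 1 TB (1,000 GB) in Coldline at the price listed above
print(monthly_storage_cost(1000, 0.007))
```

The gap between 99.99% and 99.9% looks small, but it is roughly a factor of ten in allowed downtime.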

Useful functions:

Cloud Pub/Sub Notifications for Cloud Storage Cloud Functions:

Object Change Notification:

Pubsub-notifications

Inspecting for sensitive data

Storage Transfer Service vs Transfer Appliance (similar to AWS Snowball)

SQL DBs

MySQL: regional. 1st Generation: up to 5.5. 2nd Generation: 5.7

Max 16 GB of RAM and 500 GB data storage

Cloud SQL Proxy → secure external connections

Data replication + zones; MySQL client; external applications

Kubernetes Engine → connect to

HA configuration: a cluster with a primary instance and a failover replica (only one, in a different zone)

Semisynchronous replication, so there can be replication lag.

SQL Postgres

up to 416 GB of RAM and 64 CPUs

Regional. 2nd Generation, version 9.6. Data replication + zones

no Point-in-time recovery (PITR)

PostgreSQL extensions; PL/pgSQL (SQL procedural language)

External applications

App Engine applications

Applications running on Compute Engine

Applications running on Kubernetes Engine

Cloud Functions

HA configuration: a regional instance located in a primary and a secondary zone, with a standby instance and synchronous replication

Cloud Datastore

multi-zone and multi-region replicated configuration

NoSQL document database. GQL → SQL-like query language. Cheap, with a free tier

Table → Kind; Row → Entity; Field → Property; Primary key → Key

ACID properties
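The Table → Kind, Row → Entity, Field → Property vocabulary can be sketched in a few lines of plain Python. This is not the Datastore client library: the `Entity` class and `row_to_entity` helper are hypothetical and only mirror the terminology, with the primary key becoming the entity key.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Entity:
    kind: str  # plays the role of the table name
    key: str   # plays the role of the primary key
    properties: Dict[str, Any] = field(default_factory=dict)  # the fields


def row_to_entity(table: str, primary_key: str, row: Dict[str, Any]) -> Entity:
    """Translate a relational row into Datastore vocabulary."""
    properties = {name: value for name, value in row.items() if name != primary_key}
    return Entity(kind=table, key=str(row[primary_key]), properties=properties)


row = {"id": 42, "title": "Study for the exam", "done": False}
task = row_to_entity("Task", "id", row)
print(task)

# A GQL query over the same kind looks very much like SQL:
gql = "SELECT * FROM Task WHERE done = FALSE"
print(gql)
```

Unlike rows in a table, entities of the same kind do not have to share the same set of properties.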

BIGTABLE

Cloud Bigtable is a key/value store. It does not support joins, nor does it support transactions except within a single row

ZONAL regional

Cloud Bigtable performs best with 1 TB or more of data.

A NoSQL database that scales to billions of rows and thousands of columns, enabling you to store terabytes or even petabytes of data. Managed, but not so easy to configure and optimize.

An instance is a container for your clusters and nodes.

Ideal for single-keyed data with very low latency. Integrates with the Apache Big Data ecosystem through the Apache HBase library for Java.

Replication → add a second cluster; everything else is automatic.

Column families. Data → raw byte strings; integers are encoded as 8-byte big-endian values.
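The "8-byte big-endian" note is about how integers (for example counters) are stored as raw bytes. A minimal sketch with Python's standard `struct` module, assuming a signed 64-bit value; the helper names are hypothetical and no Bigtable client is involved:

```python
import struct


def encode_counter(value: int) -> bytes:
    """Encode an integer as an 8-byte big-endian signed value."""
    return struct.pack(">q", value)


def decode_counter(raw: bytes) -> int:
    """Decode an 8-byte big-endian signed value back into an integer."""
    return struct.unpack(">q", raw)[0]


raw = encode_counter(1234)
print(len(raw), raw.hex())
print(decode_counter(raw))
```

Everything else in a cell is just bytes to Bigtable, so the application decides how to serialize strings, JSON or protocol buffers.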

Front-end server pool → nodes (pointers) → data (Colossus)

Tables are sharded into blocks of contiguous rows, called tablets.

Understanding Cloud Bigtable Performance

Choosing a row key

Cloud Bigtable loadtest tool

Design a Cloud Bigtable schema

Schema design for time series: hotspotting
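Bigtable keeps rows sorted by row key, so a key that starts with a timestamp sends all new writes to the same tablet (a hotspot). One common remedy is to put an identifier first and a reversed timestamp second. The sketch below is only an illustration of that idea: the `#` separator, the `MAX_TIMESTAMP_MS` bound and the function name are assumptions, not a Bigtable API.

```python
MAX_TIMESTAMP_MS = 9_999_999_999_999  # hypothetical bound: largest 13-digit millisecond value


def time_series_row_key(device_id: str, timestamp_ms: int) -> str:
    """Device id first (spreads writes), reversed timestamp second (newest sorts first)."""
    reversed_ts = MAX_TIMESTAMP_MS - timestamp_ms
    return f"{device_id}#{reversed_ts:013d}"


keys = [
    time_series_row_key("sensor-a", 1557900000000),
    time_series_row_key("sensor-b", 1557900000000),
    time_series_row_key("sensor-a", 1557900060000),  # one minute later
]

for key in sorted(keys):
    print(key)
```

After sorting, the most recent reading of `sensor-a` comes first within its own key range, and `sensor-b` lands in a different range, so simultaneous writes from many devices are spread across tablets.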

Cloud Spanner

Fully managed, mission-critical relational database. SQL (ANSI 2011 with extensions). Automatic, synchronous replication for HA.

Replication: globally synchronous, so reads always see the most up-to-date data. Replica types: read-write replicas, read-only replicas, and witness replicas.

Cloud Spanner divides your data into chunks called “splits”, where individual splits can move independently from each other and get assigned to different servers, which can be in different physical locations.

Read-only replicas are only used in multi-region instances, are not eligible to become a leader and may serve stale reads without needing a round-trip to the default leader region.

Regional configurations contain exactly three read-write replicas.

Multi-region configurations contain more replicas, but not all of them have voting rights. Witness replicas, instead, vote on commits but don’t hold a full copy of the data.
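The voting rules can be made concrete with a small majority-quorum sketch. It assumes that a write commits when a majority of the voting replicas agree, and that only read-write and witness replicas vote. The replica layout used for the multi-region case is illustrative, and the function names are hypothetical.

```python
from typing import List

VOTING_TYPES = {"read-write", "witness"}  # read-only replicas never vote


def voters(replicas: List[str]) -> int:
    """Count the replicas that take part in commit votes."""
    return sum(1 for replica in replicas if replica in VOTING_TYPES)


def quorum_size(replicas: List[str]) -> int:
    """Smallest number of votes that forms a majority."""
    return voters(replicas) // 2 + 1


def can_commit(replicas: List[str], reachable_voters: int) -> bool:
    """A write can commit only if a majority of voters is reachable."""
    return reachable_voters >= quorum_size(replicas)


regional = ["read-write"] * 3
multi_region = ["read-write"] * 4 + ["witness"] + ["read-only"] * 2

print(quorum_size(regional))
print(quorum_size(multi_region))
print(can_commit(regional, reachable_voters=1))
```

With three read-write replicas, a regional instance keeps accepting writes when one replica is down, but not when two are.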

best-practices

Study Material

Links may not work if you are not enrolled in Coursera.

AWS Storage short and free

Demos Videos

Don’t be scared: many videos last just 1 minute. They are only little demos. If you are in a hurry, they can replace labs.

Uploading Files and Folders to Google Cloud Storage

Google Cloud Storage: Massive Scalability Plus More | Google Cloud Labs