Photo by Ian Battaglia via Unsplash

Google Cloud data storage services you can use to reach planetary scale

Jeffrey Lewis
Published in The Startup
Jan 1, 2020, 5 min read

Scaling your data layer is no trivial task. Spreading a database across many servers through sharding or replication takes real effort to keep every instance consistent. Fortunately, there are options today that abstract away the complexity of managing a distributed data storage system.

Modern cloud storage solutions scale your data layer automatically so you can focus on your application and not worry about whether your databases can keep up with the load.

Today, we’ll explore the storage options available on Google Cloud Platform (GCP), which hosts some of the world’s highest-traffic websites and apps.

Here’s a quick guide to help you choose the right storage service for your needs. First, let’s look at what GCP offers out of the box:

Diagram: Overview of GCP Data Storage Services

File Storage

Cloud Storage: Use this option to store any kind of file, such as audio, video, or images. Cloud Storage offers very high read and write throughput and practically unlimited capacity (up to exabytes). Scaling is taken care of automatically, and in multi-region and dual-region locations files are replicated geo-redundantly for high availability and backup.

Relational Data / SQL

Cloud SQL: Fully managed hosted service for MySQL or PostgreSQL databases. Google takes care of patches and updates, managing backups, handling failover, and configuring replication. In a high-availability configuration, data is replicated to a standby instance in a different zone of the same region, and read replicas can be placed in other regions.

This is a common option when you’re migrating from an existing MySQL or PostgreSQL database and do not need horizontal scaling. Because Cloud SQL only scales writes vertically, input/output operations and maximum storage are limited to what a single node can provide.

If you need to scale beyond a single node, you may have to shard your data by splitting it across multiple database instances and routing requests to the right instance at the application level, or opt for the more expensive Cloud Spanner (below).
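To make the application-level option concrete, here is a minimal sketch of hash-based routing in Python. The shard names and the `shard_for` helper are hypothetical and only illustrate the idea of splitting data across multiple database instances; they are not part of any Cloud SQL API.

```python
import hashlib
from typing import Sequence

# Hypothetical shard names; in practice each would map to its own database instance.
SHARDS = ("users-db-0", "users-db-1", "users-db-2", "users-db-3")


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Pick a shard deterministically from a stable hash of the key."""
    # Use a cryptographic hash rather than hash(), which varies between Python runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for user_id in ("alice", "bob", "carol"):
        print(user_id, "->", shard_for(user_id))
```

Note that with simple modulo routing like this, changing the number of shards remaps most keys, which is part of the maintenance burden a managed service such as Cloud Spanner takes off your hands.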

Cloud Spanner: Relational SQL database from Google Cloud with built-in horizontal scaling that is handled automatically. Your data can grow to petabytes in size while still supporting atomic transactions, as with a traditional SQL database. Because it scales like a NoSQL database, Cloud Spanner comes at a higher price than Cloud SQL.

Unlike Cloud SQL, Cloud Spanner is not API-compatible with either MySQL or PostgreSQL. All interactions with the database go through its own API, which supports SQL queries as well as Data Manipulation Language (DML) statements for inserting and updating data.

Non-Relational Data / NoSQL

Cloud Datastore: NoSQL database that stores data as schemaless, JSON-like documents (similar to MongoDB), which are automatically indexed so you can query on individual attributes. An interesting property of Cloud Datastore is that it supports atomic transactions, meaning it can execute a set of operations where either all succeed or none take effect. This is similar to a relational database, but with the horizontal scalability found in NoSQL databases. Cloud Datastore can store terabytes of data.
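The "all succeed, or none occur" guarantee can be illustrated with a small, self-contained Python sketch. This is not the Cloud Datastore API; `run_atomically`, `withdraw`, and `deposit` are hypothetical helpers that mimic the semantics on a plain dictionary by working on a copy and committing only if every operation succeeds.

```python
from typing import Callable, Dict, List

Store = Dict[str, int]
Operation = Callable[[Store], None]


def run_atomically(store: Store, operations: List[Operation]) -> bool:
    """Apply every operation or none of them."""
    working_copy = dict(store)  # changes are staged here, not in the real store
    try:
        for operation in operations:
            operation(working_copy)
    except Exception:
        return False  # a step failed, so the original store is left untouched
    store.clear()
    store.update(working_copy)  # commit all staged changes together
    return True


def withdraw(account: str, amount: int) -> Operation:
    def operation(store: Store) -> None:
        if store[account] < amount:
            raise ValueError("insufficient funds")
        store[account] -= amount
    return operation


def deposit(account: str, amount: int) -> Operation:
    def operation(store: Store) -> None:
        store[account] += amount
    return operation


if __name__ == "__main__":
    balances = {"a": 100, "b": 0}
    print(run_atomically(balances, [withdraw("a", 30), deposit("b", 30)]), balances)
    print(run_atomically(balances, [deposit("b", 500), withdraw("a", 500)]), balances)
```

In the second call the deposit is staged first, but because the withdrawal fails, neither change reaches `balances`.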

Note: Cloud Datastore is being replaced by Cloud Firestore, and in the future all existing Cloud Datastore databases will be automatically upgraded to Cloud Firestore in Datastore mode.

Cloud Bigtable: Fully managed wide-column database similar to HBase and Cassandra that scales to petabytes of data and offers sub-10 ms latency. Bigtable integrates easily with open source big data tools like Hadoop and supports the popular HBase API. Multi-row atomic transactions are not available in Bigtable; atomicity is guaranteed only at the level of a single row (across its columns).

Data Analysis

BigQuery: An ideal option when you need to run data analysis on an enormous data set. BigQuery responds very quickly to complex SQL-like queries, even over petabytes of data.

BigQuery has a relatively low storage cost and includes a free quota of 10 GB of storage each month. Queries, however, are billed according to the amount of data they process. Query processing also comes with a monthly free quota, which at the time of writing covers the first 1 TB per month.

This pricing structure, coupled with the ability to run complex queries, makes BigQuery a good option for on-demand or periodic reporting and data analytics.
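As a rough illustration of how query billing with a free monthly quota works out, the sketch below estimates a monthly bill. The 1 TB free quota comes from the figures above; the per-TB price passed in the example is a placeholder assumption, not an official rate, so check the current pricing page before relying on it.

```python
FREE_QUERY_TB_PER_MONTH = 1.0  # free query quota quoted in this article


def monthly_query_cost(tb_scanned: float, price_per_tb: float) -> float:
    """Estimate the monthly query bill after subtracting the free quota."""
    billable_tb = max(0.0, tb_scanned - FREE_QUERY_TB_PER_MONTH)
    return billable_tb * price_per_tb


if __name__ == "__main__":
    # 5.0 is a placeholder price per TB, used only to show the arithmetic.
    print(monthly_query_cost(0.4, 5.0))  # stays inside the free quota -> 0.0
    print(monthly_query_cost(3.5, 5.0))  # 2.5 billable TB * 5.0 -> 12.5
```

For occasional reports that scan well under 1 TB a month, the query cost under these assumptions stays at zero, which is why periodic reporting fits this model well.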

Memory

Cloud Memorystore: Redis-compatible in-memory data store that is fully managed by Google, which handles high availability, failover, patching, and monitoring. Cloud Memorystore provides extremely fast I/O and low latency, with up to 300 GB of capacity per instance, making it ideal for storing application state, sessions, and cached data. You can easily achieve the sub-millisecond latency and throughput your applications need.
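A common way to use an in-memory store for data caching is the cache-aside pattern: check the cache first, and only go to the database on a miss. The sketch below shows the pattern in Python with a dictionary-backed `InMemoryCache` standing in for a Redis client so that it runs without any external service. The class and the `get_with_cache` helper are hypothetical names used for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Stand-in for a Redis client: stores each value with an expiry time."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # entry has expired, treat it as a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_with_cache(
    cache: InMemoryCache,
    key: str,
    load_from_db: Callable[[str], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Cache-aside read: return the cached value, or load and cache it."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)  # slow path, only taken on a cache miss
    cache.set(key, value, ttl_seconds)
    return value
```

With a real Redis-compatible client the shape stays the same: a `get`, a fallback to the database, and a `set` with an expiry so stale entries eventually drop out.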

Mobile Solutions

Firebase provides mobile SDKs and tools to integrate directly with some Google Cloud products.

Cloud Storage for Firebase: Provides direct access to and easy integration with Cloud Storage via the Firebase SDKs for your iOS, Android, and web apps. The Firebase SDK handles security via Firebase Authentication and copes well with poor network conditions: uploads and downloads resume from where they stopped, saving time and bandwidth. Files uploaded via the Firebase SDK can also be accessed from Google Cloud Storage.

Cloud Firestore: NoSQL database that stores data as document objects, similar to Cloud Datastore, but provides powerful native mobile SDKs so that your iOS, Android, and web apps can access it directly. Data is also cached locally to allow offline use and is synchronized to Cloud Firestore when the device comes back online. Cloud Firestore has built-in security for mobile access via Firebase Authentication and Cloud Firestore Security Rules for iOS and Android.

Conclusion

Google Cloud Platform has a plethora of storage options designed for different use cases with the ability to scale like Google. Hopefully, in this overview you have gained a rough idea of all the data storage services available on GCP, and how to decide which one best fits your needs. Keep in mind though that there is no such thing as one size fits all, so you may have to use a combination of services to attain your goals.
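As a recap, the categories in this guide can be condensed into a small lookup helper. This is a deliberately simplified sketch: the `suggest_storage` function and its `kind` labels are hypothetical, and a real decision would also weigh cost, latency, and data size.

```python
def suggest_storage(kind: str, horizontal_scale: bool = False, mobile: bool = False) -> str:
    """Map a rough workload description to a service covered in this guide."""
    if kind == "files":
        return "Cloud Storage for Firebase" if mobile else "Cloud Storage"
    if kind == "relational":
        # Cloud SQL scales vertically; Cloud Spanner adds horizontal scaling.
        return "Cloud Spanner" if horizontal_scale else "Cloud SQL"
    if kind == "document":
        return "Cloud Firestore" if mobile else "Cloud Datastore"
    if kind == "wide-column":
        return "Cloud Bigtable"
    if kind == "analytics":
        return "BigQuery"
    if kind == "cache":
        return "Cloud Memorystore"
    raise ValueError(f"unknown workload kind: {kind!r}")


if __name__ == "__main__":
    print(suggest_storage("relational"))
    print(suggest_storage("relational", horizontal_scale=True))
    print(suggest_storage("document", mobile=True))
```

Since one application often has several kinds of data, calling a helper like this once per workload tends to yield a combination of services rather than a single answer.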

Jeffrey Lewis
The Startup

Software engineer with over a decade of professional experience.