Published in

Analytics Vidhya

Storage Options On Google Cloud Platform

In this blog, we will explore the various storage options on Google Cloud Platform (GCP) along with their use cases.

Please subscribe to my YouTube channel for tech-related videos.

Photo by Tyler Casey on Unsplash

Introduction

When we move from on-prem to the cloud, the number of storage options can be overwhelming, and selecting the right one for a particular use case can be time-consuming. Having a clear picture of all the options, with their use cases and alternatives, builds a strong foundation for deciding which storage to choose based on requirements.

In this article, I will attempt to provide a clear picture of all the storage options on Google Cloud along with their common use cases.

Bird’s-eye view

Storage Options in GCP

If we divide storage by the type of data we store in GCP, the options fall into two groups:

  • Storage Options for Structured Data
  • Storage Options for Unstructured Data

Unstructured Data

For unstructured data, we can either use Block storage or Object Storage.

Based on our use case we can choose a suitable storage type.

If we choose block storage, the storage type can be Persistent Disk or a local disk (Local SSD). For object storage, the storage type is Google Cloud Storage.

Persistent Disk:

  • Persistent Disk is attached to a Google Compute Engine instance (VM).
  • It has a size limit of 64 TB, and capacity has to be allocated in advance. It is not a pay-as-you-go model: you pay for the size you provision, not for what you use.
  • Persistent Disk can be HDD or SSD, depending on cost and performance requirements.
  • Persistent Disk can be regional (for example, us-central1 or us-west1) or zonal (for example, us-central1-a, us-central1-b, or us-central1-c).

Google Cloud Storage (GCS):

  • GCS isn't tied to a VM and can be used as the storage layer for many use cases.
  • It scales practically without limit, so there is no overall size restriction.
  • It's a pay-as-you-go model, which means you only pay for what you store.
  • A GCS bucket can be regional, dual-region, or multi-region.
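
The difference between the two billing models can be sketched with a little arithmetic. The prices below are made-up placeholders rather than actual GCP rates, and the function names are hypothetical. The only point is that a Persistent Disk is billed on the size you provision, while GCS is billed on what you actually store.

```python
# Hypothetical per-GB monthly prices, chosen only for illustration.
# They are NOT real GCP prices; check the official pricing pages.
ASSUMED_DISK_PRICE_PER_GB = 0.10
ASSUMED_GCS_PRICE_PER_GB = 0.02


def persistent_disk_monthly_cost(provisioned_gb: float) -> float:
    """Provisioned model: billed on allocated size, regardless of usage."""
    return provisioned_gb * ASSUMED_DISK_PRICE_PER_GB


def gcs_monthly_cost(stored_gb: float) -> float:
    """Pay-as-you-go model: billed on the data actually stored."""
    return stored_gb * ASSUMED_GCS_PRICE_PER_GB


if __name__ == "__main__":
    # A 500 GB disk holding only 50 GB of data is still billed for 500 GB.
    print(persistent_disk_monthly_cost(provisioned_gb=500))
    # The same 50 GB kept in a bucket is billed as 50 GB.
    print(gcs_monthly_cost(stored_gb=50))
```

With any realistic prices the shape of the result is the same: unused disk capacity still costs money, while an empty bucket costs nothing for storage.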

Okay, but when to use what?

If you are running Compute Engine VMs and each VM needs its own disk, it is better to go with block storage (SSD or HDD). In scenarios where the data must be accessible from anywhere, independent of any single VM, GCS should be the choice.

In most scenarios, you will want to use a combination of both, depending on the type of data.

Structured Data

Structured data can be stored in different services depending on how it is used.

If the requirement is to choose storage for Online Transaction Processing (OLTP) systems, then we have the following options:

Cloud SQL:

  • Cloud SQL is a managed MySQL, PostgreSQL, or Microsoft SQL Server database on GCP.
  • It's best suited as a database for online transaction processing systems (for example, financial transaction systems, e-commerce sales, or travel reservation systems).

Cloud Spanner:

  • Cloud Spanner is Google's proprietary database, which is suitable for online transaction processing.
  • It's a globally distributed database system with a very high availability SLA (up to 99.999%, which means roughly 5 minutes of downtime per year).
  • It's horizontally scalable, with high read and write performance.
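
The downtime figure behind an availability percentage is simple arithmetic, and a short sketch makes it easy to compare SLA levels. The function name is only illustrative; 99.999% works out to about 5.26 minutes per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum yearly downtime implied by an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for sla in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% -> {minutes:.2f} minutes per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is part of why the highest SLA levels are expensive to provide.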

Okay, but when to use what?

Cloud Spanner is built for a very niche use case. If you have a massive amount of data that has to span the globe with high performance, then Cloud Spanner is the choice. Otherwise, Cloud SQL should be the choice. Also, Cloud Spanner is costly compared to Cloud SQL, so choose wisely.

If the requirement is to choose storage for Online Analytical Processing (OLAP) systems, then we have the following options:

BigQuery: (for details about BigQuery: Link)

  • BigQuery is a data warehouse solution on GCP.
  • We can store petabytes of data, then query and analyze it using SQL within minutes.
  • We pay for the amount of data processed by each query (storage is billed separately).
  • Many business intelligence tools have connectors for BigQuery, so we can connect them without much trouble.
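
Paying per amount of data processed means the cost of a query depends on how many bytes it scans, not on how long it runs. Here is a minimal sketch of that arithmetic. The price per TiB is a placeholder passed in as a parameter, not an actual BigQuery rate, and the function name is hypothetical.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def on_demand_query_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Cost of one query when billing is based on bytes scanned."""
    return bytes_processed / TIB * price_per_tib


if __name__ == "__main__":
    # ASSUMED_PRICE_PER_TIB is a placeholder, not an actual BigQuery rate.
    ASSUMED_PRICE_PER_TIB = 5.0
    full_scan = on_demand_query_cost(2 * TIB, ASSUMED_PRICE_PER_TIB)
    narrow_scan = on_demand_query_cost(TIB // 4, ASSUMED_PRICE_PER_TIB)
    print(full_scan, narrow_scan)
```

Under this model, a query that reads only the columns it needs scans fewer bytes and therefore costs less than one that reads the whole table.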

BigTable:

  • BigTable is another database offering for analytical use cases.
  • BigTable is a NoSQL database built on Google's proprietary distributed file system, called Colossus.
  • BigTable can be compared with the open-source HBase, which is a distributed database built on top of the Hadoop Distributed File System (HDFS).
  • Note that neither BigTable nor HBase provides a SQL interface for querying the database, since they are not SQL databases.
  • BigTable is a horizontally distributed NoSQL database, used for low-latency use cases. Google Maps, Gmail, and YouTube use BigTable internally.
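
To get a feel for what working without a SQL interface looks like, here is a toy in-memory model of a key-value store that keeps rows sorted by row key, which is how HBase-style stores organize data. It is not the BigTable or HBase API; the class, its methods, and the row-key format are all invented for illustration.

```python
import bisect


class ToySortedKeyValueStore:
    """In-memory stand-in for a store that keeps rows sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, row_key, columns):
        """Insert or overwrite the row stored under row_key."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = columns

    def get(self, row_key):
        """Point lookup by row key; returns None if the row is missing."""
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        """Return (key, columns) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result


if __name__ == "__main__":
    store = ToySortedKeyValueStore()
    store.put("user42#2021-01-02", {"clicks": 7})
    store.put("user17#2021-01-01", {"clicks": 1})
    store.put("user42#2021-01-01", {"clicks": 3})
    print(store.get("user17#2021-01-01"))
    print(store.scan_prefix("user42#"))
```

Every read goes through the row key, either as a point lookup or as a scan over a contiguous key range, so the key design decides which queries are cheap. Ad hoc filtering and joins of the kind SQL offers are not part of this access model.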

Okay, when to use what?

BigQuery is more suitable when we need a SQL interface to run analytical queries on the underlying data. A typical use case is business intelligence, where data from various systems is stored in a BigQuery data warehouse and connected to business intelligence tools like Tableau or Looker to analyze data and build dashboards. On the other hand, if we need a NoSQL database with high scalability and throughput for key-value data, then BigTable is the choice. BigTable is a low-latency database and is suitable when latency requirements are very strict.
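
The decision points discussed in this article can be summarized as a small lookup. This is only a sketch: the function name and the workload labels are invented here, and real projects usually weigh more factors (cost, existing tooling, data volume) than a single label can capture.

```python
def choose_storage(structured: bool, workload: str) -> str:
    """Map the article's decision points to a storage option.

    The workload labels are invented for this sketch:
      unstructured data: "vm_local", "shared_objects"
      structured data:   "oltp", "oltp_global", "analytics_sql",
                         "low_latency_key_value"
    """
    if not structured:
        options = {
            "vm_local": "Persistent Disk",
            "shared_objects": "Google Cloud Storage",
        }
    else:
        options = {
            "oltp": "Cloud SQL",
            "oltp_global": "Cloud Spanner",
            "analytics_sql": "BigQuery",
            "low_latency_key_value": "BigTable",
        }
    if workload not in options:
        raise ValueError(
            f"unknown workload {workload!r} for structured={structured}"
        )
    return options[workload]


if __name__ == "__main__":
    print(choose_storage(structured=False, workload="shared_objects"))
    print(choose_storage(structured=True, workload="analytics_sql"))
```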

I hope this blog was helpful. I appreciate your time. Thank you for reading.

Liked this blog? Find more at https://asyncq.com/


Suraj Mishra

Google Cloud Certified Professional Data Engineer. Find more blogs: https://asyncq.com/
