Choose the right storage for your apps in GCP

Nirav Kothari
Published in GDGCloudMumbai
5 min read · Jun 12, 2020
Image credits: Flickr.com

Storage is one of the most basic requirements for running an application successfully. An application usually has several kinds of data to store: data files, static content such as images and videos, OS and library files, database storage and so on. GCP provides multiple storage services to cater to these varying needs. Choosing the right service, combined with other techniques, can help achieve low latency, high durability, high availability and low cost.

In the last blog I shared my views on how to Choose the right Compute service on Google Cloud Platform. In this blog I will try to share insights on how to select the right storage service for various data storage needs. This blog is written keeping Google Cloud Platform in mind, but the concepts can be applied to any cloud.

GCP offers three main storage services: Disks, which provide block storage; Filestore, for network file storage; and Cloud Storage, an object storage service. These are the basic building blocks on top of which other managed services are built.

Disks (Block Storage)

In simple terms, these are hard disks. They store operating systems, libraries and other user files. Every VM that you provision needs at least one disk to boot from. There are 2 types of disks:

  1. Local SSD: This is an SSD physically attached to the server that hosts the VM. It is ephemeral, which means the data you write to it is wiped out when the VM shuts down. Because it is physically attached to the compute instance and is an SSD, it provides the highest IOPS and lowest latency of all the block storage options. Since it is ephemeral, you cannot use it as a boot disk; it is designed for temporary storage. Its content also cannot be shared with other VM instances. Local SSDs have a fixed size of 375 GB, and you can attach up to 8 such disks per instance (the exact limit can vary by machine type).
  2. Persistent Disk: This is a network-attached disk service, which means the disk is not physically connected to the compute instance but is reached over the network. As the name suggests, the content of the disk persists even after the VM shuts down. These disks are used as boot disks as well as for general-purpose storage. You can choose the type of disk, magnetic (HDD) or SSD, based on the IOPS and throughput your application needs. In GCP, IOPS and throughput increase as the disk size increases. GCP allows persistent disks of up to 64 TB each.
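To make the "performance grows with size" point concrete, here is a minimal sketch in Python. The per-GB rates and caps in `ASSUMED_DISK_PROFILES` are illustrative assumptions, not quoted limits, and the function names are hypothetical helpers; check the current GCP documentation for the real figures before sizing a disk.

```python
import math

# Illustrative numbers only (assumptions for this sketch, not official limits).
ASSUMED_DISK_PROFILES = {
    "pd-standard": {"read_iops_per_gb": 0.75, "max_read_iops": 7_500},
    "pd-ssd": {"read_iops_per_gb": 30.0, "max_read_iops": 100_000},
}


def estimate_read_iops(disk_type: str, size_gb: int) -> float:
    """Estimate read IOPS for a disk whose IOPS scale linearly with size up to a cap."""
    profile = ASSUMED_DISK_PROFILES[disk_type]
    scaled = profile["read_iops_per_gb"] * size_gb
    return min(scaled, profile["max_read_iops"])


def min_size_for_iops(disk_type: str, target_iops: float) -> int:
    """Return the smallest disk size (in GB) that reaches the target read IOPS."""
    profile = ASSUMED_DISK_PROFILES[disk_type]
    if target_iops > profile["max_read_iops"]:
        raise ValueError(f"{disk_type} cannot reach {target_iops} IOPS in this model")
    return math.ceil(target_iops / profile["read_iops_per_gb"])


if __name__ == "__main__":
    for size in (100, 500, 10_000):
        print(size, "GB pd-ssd ->", estimate_read_iops("pd-ssd", size), "read IOPS")
    print("Size needed for 15000 IOPS on pd-ssd:", min_size_for_iops("pd-ssd", 15_000), "GB")
```

Under this model, a disk is sometimes provisioned larger than the data strictly requires, purely to reach the IOPS the application needs.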

By default, data stored on disks is encrypted. If need be, users can supply their own encryption keys as well. This is an important requirement of many compliance standards, and GCP supports it out of the box.

Cloud Filestore (Network File Storage)

It’s a fully managed, no-ops service. The concept of Cloud Filestore is similar to Network Attached Storage (NAS). It creates shared storage for multiple compute instances or Kubernetes Engine clusters, so all instances can read from and write to the same storage. It also provides a filesystem-like interface. Many newcomers confuse Persistent Disks with Filestore because both attach storage to a compute instance over the network, but there is a clear distinction: a Persistent Disk shared between instances permits read-only access, whereas Filestore gives read-write access.

Cloud Storage (Object Storage)

It’s a fully managed service from GCP that lets you store your data as objects. You access these objects using HTTP calls; there is no file system interface. Objects are grouped into containers called buckets. A bucket can hold many objects, and every object belongs to exactly one bucket. Cloud Storage provides some really useful features out of the box, such as fine-grained permissions (at object level and bucket level), object versioning, automatic redundancy across multiple zones/regions and object lifecycle management. This makes it an ideal storage system for use cases involving content delivery, data lakes, and backup and archival.
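Since objects are addressed over HTTP by bucket and object name, the addressing can be sketched in a few lines of Python. The bucket name `my-app-assets` and the object name are hypothetical, and the helper functions are my own illustration; the requests themselves would still need authentication unless the object has been made public.

```python
from urllib.parse import quote


def public_object_url(bucket: str, object_name: str) -> str:
    """Build the path-style URL for an object; '/' in the object name is kept as-is."""
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


def json_api_media_url(bucket: str, object_name: str) -> str:
    """Build the JSON API download URL; the object name must be fully
    percent-encoded here, including any '/' characters."""
    encoded = quote(object_name, safe="")
    return f"https://storage.googleapis.com/storage/v1/b/{bucket}/o/{encoded}?alt=media"


if __name__ == "__main__":
    # Hypothetical bucket and object names, for illustration only.
    print(public_object_url("my-app-assets", "images/logo 1.png"))
    print(json_api_media_url("my-app-assets", "images/logo 1.png"))
```

Note that `images/` is not a real folder here: the slash is simply part of the object name, which is why it has to be encoded in the second URL.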

To optimize cost and performance, Cloud Storage lets you segregate data into storage classes and define Object Lifecycle Management (OLM) policies.

Storage class

Storage classes define the availability of the data and the pricing of storage. There are 4 types of storage classes.

  1. Standard: A high-performance, highly durable, highly available storage class. It should be used for frequently accessed data.
  2. Nearline: A low-cost, highly durable storage class for infrequently accessed data. It suits scenarios where slightly lower availability, a 30-day minimum storage duration and data access charges are an acceptable trade-off for the lower storage cost. GCP recommends this class for data that is accessed less than once a month.
  3. Coldline: A very low-cost, highly durable storage class for data that is accessed even less frequently. Its availability is similar to that of Nearline, but it enforces a 90-day minimum storage duration. This class also imposes data access charges, so you have to calculate the pricing for your usage before deciding on it.
  4. Archive: The least expensive, highly durable storage class, mainly used for data archiving, backups and disaster recovery. An important distinction between GCP and other cloud providers is that in GCP even this class returns data immediately when accessed. It enforces a 365-day minimum storage duration and is usually the best choice for data that you access less than once a year.

Depending upon the type of data that you intend to store, you can select the storage class after doing a bit of math for price estimation.
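That "bit of math" could look like the following Python sketch. The minimum storage durations come from the list above, but the prices are placeholder values I made up for illustration, and `cheapest_class` is a hypothetical helper; substitute the current prices for your region before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    min_storage_days: int          # minimum storage duration per class
    storage_price_gb_month: float  # placeholder price, not an official figure
    retrieval_price_gb: float      # placeholder data access charge


CLASSES = [
    StorageClass("STANDARD", 0, 0.020, 0.00),
    StorageClass("NEARLINE", 30, 0.010, 0.01),
    StorageClass("COLDLINE", 90, 0.004, 0.02),
    StorageClass("ARCHIVE", 365, 0.0012, 0.05),
]


def monthly_cost(sc: StorageClass, stored_gb: float, read_gb_per_month: float) -> float:
    """Monthly bill = storage charge + data access charge."""
    return stored_gb * sc.storage_price_gb_month + read_gb_per_month * sc.retrieval_price_gb


def cheapest_class(stored_gb: float, read_gb_per_month: float, retention_days: int) -> StorageClass:
    """Pick the cheapest class whose minimum storage duration fits the retention period."""
    candidates = [sc for sc in CLASSES if sc.min_storage_days <= retention_days]
    return min(candidates, key=lambda sc: monthly_cost(sc, stored_gb, read_gb_per_month))


if __name__ == "__main__":
    # 1 TB kept for over a year, with different monthly read volumes.
    for read_gb in (0, 500, 2000):
        best = cheapest_class(1000, read_gb, retention_days=400)
        print(f"{read_gb} GB read/month -> {best.name}")
```

The takeaway is that the cheapest class per GB stored is not always the cheapest overall: once the data is read often enough, access charges outweigh the storage savings.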

Life Cycle Management Policies

For some types of data, it is common practice to set a Time To Live (TTL), back the data up and archive it, maintain non-current versions and so on. Cloud Storage provides a clean interface for managing these business rules through Object Lifecycle Management (OLM) policies. The rules are applied to buckets and act on existing as well as future objects. Examples of rules that you can configure are:

  • Change the storage class for objects older than 6 months
  • Delete the objects older than 5 years
  • Store only n versions of objects etc.
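The three example rules above can be sketched as a lifecycle configuration, built here in Python and printed as JSON. The ages in days (180 and 1825) are my approximations of "6 months" and "5 years", the target class `COLDLINE` is an arbitrary choice for illustration, and `build_lifecycle_config` is a hypothetical helper.

```python
import json


def build_lifecycle_config(max_versions: int) -> dict:
    """Build a bucket lifecycle configuration covering the three example rules."""
    return {
        "rule": [
            # Change the storage class for objects older than about 6 months.
            {
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": 180},
            },
            # Delete objects older than about 5 years.
            {
                "action": {"type": "Delete"},
                "condition": {"age": 1825},
            },
            # Keep only n versions: delete a non-current version once
            # at least n newer versions of the object exist.
            {
                "action": {"type": "Delete"},
                "condition": {"isLive": False, "numNewerVersions": max_versions},
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_config(max_versions=3), indent=2))
```

Saved to a file, this JSON could then be applied to a bucket with something like `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`, where the file and bucket names are placeholders.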

Final Words:

GCP provides multiple services for storage of data. Developers have to choose the most appropriate ones for their use case. Usually developers choose multiple storage services for different types of data.

In the next blog I’m going to share insights on how to Choose the right database service for your application on GCP.
