Storage in Cloud
--
Storage in the cloud is object storage. It’s not the same as file storage, in which you manage your data as a hierarchy of folders. It’s not the same as block storage, in which your operating system manages your data as chunks of disk. Instead, object storage means you hand the service an arbitrary bunch of bytes, and it stores them and lets you address them with a unique key. Often these unique keys take the form of URLs, which means object storage interacts nicely with web technologies. It’s a fully managed, scalable service, so you don’t need to provision capacity ahead of time: just create objects and the service stores them with high durability and high availability. You can use Cloud Storage for lots of things: serving website content, storing data for archival and disaster recovery, or distributing large data objects to your end users via direct download.
Cloud Storage is made up of buckets that you create, configure, and use to hold your storage objects. The storage objects are immutable, which means that you do not edit them in place but instead create new versions. Cloud Storage always encrypts your data on the server side before it is written to disk, and you don’t pay extra for that. Data in transit is also encrypted by default using HTTPS. Cloud Storage is not a file system: each of your objects is addressed by its own URL rather than by a path in a directory tree.
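To make the key-and-bytes model concrete, here is a minimal in-memory sketch. It is not how any provider implements storage; the `ToyBucket` class, its methods, and the `storage.example.com` URL scheme are all hypothetical names made up for illustration. It only shows two ideas from the text: objects are addressed by a flat key, and "editing" an object means writing a new version.

```python
class ToyBucket:
    """In-memory stand-in for a bucket: keys map to immutable object versions."""

    def __init__(self, name):
        self.name = name
        self._versions = {}  # key -> list of bytes, oldest first

    def put(self, key, data):
        """Store a new version under `key`; earlier versions are left untouched."""
        self._versions.setdefault(key, []).append(bytes(data))
        return len(self._versions[key])  # generation number, starting at 1

    def get(self, key, generation=None):
        """Return the latest version, or a specific generation if requested."""
        versions = self._versions[key]
        return versions[-1] if generation is None else versions[generation - 1]

    def url(self, key):
        """Hypothetical URL form; real services define their own scheme."""
        return f"https://storage.example.com/{self.name}/{key}"


bucket = ToyBucket("my-bucket")
bucket.put("reports/2018.csv", b"a,b\n1,2\n")
gen = bucket.put("reports/2018.csv", b"a,b\n1,3\n")  # an "edit" is a new version

print(bucket.url("reports/2018.csv"))
print(gen, bucket.get("reports/2018.csv", generation=1))
```

Note that `reports/2018.csv` looks like a folder path but is just one flat key; there is no `reports` directory anywhere in the bucket.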
Whether you use GCP or AWS, you can store large amounts of data, whether you need it daily or only once a week, month, or year, and whether it is highly sensitive or not. Each cloud service provider offers several types of storage to choose from.
In terms of access, data are of two types:
- Frequently Accessed Data
- Infrequently Accessed Data
Frequently accessed data, as the name suggests, is data you access often: daily, every morning, or a few times a week. Infrequently accessed data is data you access about once a month or twice a quarter.
All cloud service providers offer a range of storage classes designed for different use cases. Let’s look at the storage classes of two major cloud providers: Google Cloud Platform and Amazon Web Services.
- Storage Classes for Frequently Accessed Objects
This type of data needs high durability, high availability, low latency, and good performance because it is accessed frequently. Use cases such as cloud applications, dynamic websites, content distribution, mobile and gaming applications, and Big Data analytics need their data often and with high availability so that users can work smoothly.
For this class,
GCP: Multi-Regional, Regional
Regional storage lets you store your data in a specific GCP region, such as us-central1, europe-west1, or asia-east1. It’s cheaper than multi-regional storage but offers less redundancy. It is a good fit when, for example, you need to support a high-frequency analytics workload: people use regional storage to keep data close to their Compute Engine virtual machines or their Kubernetes Engine clusters, which gives better performance for data-intensive computations.
Multi-regional storage, on the other hand, costs a bit more but is geo-redundant. That means you pick a broad geographical location such as the United States, the European Union, or Asia, and Cloud Storage stores your data in at least two geographic locations separated by at least 160 kilometres. Multi-regional storage is appropriate for frequently accessed data, for example website content, interactive workloads, or data that’s part of mobile and gaming applications.
AWS: Amazon S3 Standard, Reduced Redundancy
Amazon S3 Standard offers highly durable, highly available, high-performance object storage for frequently accessed data. Because it delivers low latency and high throughput, S3 Standard suits a wide variety of use cases. It is the default storage class: if you don’t specify a storage class when you upload an object, Amazon S3 assigns Standard. S3 Lifecycle management offers configurable policies to automatically migrate objects to the most appropriate storage class. Data is stored across multiple Availability Zones, so it remains available even if one AZ is destroyed.
The Reduced Redundancy Storage (RRS) class is designed for noncritical, reproducible data that can be stored with less redundancy than the Standard storage class. For durability, RRS objects have an average annual expected loss of 0.01% of objects. If an RRS object is lost, Amazon S3 returns a 405 error when requests are made to that object.
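To get a feel for what an average annual expected loss of 0.01% means, the arithmetic can be sketched as below. The 0.01% figure comes from the text; the function name and the object counts are illustrative assumptions, and the result is a long-run average, not a prediction for any particular year.

```python
RRS_ANNUAL_LOSS_RATE = 0.01 / 100  # 0.01% of objects per year, from the text


def expected_annual_loss(object_count, loss_rate=RRS_ANNUAL_LOSS_RATE):
    """Expected number of objects lost in one year (a long-run average)."""
    return object_count * loss_rate


# Illustrative object counts only.
for count in (10_000, 1_000_000):
    print(count, "objects ->", expected_annual_loss(count), "expected losses per year")
```

In other words, storing ten thousand RRS objects means expecting to lose about one of them per year on average, which is why the class is aimed at data you can reproduce.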
Both services are backed by a Service Level Agreement; see the SLA pages for GCP and AWS.
- Storage Classes for Infrequently Accessed Objects(IAO)
Infrequently accessed objects can be of two types:
- Data which you need once a month.
- Data which you need only once or twice a year, i.e. archival data.
These storage classes are designed for long-lived, infrequently accessed data. Both GCP and AWS charge a retrieval fee for these objects, so they are most suitable for data you rarely read. This data still needs high durability and availability, and rapid access when it is needed. Use this type of storage class for backups, older data, copies of data, and similar content that is accessed infrequently but still requires millisecond access.
GCP and AWS offer different services for each of the two types.
1. Data which you need once a month.
GCP: Nearline
Nearline storage is a low-cost, highly durable service for storing infrequently accessed data. It is a better choice than multi-regional or regional storage in scenarios where you plan to read or modify your data once a month or less on average. It also works with lifecycle management for automated archival and deletion. For example, if you want to continuously add files to Cloud Storage and plan to access those files once a month for analysis, Nearline storage is a great choice.
AWS: Standard-IA and One Zone-IA
Standard-IA stores the object data redundantly across multiple geographically separated Availability Zones (similar to the Standard storage class). Standard-IA objects are resilient to the loss of an Availability Zone. This storage class offers greater availability and resiliency than the One Zone-IA class.
With One Zone-IA, Amazon S3 stores the object data in only one Availability Zone, which makes it less expensive than Standard-IA. It’s a good choice, for example, for storing secondary backup copies of on-premises data or easily re-creatable data. However, the data is not resilient to the physical loss of the Availability Zone resulting from disasters such as earthquakes and floods. The One Zone-IA storage class is as durable as Standard-IA, but it is less available and less resilient.
Both services are backed by a Service Level Agreement; see the SLA pages for GCP and AWS. You can move data from this class to an archival class after a certain period of time by setting a limit in days.
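The day-limit transition described above is expressed as a lifecycle rule on the bucket. The sketch below only builds the rule documents as plain Python dictionaries and prints them as JSON; it does not call either provider. The field names follow the lifecycle configuration formats as I understand them from each provider's documentation, so check them there before use. The helper names, the rule ID, and the 90-day value are illustrative assumptions.

```python
import json


def s3_transition_rule(days, storage_class="GLACIER", prefix=""):
    """Build an S3 lifecycle configuration with a single transition rule."""
    return {
        "Rules": [
            {
                "ID": f"to-{storage_class.lower()}-after-{days}d",  # hypothetical ID
                "Filter": {"Prefix": prefix},  # empty prefix = every object
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


def gcs_transition_rule(days, storage_class="COLDLINE"):
    """Build a Cloud Storage lifecycle configuration with one SetStorageClass rule."""
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": storage_class},
                "condition": {"age": days},  # age in days since object creation
            }
        ]
    }


# 90 days is an arbitrary example value.
print(json.dumps(s3_transition_rule(90), indent=2))
print(json.dumps(gcs_transition_rule(90), indent=2))
```

Keeping the rule as data rather than code means the same document can be reviewed, versioned, and applied with whichever tool you use to manage the bucket.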
2. Data which you need once or twice a year.
This class is meant for archival data, and both GCP and AWS charge a retrieval fee for these objects. With Amazon Glacier, objects are not available for real-time access: you must first restore an archived object before you can read it. Coldline, by contrast, serves data directly. A common pattern is to upload a file to Nearline (GCP) or Standard-IA/One Zone-IA (AWS) first and then transition it to the archival class after a set period.
GCP: Coldline
Coldline storage is a very low-cost, highly durable service for data archiving, online backup, and disaster recovery. It is the best choice for data that you plan to access at most once a year, because of its slightly lower availability, 90-day minimum storage duration, costs for data access, and higher per-operation costs. For example, use it when you want to archive data or keep it on hand for a disaster recovery event. Data stored in Coldline is still available to you with sub-second average response times, rather than in hours or days.
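The 90-day minimum storage duration matters for cost: as I understand it, an object deleted early is still charged as if it had been stored for the full minimum. A rough sketch of that arithmetic follows. The 90-day figure is from the text; the function names, the 30-day month, and the price of 0.01 per GB-month are assumptions for illustration, not real pricing.

```python
COLDLINE_MIN_DAYS = 90  # minimum storage duration, from the text


def billable_days(days_stored, min_days=COLDLINE_MIN_DAYS):
    """Days charged for an object: never fewer than the class minimum."""
    return max(days_stored, min_days)


def storage_cost(size_gb, days_stored, price_per_gb_month, min_days=COLDLINE_MIN_DAYS):
    """Approximate storage charge, treating a month as 30 days."""
    return size_gb * price_per_gb_month * billable_days(days_stored, min_days) / 30


# Hypothetical price of 0.01 per GB-month, for illustration only.
print(storage_cost(100, 10, 0.01))   # deleted after 10 days, billed for 90
print(storage_cost(100, 120, 0.01))  # kept past the minimum, billed for 120
```

The first case shows why short-lived data does not belong in an archival class: ten days of use still costs three months of storage.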
AWS: Glacier
Amazon Glacier is a secure, durable, and extremely low-cost storage service for data archiving. You can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. To keep costs low yet suitable for varying retrieval needs, Amazon Glacier provides three options for access to archives, from a few minutes to several hours. Amazon Glacier supports S3 Lifecycle Policies for automatic migration between S3 & Amazon Glacier storage classes.
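Choosing among the three retrieval options is a trade-off between speed and cost. The sketch below picks the slowest option that still meets a deadline, on the assumption that slower retrievals cost less. The tier names and time bounds (Expedited within minutes, Standard within about 5 hours, Bulk within about 12 hours) are approximate figures I am assuming rather than values from the text, and the function name is hypothetical.

```python
# (tier name, typical upper bound on retrieval time in hours), fastest first.
# Figures are approximate assumptions, not taken from the text.
RETRIEVAL_TIERS = [
    ("Expedited", 5 / 60),
    ("Standard", 5),
    ("Bulk", 12),
]


def slowest_tier_within(deadline_hours):
    """Pick the slowest (usually cheapest) tier that still meets the deadline."""
    candidates = [name for name, hours in RETRIEVAL_TIERS if hours <= deadline_hours]
    return candidates[-1] if candidates else None


print(slowest_tier_within(1))     # Expedited
print(slowest_tier_within(8))     # Standard
print(slowest_tier_within(24))    # Bulk
print(slowest_tier_within(0.01))  # None: no tier is fast enough
```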
Only Coldline is backed by a Service Level Agreement; see the SLA page for GCP.
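As a rough summary of the classes covered in this post, the mapping from access frequency to storage class can be sketched as a small function. The thresholds (more than 12 reads a year, more than 2 reads a year) are my own reading of "frequently", "about once a month", and "once or twice a year"; they are assumptions, not provider rules, and the function name is hypothetical.

```python
def suggest_classes(reads_per_year):
    """Map an expected access frequency to the classes discussed in this post."""
    if reads_per_year > 12:  # more often than about once a month
        return {"GCP": "Multi-Regional or Regional", "AWS": "S3 Standard"}
    if reads_per_year > 2:   # roughly monthly or a few times a quarter
        return {"GCP": "Nearline", "AWS": "Standard-IA or One Zone-IA"}
    return {"GCP": "Coldline", "AWS": "Glacier"}  # once or twice a year


for reads in (365, 12, 1):
    print(reads, "reads/year ->", suggest_classes(reads))
```

Access frequency is only one input; redundancy needs, retrieval fees, and minimum storage durations should also weigh on the choice.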
This is all about Storage in Cloud.
Thanks
Happy Learning!