Data Storage: To Cloud or Not to Cloud?

Learn the core components required for on-premise data storage systems and whether storing data via a cloud provider is the way forward.

Manchester D&A
Slalom Data & AI
Jan 10, 2023


Photo by Christina Morillo from Pexels

By Faisal Momoniat

‘Data storage is the cornerstone of the data engineering lifecycle.’ - Joe Reis and Matt Housley in O’Reilly’s Fundamentals of Data Engineering

Data is stored at many stages of the data engineering lifecycle. It must persist in a storage medium until systems further along the lifecycle are ready to pick it up for further processing and transformation.

Globally, organisations store 60% of their corporate data in the cloud, with Western Europe accounting for 21% of the global cloud market share. Although many larger organisations have not yet fully made the transition to cloud models, it is predicted that 100 zettabytes of data will be stored in the cloud by 2025.

There has been much debate about what the ideal storage solution is, and whether cloud or on-premise storage is more sustainable, both for the economy and for the growing data sphere. Factors such as the cost of storage, how easily data can be retrieved from a given type of storage, and organisational policies and compliance standards all influence which storage approach is best. With the continued expansion of the data sphere inevitable, these are some of the challenges we face today.

Data engineering lifecycle

To really understand the concept and ideas behind data storage, it’s important to study the raw components that make up storage systems.

As a data professional, you’ll encounter several different data storage systems within a data architecture. Let’s look at the main types of data storage systems and their make-up:

1. Single machine vs. distributed storage

Data storage and access patterns have become more complex as data itself has become more varied in structure and more intricate. The growing number of users worldwide, each producing more data, means that many workloads have outgrown what a single server can store.

Data storage systems need to be highly available to end users and resilient to disasters, which makes it necessary to spread data across multiple servers. This approach is known as distributed storage; because the work is shared across machines, it can serve data both faster and at a larger scale. Cloud data warehouses rely on distributed storage architectures.
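To make the idea concrete, here is a minimal Python sketch of distributed storage, assuming a toy setup in which each "server" is just an in-memory dictionary. The class name `DistributedStore`, the SHA-256-based placement, and the replication factor of 2 are illustrative choices, not how any particular cloud warehouse works. Each key is copied to two servers, so a read still succeeds after one of them fails.

```python
import hashlib


class DistributedStore:
    """Toy key-value store spread across several simulated in-memory servers.

    Hypothetical example: each key is placed on `replication` consecutive nodes,
    chosen by hashing the key, so losing a single node does not lose data.
    """

    def __init__(self, num_nodes: int = 4, replication: int = 2) -> None:
        if not 1 <= replication <= num_nodes:
            raise ValueError("replication must be between 1 and num_nodes")
        self.nodes = [{} for _ in range(num_nodes)]  # one dict per simulated server
        self.down = set()  # indices of nodes we pretend have failed
        self.replication = replication

    def _placement(self, key: str) -> list[int]:
        # Hash the key to pick a primary node, then use the following nodes as replicas.
        digest = hashlib.sha256(key.encode()).digest()
        primary = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(primary + i) % len(self.nodes) for i in range(self.replication)]

    def put(self, key: str, value: bytes) -> None:
        # Write a copy of the value to every node in the key's placement.
        for idx in self._placement(key):
            self.nodes[idx][key] = value

    def get(self, key: str) -> bytes:
        # Read from the first healthy node that holds the key.
        for idx in self._placement(key):
            if idx not in self.down and key in self.nodes[idx]:
                return self.nodes[idx][key]
        raise KeyError(key)


store = DistributedStore(num_nodes=4, replication=2)
store.put("orders/2023-01-10.csv", b"id,amount\n1,9.99\n")
primary = store._placement("orders/2023-01-10.csv")[0]
store.down.add(primary)  # simulate losing the primary server
print(store.get("orders/2023-01-10.csv"))  # still served by the replica
```

Real systems add consistent hashing, rebalancing when servers join or leave, and consistency guarantees between replicas; the sketch only shows why keeping copies on several machines improves availability.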

2. File storage

Data professionals deal with files on a daily basis. A file is defined as ‘an entity with specific read, write and reference characteristics used by software and operating systems.’
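As a small illustration of those read, write, and reference characteristics, the sketch below uses Python's standard `pathlib` and `tempfile` modules; the directory and file names are made up. The path is how software refers to the file, the contents are written and read as a stream of bytes, and the operating system tracks metadata such as the file's size.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Reference: a file is identified by a path inside a directory hierarchy.
    path = Path(tmp) / "reports" / "daily_sales.csv"
    path.parent.mkdir(parents=True)

    # Write: the application hands bytes (here, text) to the operating system.
    path.write_text("date,total\n2023-01-10,120.50\n")

    # Read: any program that knows the path can read the contents back.
    contents = path.read_text()

    # Metadata maintained by the operating system, such as size in bytes.
    size = path.stat().st_size
    print(path.name, size)
    print(contents.splitlines()[1])
```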

3. Block storage

Block storage is the raw storage type provided by solid-state drives (SSDs) and magnetic disks, and it is the standard form of storage attached to virtual machines (VMs) in the cloud. Unlike file storage, block storage gives fine-grained control over size, scalability, and data durability.
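The sketch below is a hypothetical, simplified model of block storage in Python: the device is a fixed array of equal-sized blocks addressed by number, and the 4 KiB block size is an assumption (a common default, but devices vary). In practice a file system or database sits on top and keeps track of which blocks belong to which file; here the illustrative helper `store_bytes` simply returns the block indices it used.

```python
BLOCK_SIZE = 4096  # bytes per block; an assumed, commonly used size


class BlockDevice:
    """Toy block device: a fixed number of equal-sized blocks addressed by index."""

    def __init__(self, num_blocks: int) -> None:
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) > BLOCK_SIZE:
            raise ValueError("data larger than one block")
        # Pad short writes with zero bytes so every block keeps the same size.
        self._blocks[index] = data.ljust(BLOCK_SIZE, b"\x00")

    def read_block(self, index: int) -> bytes:
        return self._blocks[index]


def store_bytes(device: BlockDevice, payload: bytes, start: int = 0) -> list[int]:
    """Split a payload into block-sized chunks and return the block indices used."""
    used = []
    for offset in range(0, len(payload), BLOCK_SIZE):
        index = start + offset // BLOCK_SIZE
        device.write_block(index, payload[offset:offset + BLOCK_SIZE])
        used.append(index)
    return used


disk = BlockDevice(num_blocks=16)
indices = store_bytes(disk, b"x" * 10_000)
print(indices)  # [0, 1, 2]: 10,000 bytes need three 4 KiB blocks
```

Because every block is addressed directly, software can read or rewrite a single block without touching the rest, which is where the fine-grained control comes from.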

4. Object storage

As the name suggests, object storage stores objects of different shapes and sizes. In data storage, an object is a ‘file-like construct’, such as a .txt, CSV, or JSON file, an image, a video, or an audio clip (mostly unstructured or semi-structured data types).
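A toy in-memory object store helps show how this differs from files and blocks. In this sketch the class and method names are hypothetical and only loosely inspired by bucket-and-key services such as S3; it is not any real service's API. Each object is written and read as a whole, carries its own metadata, and lives in a flat namespace where "folders" are only key prefixes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object: raw bytes plus descriptive metadata, stored and retrieved as one unit."""
    data: bytes
    content_type: str
    metadata: dict[str, str] = field(default_factory=dict)
    checksum: str = ""


class ObjectStore:
    """Toy object store with a flat key namespace (hypothetical, not a real service API)."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, content_type: str, **metadata: str) -> str:
        # Objects are written whole; putting the same key again replaces the object.
        checksum = hashlib.sha256(data).hexdigest()
        self._objects[key] = StoredObject(data, content_type, dict(metadata), checksum)
        return checksum

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ObjectStore()
bucket.put("raw/events/2023-01-10.json", b'{"event": "login"}', "application/json", source="web")
bucket.put("raw/images/logo.png", b"\x89PNG\r\n", "image/png")
bucket.put("curated/sales.csv", b"id,amount\n1,9.99\n", "text/csv")
print(bucket.list_keys("raw/"))  # ['raw/events/2023-01-10.json', 'raw/images/logo.png']
print(bucket.get("raw/events/2023-01-10.json").metadata)  # {'source': 'web'}
```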

Data storage systems that are not on premise increasingly live in the cloud. Many organisations want to move to the cloud but are having second thoughts because they lack the infrastructure and knowledgeable personnel to carry out data migration. Here, we cover some of the advantages and disadvantages of storing data in the cloud.

Advantages of cloud storage

1. Accessibility

Stored data can be accessed from anywhere in the world with a stable internet connection.

2. Operational cost savings

Cloud storage can come at a relatively low cost for businesses: storing data remotely requires no upfront capital expenditure (CAPEX) on hardware, and annual operational expenditure (OPEX) can be reduced significantly.

3. Unlimited storage

The cloud offers effectively unlimited capacity to store data. The business pays for the capacity and performance it requires, without needing to purchase hardware.

4. Scalability and speed

In the cloud, businesses pay only for what they use, so as a business grows, its storage scales with it. Data can be retrieved or pushed much more quickly than when backing up onto a physical disk. This is especially true for organisations spread across multiple geographic locations.

5. Disaster recovery

An emergency backup plan is still required when storing data in the cloud. Keeping a copy in a second location, or replicating data across regions, is one of the first things an organisation should consider before committing to a cloud storage plan for its data.
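The sketch below illustrates cross-region replication and failover with two simulated regions in Python. Names such as `CrossRegionStore` and the regions `london` and `dublin` are hypothetical. Replication is modelled as asynchronous, since managed cross-region replication typically copies data after the original write completes; as a result, a failover can serve slightly stale data.

```python
from collections import deque


class Region:
    """A simulated storage region holding key -> bytes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}
        self.available = True


class CrossRegionStore:
    """Writes land in a primary region and are queued for copying to a secondary region.

    Nothing reaches the secondary until replicate() runs, mimicking replication lag.
    """

    def __init__(self, primary: Region, secondary: Region) -> None:
        self.primary, self.secondary = primary, secondary
        self._pending = deque()

    def put(self, key: str, value: bytes) -> None:
        self.primary.data[key] = value
        self._pending.append(key)

    def replicate(self) -> None:
        # Drain the queue, copying each pending key to the secondary region.
        while self._pending:
            key = self._pending.popleft()
            self.secondary.data[key] = self.primary.data[key]

    def get(self, key: str) -> bytes:
        # Fail over to the secondary region when the primary is unavailable.
        region = self.primary if self.primary.available else self.secondary
        return region.data[key]


london, dublin = Region("london"), Region("dublin")
replicated = CrossRegionStore(primary=london, secondary=dublin)
replicated.put("backups/ledger.db", b"v1")
replicated.replicate()
replicated.put("backups/ledger.db", b"v2")  # written, but not yet replicated
london.available = False  # simulate a regional outage
print(replicated.get("backups/ledger.db"))  # b'v1': the secondary lags behind
```

The gap between `b"v2"` in the primary and `b"v1"` in the secondary is the kind of data-loss window a backup plan needs to account for.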

Example of a cloud data storage architecture: Amazon Simple Storage Service (S3), an object storage service from Amazon Web Services (AWS) that offers scalability, high data availability, and security with strong performance.

As organisations have moved their storage from on premise to the cloud, concerns about the cloud’s limitations have grown. Here are some of the drawbacks of storing data in the cloud:

Disadvantages of cloud storage

1. Data centres are costly to run

Data centres require millions of kilowatt-hours every day to operate, and significant resources go into semiconductors, backup servers, cooling systems, and industrial air conditioning. At the time of writing, the storage and use of cloud technology has a larger carbon footprint than the airline industry, with a single data centre consuming as much electricity as 50,000 homes.

2. Security, compliance, and privacy

Security in the cloud is a contentious topic because of concerns about valuable data being stored remotely. Sending sensitive business information to a cloud vendor carries risk, which makes choosing a reliable provider, such as AWS, Google, or Microsoft, important.

3. Data management

Managing data in the cloud is complicated. Integrating existing on-premise solutions with new cloud solutions, or performing a ‘lift and shift’ migration, is not always the answer. The rise of hybrid cloud storage models is evidence of the risks involved in moving an on-premise storage system entirely to a cloud vendor.
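One way to picture a hybrid model is as a placement policy that decides, dataset by dataset, whether data stays on premise or moves to the cloud. The sketch below assumes a hypothetical four-level sensitivity scale and a rule that only "public" and "internal" data may go to the cloud; the dataset names are invented, and a real policy would also weigh regulation, latency, and cost.

```python
from enum import Enum


class Location(Enum):
    ON_PREMISE = "on-premise"
    CLOUD = "cloud"


# Hypothetical sensitivity scale and policy; not taken from any standard.
KNOWN_LABELS = {"public", "internal", "confidential", "restricted"}
ALLOWED_IN_CLOUD = {"public", "internal"}


def place(dataset: str, sensitivity: str) -> Location:
    """Decide where a dataset lives in a hybrid model based on its sensitivity label."""
    if sensitivity not in KNOWN_LABELS:
        raise ValueError(f"unknown sensitivity label for {dataset!r}: {sensitivity}")
    return Location.CLOUD if sensitivity in ALLOWED_IN_CLOUD else Location.ON_PREMISE


catalogue = {
    "web_clickstream": "public",
    "sales_dashboard": "internal",
    "customer_payments": "restricted",
}
plan = {name: place(name, label).value for name, label in catalogue.items()}
print(plan)  # {'web_clickstream': 'cloud', 'sales_dashboard': 'cloud', 'customer_payments': 'on-premise'}
```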

4. Lifetime costs

As the volume of data stored in the cloud grows, so does the cost. If your business applications or operating model run locally while the data is stored in the cloud, moving data back and forth can add unwanted networking costs. The cloud storage model looks appealing, but whether it pays off depends entirely on the use case.
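To see how growth and networking charges add up, the sketch below compares the cumulative cost of cloud storage with an on-premise purchase over three years. Every number in it (starting volume, 5% monthly growth, per-GB storage and egress prices, the fraction of stored data read back each month, hardware and running costs) is a made-up placeholder rather than a real provider price, so only the shape of the comparison is meaningful.

```python
def cloud_cost_by_month(
    start_gb: float,
    monthly_growth: float,
    price_per_gb_month: float,
    egress_fraction: float,
    egress_price_per_gb: float,
    months: int,
) -> list[float]:
    """Cumulative cloud bill per month: storage that grows over time plus egress charges."""
    totals, running, stored_gb = [], 0.0, start_gb
    for _ in range(months):
        storage = stored_gb * price_per_gb_month
        # Data pulled back out to local applications is billed as network egress.
        egress = stored_gb * egress_fraction * egress_price_per_gb
        running += storage + egress
        totals.append(running)
        stored_gb *= 1 + monthly_growth  # the data volume compounds month on month
    return totals


def on_prem_cost_by_month(capex: float, monthly_opex: float, months: int) -> list[float]:
    """Cumulative on-premise cost: one upfront hardware purchase plus fixed running costs."""
    return [capex + monthly_opex * (m + 1) for m in range(months)]


# All figures are hypothetical placeholders, not real prices.
months = 36
cloud = cloud_cost_by_month(10_000, 0.05, 0.02, 0.10, 0.09, months)
on_prem = on_prem_cost_by_month(capex=15_000, monthly_opex=150, months=months)
crossover = next((m + 1 for m in range(months) if cloud[m] > on_prem[m]), None)
print(f"after {months} months: cloud {cloud[-1]:,.0f} vs on-premise {on_prem[-1]:,.0f}")
print("first month in which cloud has cost more in total:", crossover)
```

Swapping in real volumes and quoted prices is the point of a model like this; with different inputs the crossover may come much earlier, much later, or not at all.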

As global temperatures rise, concern about the carbon footprint of data centres has grown. Let’s look at some of the changes being made to reduce the carbon footprint of cloud storage:

  1. The construction of hyperscale data centres whose operators pledge to become carbon neutral through carbon offsetting and investment in solar and wind power.
  2. Relocation of data centres to cooler climates, such as Sweden and Iceland, to reduce the costs of central processing unit (CPU) cooling and industrial air conditioning. However, moving data centres further from end users can increase latency.
  3. ASEAN countries such as Singapore, Malaysia, and Indonesia continue to drive growth in the data centre market thanks to lower costs of entry. Large swathes of land are used for renewable energy plants coupled with data centres. Equinix, a global digital infrastructure company, invested 144 million USD in a fifth data centre that aims to be ‘green’ through sustainability policies and incentives.
  4. The introduction of hydrotreated vegetable oil (HVO) fuel to power data centres. HVO is a fossil-free, renewable replacement for diesel that offers up to a 90% reduction in net greenhouse gas emissions, including CO2, along with lower NOx emissions. It also has an extended storage life of up to 10 years.
  5. Microsoft has trialled an underwater data centre (Project Natick), which uses the surrounding seawater for cooling to reduce the energy spent on keeping servers cool.

Embracing the future of cloud storage

As with any new endeavour, appealing to a mass market that is still on premise and full of legacy applications will feel risky. The advantages of cloud-based storage outweigh the disadvantages, but ‘at a cost’: security and privacy concerns, paying extra for more storage once space runs out, and always needing a stable internet connection. With 88% of cloud data breaches reportedly caused by human error, fear of security breaches is stopping organisations from making the full leap to cloud data storage. Organisations wishing to make the transition need to upskill their data and cloud engineering teams with the right cloud storage knowledge.

Despite cloud storage’s seamless integration with other tools and software, the transition to it is still relatively new for many organisations. With a possible data storage crisis looming, and a market growing more technologically competitive each day, it has become increasingly important for organisations to consider cloud storage solutions.

Slalom is a global consulting firm that helps people and organisations dream bigger, move faster, and build better tomorrows for all.
