Choosing the Right Data Storage Solution: Maximizing Efficiency and Accessibility

Sebastian Freiman
Blue Orange Digital
3 min read · Feb 13, 2024

In our previous article, we explored the intricacies of data movement and the various options available for extracting and transferring data. Now, let’s delve into the next phase of the Modern Data Stack (MDS): data storage solutions. Where should we store the data extracted from source systems? In this article, we will explore the different storage alternatives and discuss their benefits, considering factors such as compliance, security, and business needs.

Cloud Object Storage

Cloud object storage has gained significant popularity, with Amazon Simple Storage Service (S3), launched in 2006, among the earliest and most widely used services of its kind. S3 stores data as “objects” (a file plus its metadata) inside buckets, and each object is read or written by its key. While it may not accommodate your son’s old toys or your Lord of the Rings DVD collection, it is an excellent home for the data extracts generated by your preferred extraction software.

Other cloud providers offer similar services to S3, each with an equally straightforward name, such as Google Cloud Storage and Azure Blob Storage. The key benefits of cloud object storage are durability and scalability. Providers keep redundant copies of each object, typically across multiple data centers, which protects your data against hardware and site failures. Capacity is virtually unlimited: there is nothing to provision up front, so storage grows along with your data.
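To make the idea of writing an extract as an “object” more concrete, here is a minimal Python sketch. It assumes the boto3 library and uses its `put_object` call; the bucket name, the `raw/<source>/<table>/dt=<date>/` key layout, and both helper functions are illustrative assumptions rather than anything prescribed by S3.

```python
from datetime import date


def build_extract_key(source: str, table: str, run_date: date, filename: str) -> str:
    """Build a date-partitioned object key (hypothetical layout)."""
    return f"raw/{source}/{table}/dt={run_date.isoformat()}/{filename}"


def upload_extract(s3_client, bucket: str, key: str, body: bytes) -> str:
    """Write one object and return its s3:// URI.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return f"s3://{bucket}/{key}"


# Example usage (needs AWS credentials and an existing bucket):
#   import boto3
#   key = build_extract_key("crm", "accounts", date(2024, 2, 13), "part-0001.csv")
#   upload_extract(boto3.client("s3"), "example-data-lake", key, b"id,name\n1,Acme\n")
```

Keeping the run date in the key is one common way to make later cleanup and querying by date easier, but any consistent naming scheme works.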

Databases and Data Warehouses

While cloud storage is ideal for storing files, it may not meet the needs of all organizations when it comes to data retrieval and consumption. This is where databases and data warehouses come into play.

Online Analytical Processing (OLAP) databases are designed for fast data consumption and analytics. Source systems are usually built for many small transactional reads and writes, so a query that scans and aggregates millions of rows can take minutes on them. OLAP databases are optimized for exactly these read-heavy operations (large scans, joins, and aggregations), making them ideal for data analysis tasks.
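One common reason analytical databases read quickly is that many of them store data by column rather than by row. The article does not say which layout any particular product uses, so treat the following as a toy illustration of the idea, with made-up order data, and not as a description of a specific engine.

```python
# Row-oriented layout: each record is kept together (typical of source systems).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column-oriented layout: each column is kept together.
columns = {
    "order_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120.0, 80.0, 50.0],
}


def total_by_region_rows(records):
    """Aggregate by walking whole records, although only two fields are used."""
    totals = {}
    for record in records:
        totals[record["region"]] = totals.get(record["region"], 0.0) + record["amount"]
    return totals


def total_by_region_columns(cols):
    """Aggregate by reading only the two columns the query needs."""
    totals = {}
    for region, amount in zip(cols["region"], cols["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Both functions return the same totals. The difference is in how much data has to be read: with a column layout, a query that needs two columns out of fifty can skip the other forty-eight entirely.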

Amazon S3 is primarily an object storage service rather than a database you can query directly. However, several AWS services and third-party tools can run queries against data stored in S3.

AWS Options for Querying Data in S3

Amazon Athena: Athena is an interactive query service that enables SQL-based analysis of data directly from S3. With Athena, you can define a data schema, write SQL queries, and execute them on the underlying data stored in S3. It supports various file formats such as CSV, JSON, and Parquet.
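As a rough sketch of what running an Athena query from code can look like, the Python helper below uses the boto3 Athena client calls `start_query_execution`, `get_query_execution`, and `get_query_results`. The helper name, database name, and output location are assumptions for illustration, and the table is assumed to be defined already.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return rows as lists of strings.

    `athena` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page of results is read here; large results need pagination.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Example usage (hypothetical names; needs AWS credentials):
#   import boto3
#   rows = run_athena_query(
#       boto3.client("athena"),
#       "SELECT region, COUNT(*) FROM accounts GROUP BY region",
#       database="lake_db",
#       output_location="s3://example-athena-results/",
#   )
```

For SELECT statements, the first returned row usually holds the column names, so callers may want to treat it as a header.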

Amazon Redshift Spectrum: Redshift Spectrum extends Amazon Redshift so that you can query data in S3 through external tables, without loading it into Redshift first. Because external tables can be joined with regular Redshift tables in the same SQL statement, you get a unified analysis experience across both.
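To illustrate what combining Redshift data with S3 data can look like in practice, here is a small Python sketch that only builds the SQL text. The `CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG` statement follows Redshift's documented syntax, but every schema, table, column, and role name below is hypothetical, and actually executing the statements requires a Redshift connection that is not shown.

```python
def external_schema_sql(schema: str, catalog_database: str, iam_role_arn: str) -> str:
    """SQL that exposes a data catalog database to Redshift as an external schema.

    Values are interpolated directly, so only pass trusted, hard-coded names.
    """
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_database}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


# Join a regular Redshift table with an external table whose files live in S3.
JOIN_SQL = """
SELECT c.customer_name, SUM(e.amount) AS total_amount
FROM public.customers AS c          -- table stored in Redshift
JOIN spectrum.order_events AS e     -- external table backed by files in S3
  ON e.customer_id = c.customer_id
GROUP BY c.customer_name;
"""

print(external_schema_sql(
    "spectrum", "lake_db", "arn:aws:iam::123456789012:role/example-spectrum-role"
))
```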

AWS Glue: Glue is a fully managed extract, transform, and load (ETL) service that can discover, catalog, and transform data from various sources, including S3. It enables you to create and run ETL jobs that transform your data into a queryable format. The results can then be loaded into Redshift, or written back to S3 and queried in place with Athena.
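The “discover and catalog” part of Glue is typically done with a crawler. The sketch below uses the boto3 Glue client calls `create_crawler` and `start_crawler`; the helper name and every argument value in the usage comment are assumptions for illustration.

```python
def crawl_s3_prefix(glue, crawler_name, role_arn, catalog_database, s3_path):
    """Register a crawler for one S3 prefix and start it.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    The crawler infers table schemas from the files it finds and records
    them in the given catalog database.
    """
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        DatabaseName=catalog_database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler_name)


# Example usage (hypothetical names; needs AWS credentials and an IAM role
# that Glue is allowed to assume):
#   import boto3
#   crawl_s3_prefix(
#       boto3.client("glue"),
#       crawler_name="raw-crm-crawler",
#       role_arn="arn:aws:iam::123456789012:role/example-glue-role",
#       catalog_database="lake_db",
#       s3_path="s3://example-data-lake/raw/crm/",
#   )
```

Note that `create_crawler` fails if a crawler with the same name already exists, so a real script would check for that first.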

Leveraging Third-Party Tools

In addition to AWS options, there are third-party tools and frameworks you can utilize to query data stored in S3. Apache Spark, Presto, and Hadoop are examples of powerful tools that can process and query data directly from S3, leveraging their distributed computing capabilities.
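As one example of the third-party route, here is a minimal PySpark sketch that reads Parquet files from S3 and counts rows per value of a column. It assumes PySpark is installed and that the cluster has the Hadoop S3 connector (the `s3a://` scheme) and credentials configured; the bucket, prefix, and column names are made up.

```python
def s3a_uri(bucket: str, prefix: str) -> str:
    """Build the s3a:// URI that Spark's Hadoop S3 connector reads from."""
    return f"s3a://{bucket}/{prefix.strip('/')}/"


def count_rows_by(spark, uri: str, column: str):
    """Read Parquet files under `uri` and count rows per value of `column`."""
    return spark.read.parquet(uri).groupBy(column).count()


def build_session(app_name: str = "s3-query-sketch"):
    """Create a SparkSession. PySpark is imported lazily so the helpers
    above can be used without it being installed."""
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).getOrCreate()


# Example usage (hypothetical bucket and column):
#   spark = build_session()
#   uri = s3a_uri("example-data-lake", "raw/crm/accounts")
#   count_rows_by(spark, uri, "region").show()
```

Spark distributes the read across its workers, which is what lets this kind of query scale to data sets far larger than a single machine could hold.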

Conclusion

When it comes to storing data, organizations have a range of options to consider. Cloud object storage provides durable, scalable, and cost-effective storage for files. For efficient data consumption and analytics, OLAP databases and AWS services like Athena and Redshift Spectrum add powerful querying capabilities on top, while Glue catalogs and prepares the data for them. Additionally, third-party tools can be integrated to extend your data querying and processing capabilities further.

To determine the ideal storage solution for your organization, carefully assess compliance requirements, security considerations, and specific business needs. Consult with our experts at Blue Orange Digital to navigate the complexities of data storage and unlock the full potential of your data infrastructure.

Contact us today to learn more about how Blue Orange Digital can help you optimize your data storage strategy and accelerate your data-driven initiatives.
