Open Data Architectures: Driving Customer Choice and Innovation

In today’s rapidly evolving data landscape, the openness and flexibility of a company’s data platform have become critical. Organizations want to make choices based on their business needs without material switching costs. They want vendors to compete for their business on customer-centric terms. They want their teams to move quickly, focus on their data needs, and leverage the large ecosystem of available solutions to accelerate their data analytics and AI. As an example, in the span of the last 18 months, GenAI has changed dramatically in terms of tooling, services, and capabilities, as well as in the strategies customers use to apply LLMs to their data. An open data architecture gives customers the ability to adapt to this changing landscape.

Open Data Architectures (ODAs) enable businesses to harness robust ecosystems, reduce costs, and retain the power to innovate without vendor constraints.

What Does Open Mean?

Open is a word that gets thrown around a lot. Sadly, many vendors claim to be “open,” but you have to dig into real-world customer experiences to see what that means in practice. Certainly one meaning of open is “open source,” but in our opinion that thinking is too narrow. While every vendor will choose which portions of their solution are closed versus open source, customers should go in with eyes wide open about the level of lock-in they are willing to accept.

Working backward from the customer, what open needs to mean to be customer-centric is:

  1. The ability to move between vendors without incurring massive switching costs
  2. Access to enough source code to walk away if needed
  3. Choice, so customers can interoperate on their data with separate solutions that meet their needs for functionality and budget
  4. A vibrant ecosystem of solutions that ultimately gives customers what they want. No single vendor will solve all of a customer’s data needs, especially given the changing landscape highlighted earlier around AI.

The Issue with Proprietary Systems

Historically, the data landscape has been dominated by proprietary systems. In the early days, enterprise data warehouses (EDWs) such as Oracle and Teradata reigned supreme. These systems, while powerful, were closed and proprietary, locking customers into specific data formats and systems. Lock-in came in many forms, including proprietary SQL, massive stored procedures (what are now known as “data apps” on modern platforms), and integration tools built on proprietary interfaces. This lock-in made it difficult and costly for organizations to switch vendors or integrate new technologies, as they were tied to their EDW providers. EDWs also created an “all or nothing” approach: because all the ETL was written with the EDW as its destination, organizations were effectively required to use the EDW for everything.

As the data landscape evolved, the advent of cloud data warehouses promised a new era of flexibility and scalability. However, many of these solutions continued the trend of vendor lock-in. They offered easier-to-use interfaces and better scalability, but they still required data to be moved into their proprietary systems. This movement often came at significant cost, and because of that expense only a fraction of an organization’s data was sent to these cloud data warehouses, leading to incomplete data access and continued reliance on proprietary systems.

The Rise of the Data Lakehouse and Open Table Formats

Open table formats are rapidly becoming the standard for data and analytics. As such, some technology companies have tried to position their table formats as “open” to drive customer adoption while essentially locking customers into those formats with proprietary features. Customers have caught on to this approach and have instead adopted the table format backed by a growing and thriving contributor ecosystem: Apache Iceberg.

Apache Iceberg’s flexibility and robustness allow for seamless integration with various analytics engines, offering a significant advantage over proprietary formats.
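
To make this concrete, here is a minimal sketch using PyIceberg, the Python client library for Iceberg. The catalog endpoint, table name, and filter column below are placeholder assumptions; any Iceberg-compatible engine could read the same table the same way.

```python
# Minimal PyIceberg sketch: read an Iceberg table directly from Python.
# The catalog endpoint, table name, and column below are placeholders.
from pyiceberg.catalog import load_catalog

# Connect to any Iceberg REST-compatible catalog service.
catalog = load_catalog(
    "demo",
    **{
        "type": "rest",
        "uri": "http://localhost:8181",
    },
)

table = catalog.load_table("analytics.events")

# Iceberg's table metadata lets the client prune data files
# before any data is actually read.
df = table.scan(row_filter="event_date >= '2024-01-01'").to_pandas()
print(df.head())
```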

We are excited to see cloud data warehouses start to adopt Apache Iceberg to drive a more open approach to data access and management. We believe the industry is moving in this direction, ultimately benefiting the organizations that can take advantage of it.

What Will Technology Companies Do Next to Try to Lock In Customers?

If we look at the components of a traditional data warehouse, data and table formats are the lowest-level foundational elements. Vendors can no longer lock in customers with data file formats, thanks to Apache Parquet, or with table formats, thanks to Apache Iceberg. We expect many of them to now move up to the next layer: the metadata catalog. These technology companies don’t have to control the customer’s data if they can control the customer’s metadata. This is an area for enterprise architects and data leaders to watch as these technologies evolve. We believe that many technology organizations will try to lock in customers through proprietary metadata catalogs, where you must go through a particular catalog simply to access your data from the ecosystem.
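
To see why the catalog is such a powerful control point, consider what an Iceberg catalog fundamentally does: it maps a table name to the location of the table’s current metadata file and atomically swaps that pointer on each commit. The sketch below is purely illustrative, not a real API; if this small contract is reachable only through one vendor’s proprietary interface, every engine must go through that vendor.

```python
# Illustrative sketch (hypothetical, not a real API) of the catalog's role:
# mapping table names to the current metadata file location.
from dataclasses import dataclass


@dataclass
class TablePointer:
    # e.g. "s3://bucket/warehouse/db/tbl/metadata/00012.metadata.json"
    metadata_location: str


class Catalog:
    """Minimal catalog contract: table name -> current metadata pointer."""

    def __init__(self) -> None:
        self._tables: dict[str, TablePointer] = {}

    def load_table(self, name: str) -> TablePointer:
        # Every read starts here; whoever controls this lookup
        # controls access to the data.
        return self._tables[name]

    def commit(self, name: str, new_metadata_location: str) -> None:
        # The atomic pointer swap is what makes Iceberg commits transactional.
        self._tables[name] = TablePointer(new_metadata_location)
```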

Dremio’s Commitment to Openness

At Dremio, our commitment to openness is deeply ingrained in our DNA. We believe that customer outcomes are best served by providing flexibility and choice. This belief led to the launch of Project Nessie, an open metadata catalog designed to empower users with the freedom to choose their tools and platforms. By fostering an open ecosystem, we ensure that organizations are not confined by proprietary solutions.
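
As one illustration, here is a minimal PySpark sketch that connects Spark to a Nessie catalog. It assumes a Nessie server on its default port (19120), the Iceberg Spark runtime and Nessie jars on the classpath, and placeholder warehouse and table names.

```python
# Minimal PySpark sketch: use Nessie as the Iceberg catalog.
# Assumes a local Nessie server (default port 19120); requires the Iceberg
# Spark runtime and Nessie jars on the classpath. The warehouse path and
# table name are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.uri", "http://localhost:19120/api/v1")
    .config("spark.sql.catalog.nessie.ref", "main")  # Nessie refs work like git branches
    .config("spark.sql.catalog.nessie.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

# The data and metadata stay in your object storage; Spark is just one of
# many engines that can point at the same catalog.
spark.sql("SELECT * FROM nessie.analytics.events LIMIT 10").show()
```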

In line with our commitment to openness, Dremio recently announced support for the Apache Iceberg REST catalog specification: a standard interface that lets customers swap catalogs if they so choose. This support is a significant step toward ensuring that customers can adopt open table formats without the risk of lock-in via proprietary metadata catalogs. By supporting this open specification, we give customers the assurance that their data and metadata remain accessible and interoperable, regardless of the tools they choose to use.
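
Because the specification standardizes the protocol between clients and catalogs, swapping catalogs can reduce to a configuration change. Here is a hedged sketch with PyIceberg, where both endpoints are hypothetical placeholders.

```python
# Same application code against two different REST catalog services;
# only the endpoint (and, in practice, credentials) changes.
from pyiceberg.catalog import load_catalog

for name, uri in [
    ("vendor_a", "https://catalog.vendor-a.example/api"),
    ("vendor_b", "https://catalog.vendor-b.example/api"),
]:
    catalog = load_catalog(name, **{"type": "rest", "uri": uri})
    # Identical standard calls work against either service.
    print(name, catalog.list_namespaces())
```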

With Apache Parquet, Apache Iceberg, and Project Nessie, the open data catalog we created, it’s your storage, your data, and your metadata. This means you have the freedom to select the solutions that best fit your needs without being locked into a single vendor’s ecosystem. Technology vendors must compete for your business on value and ROI, not lock-in or contracts.

Ensuring Freedom and Flexibility

Our support for the Apache Iceberg REST catalog specification is designed to make it easier for organizations to embrace open data architectures. This move promotes interoperability and prevents the creation of new silos, allowing customers to switch between different tools and platforms seamlessly. The freedom to choose and adapt is crucial for fostering innovation and ensuring that businesses can respond quickly to changing needs.

This freedom is the reason data lakes were created in the first place!

We look forward to the continued evolution of the data landscape, and we hope it is one that remains customer-centric.

--

Sendur Sellakumar, CEO at Dremio
Data, Analytics & AI with Dremio

Sendur has been in the technology industry for the last 25 years; his experience focuses on enterprise software and cloud services.