“Cross-Cloud” is the new norm in Data Engineering and Analytics

Amazon poked the bear and both Google and Microsoft followed

Jesus Templado González
ROMPANTE
3 min read · Mar 4, 2024

Intro

Extract, transform, and load (ETL) is the process of combining data from multiple sources into a large, central repository.
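To make the three steps concrete, here is a minimal sketch of a traditional ETL job. It assumes two made-up sources (`crm_rows` and `shop_rows`) and uses an in-memory SQLite database as a stand-in for the central repository; none of these names come from a real system.

```python
import sqlite3

# Two hypothetical source systems, modelled here as plain Python records.
crm_rows = [("  Alice ", "alice@example.com"), ("Bob", "BOB@example.com")]
shop_rows = [
    ("alice@example.com", 120.0),
    ("bob@example.com", 80.5),
    ("bob@example.com", 19.5),
]


def extract():
    """Extract: read the raw records from each source."""
    return list(crm_rows), list(shop_rows)


def transform(customers, orders):
    """Transform: clean names, normalise e-mails, total the spend per customer."""
    totals = {}
    for email, amount in orders:
        key = email.strip().lower()
        totals[key] = totals.get(key, 0.0) + amount
    cleaned = []
    for name, email in customers:
        key = email.strip().lower()
        cleaned.append((name.strip(), key, totals.get(key, 0.0)))
    return cleaned


def load(rows):
    """Load: write the cleaned rows into the central repository."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE customer_spend (name TEXT, email TEXT, total REAL)"
    )
    warehouse.executemany("INSERT INTO customer_spend VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return warehouse


warehouse = load(transform(*extract()))
result = warehouse.execute(
    "SELECT name, total FROM customer_spend ORDER BY name"
).fetchall()
print(result)
```

Note that the data is copied: once loaded, the repository holds its own version of every record, and that copy has to be refreshed whenever a source changes.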

In 2023, Amazon Web Services (AWS) announced plans to move away from traditional ETL integration methods, launching a “Zero ETL” campaign at its re:Invent user conference.

The Big Cloud providers are moving in a similar direction. Pic by Jay Chapel

“Zero ETL” in the context of Cross-Cloud

Zero ETL is about simplifying data integration and engineering through integrated services and tools. With this approach, data can be stored and processed in its original format, or analysed directly within the source platform using SQL, without first being transformed or transferred.
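As a toy analogue of querying data where it lives, the sketch below uses SQLite's `ATTACH DATABASE` to join tables held in two separate database files without copying rows from one to the other. The two files only stand in for two independent platforms; the file names, tables and the `create_source` helper are all hypothetical, and this is not how any cloud provider implements Zero ETL.

```python
import os
import sqlite3
import tempfile


def create_source(path, ddl, insert_sql, rows):
    """Create one stand-alone 'platform' that stores its own data."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()


workdir = tempfile.mkdtemp()
orders_path = os.path.join(workdir, "orders_platform.db")
customers_path = os.path.join(workdir, "customers_platform.db")

create_source(
    orders_path,
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (2, 20.0)],
)
create_source(
    customers_path,
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)

# Connect to one platform and attach the other: the orders stay in their own file.
conn = sqlite3.connect(customers_path)
conn.execute("ATTACH DATABASE ? AS orders_src", (orders_path,))
spend = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders_src.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# The customers platform still has only its own table; nothing was loaded into it.
local_tables = [
    row[0]
    for row in conn.execute("SELECT name FROM main.sqlite_master WHERE type = 'table'")
]
conn.close()
print(spend, local_tables)
```

The join runs with plain SQL against both stores, and there is no intermediate pipeline to schedule or keep in sync.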

In more technical terms: with the Zero ETL approach, AWS connects Amazon Aurora to Amazon Redshift, offering near real-time query execution and the ability to run AI workloads using Redshift's built-in capabilities, such as ML, materialised views, data sharing, and access to multiple data stores and data lakes, as well as integrations with Amazon SageMaker, Amazon QuickSight, and other AWS services. The following diagram illustrates this and you can learn more about it here.

The response from Google & Microsoft

Amazon’s strategy set a standard in the industry, and Microsoft and Google now have their own offerings in this space.

Google has taken significant steps in a similar direction by introducing Google BigLake and BigQuery Omni:

Diagram provided by Google
  • BigQuery Omni is an analytics solution whose compute engine runs next to the data, whether that data is stored on Google Cloud, AWS, or Azure. Users can query and analyse this data without leaving BigQuery’s UI. Because it uses standard SQL and the same BigQuery APIs, users can break down “data silos” and work from a single system and interface.

Microsoft provides users with similar capabilities in Fabric, an analytics platform that integrates data movement, data science, real-time analytics, and BI. Fabric reduces the need for services from multiple vendors, as it offers an integrated suite that includes a data lake, data engineering, and data integration. Built on a SaaS foundation, it simplifies the analytics stack through a unified environment that combines useful elements from Power BI, Azure Synapse, and Azure Data Factory.

The Cross-Cloud in the data engineering world

Multi-Cloud is an environment that uses multiple cloud service providers (CSPs). Cross-Cloud, in this same context, refers to an application or workload that spans several CSPs and moves data seamlessly across those environments.

Consistent operation of applications across different CSPs is a big advancement for data architects and engineers, who rely on infrastructure, management, security, and access systems that are all harmonised. And there is more to it. First, expenses and complexity can decrease because data no longer has to be moved across clouds. Second, there are fewer pipelines to manage, which in turn avoids data duplication when moving or updating data sets that require frequent refreshes. However, the difficulty of querying data directly lies in incorporating the cleansing and transformation processes that a pipeline would otherwise perform in between.
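One common way to handle that last point is to push the cleansing into the query layer, for example with a SQL view that tidies the raw data each time it is read. The sketch below illustrates the idea with an in-memory SQLite database; the `raw_events` table, its contents and the cleaning rules are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw data stays exactly as the source produced it, inconsistencies included.
conn.execute("CREATE TABLE raw_events (country TEXT, revenue TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(" es", "10.5"), ("ES ", "4.5"), ("de", "n/a"), ("DE", "7")],
)

# The view applies the cleansing at query time instead of in a separate pipeline:
# trim and upper-case the country code, drop rows whose revenue is not numeric.
conn.execute(
    """
    CREATE VIEW clean_events AS
    SELECT UPPER(TRIM(country)) AS country,
           CAST(revenue AS REAL) AS revenue
    FROM raw_events
    WHERE revenue GLOB '[0-9]*'
    """
)

revenue_by_country = conn.execute(
    "SELECT country, SUM(revenue) FROM clean_events GROUP BY country ORDER BY country"
).fetchall()
raw_row_count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
print(revenue_by_country, raw_row_count)
```

The trade-off is that the cleaning logic now runs on every read, so it has to stay cheap enough for the query engine to absorb.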

Conclusion

With the launch of the Zero ETL approach by Amazon, the trend towards Cross-Cloud keeps gaining momentum, supported by fellow industry leaders Google and Microsoft.

In my view, this indicates a shift towards new data engineering methodologies and architectures, particularly for larger organisations that rely on or need multiple cloud platforms to streamline their data analytics services.

Jesus Templado González
ROMPANTE

I advise companies on how to leverage DataTech solutions (Rompante.eu) and I write easy-to-digest articles on Data Science & AI and its business applications