Is BigQuery Omni the next revolution in Data Warehousing?

Introduced by Google in September 2020

Janaka Ekanayake
Nov 23, 2020 · 4 min read
Figure 1 - Copyright Free Image from Pixabay.com

IDC predicts that the global datasphere will grow to 175 zettabytes by 2025 (one zettabyte is 10 to the power of 21 bytes). Even my driving license is connected to the internet using RFID, and therefore continuously generates data.

According to a cloud adoption survey by Gartner, 81% of companies that use public clouds work with more than one cloud provider. In simple terms, multi-cloud management is becoming more important than ever. However, data that is spread across clouds still needs to be consolidated in a centralized data warehouse, because it’s impractical to run multiple data warehouses inside a single company. To address this, last September at Google Cloud Next 2020, Emily Rapp, product manager of Google BigQuery, announced the next state-of-the-art data warehousing solution, “BigQuery Omni”, aimed especially at people who use multiple cloud vendors.

BigQuery is not a new term for us. Google introduced BigQuery in late 2011 to handle massive amounts of data, such as log data of thousands of retail systems, or IoT data from millions of IoT devices across the globe. It’s a fully-managed and serverless data warehouse which shifts focus to analytics instead of managing infrastructure.

Breaking the silos

BigQuery is designed to address the data silo problems that arise when a company has individual teams, each with its own independent data mart. By consolidating that data in BigQuery on the Google Cloud Platform, a company can work from a single copy instead of reconciling many diverging ones. But with growing demand for using multiple cloud vendors inside a single company, BigQuery Omni came into the picture. It was made possible by ‘Anthos’, another recent Google technology, which lets users run applications not just on Google Cloud but also on other cloud vendors, such as Amazon Web Services (AWS) and Microsoft Azure.

Figure 2 - Image from https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-omni

As for the data silo question, the main challenge was that there was no way to run queries on data held on another cloud platform without moving it. With BigQuery Omni, the BigQuery query engine (known as Dremel) runs on Anthos clusters in AWS or Azure, next to the data. The connection is secure because the control plane and the metadata remain on Google Cloud, and only the query results pass through the BigQuery routers. That same connection is used when users choose to bring the results back. Users can decide either to bring them back or to keep everything within AWS.
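To make the split between the two sides easier to picture, here is a toy model in plain Python. It is only a sketch of the idea and not BigQuery's API: the `DataPlane` and `ControlPlane` classes, the sample rows, and the `query` method are all hypothetical names invented for this illustration. The point it shows is that filtering and aggregation run where the data lives, and only the small result travels back.

```python
from dataclasses import dataclass, field


@dataclass
class DataPlane:
    """Hypothetical stand-in for a query engine running next to the data."""
    cloud: str
    rows: list  # raw rows stay inside this object

    def run(self, predicate, aggregate):
        # Filtering and aggregation happen where the data lives.
        matching = [row for row in self.rows if predicate(row)]
        return aggregate(matching)


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the part that stays on Google Cloud."""
    data_planes: dict = field(default_factory=dict)

    def register(self, plane):
        self.data_planes[plane.cloud] = plane

    def query(self, cloud, predicate, aggregate):
        # Only the aggregated result crosses clouds, never the raw rows.
        return self.data_planes[cloud].run(predicate, aggregate)


# Made-up sample data that "lives" in AWS.
aws = DataPlane("aws", rows=[
    {"store": "A", "sales": 120},
    {"store": "B", "sales": 80},
    {"store": "A", "sales": 50},
])

control = ControlPlane()
control.register(aws)

total_a = control.query(
    "aws",
    predicate=lambda row: row["store"] == "A",
    aggregate=lambda rows: sum(row["sales"] for row in rows),
)
print(total_a)
```

In this sketch the caller only ever sees `total_a`, a single number, which mirrors the claim that query results, rather than the underlying data, are what pass between clouds.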

The competitive landscape

So far, BigQuery has two major competitors: AWS Redshift and Snowflake. As of 18th May 2020, BigQuery’s market share is smaller than that of AWS Redshift, but its growth rate is quite impressive.

Figure 3 - Sources for their individual adoption: Redshift, BigQuery, Snowflake

But with this new release of BigQuery Omni, serverless data warehousing is set to shake up the current market. Currently, BigQuery Omni has no direct competition, as neither Redshift nor Snowflake supports multi-cloud vendor integration so far. So, this really is going to be a wake-up call for BigQuery’s competitors.

Practical application

So, what does BigQuery Omni ‘look like’ at work? Here’s one example. Have you ever bought something and then kept seeing the ad for it again and again? You have already bought it, so there’s no point in spending ad budget on an existing customer. BigQuery Omni can solve this issue, because it lets us tie the commerce data to the ad platform safely and securely, ensuring that once a purchase has been made, the ad no longer appears.
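The ad-suppression idea boils down to an anti-join: keep everyone in the ad audience who has no matching purchase. The sketch below runs that query pattern locally with Python's built-in `sqlite3` module rather than BigQuery, purely so it is runnable anywhere. The table names `ad_audience` and `purchases`, their columns, and the sample rows are assumptions made up for illustration; in a real setup the two tables would live in different systems, possibly in different clouds.

```python
import sqlite3

# In-memory database standing in for the two data sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ad_audience (customer_id TEXT, product_id TEXT);
    CREATE TABLE purchases   (customer_id TEXT, product_id TEXT);

    INSERT INTO ad_audience VALUES
        ('c1', 'shoes'), ('c2', 'shoes'), ('c3', 'bag');
    INSERT INTO purchases VALUES
        ('c1', 'shoes');
""")

# Anti-join: audience rows with no matching purchase are still worth targeting.
STILL_TARGETABLE = """
    SELECT a.customer_id, a.product_id
    FROM ad_audience AS a
    LEFT JOIN purchases AS p
      ON p.customer_id = a.customer_id
     AND p.product_id  = a.product_id
    WHERE p.customer_id IS NULL
    ORDER BY a.customer_id
"""

remaining = conn.execute(STILL_TARGETABLE).fetchall()
print(remaining)
```

Customer `c1` already bought the shoes and drops out of the result, while `c2` and `c3` remain in the audience.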

Pros and cons

As with everything, there are advantages and disadvantages to using BigQuery/BigQuery Omni.

Advantages:

  • Low-level access for BigQuery Omni users;
  • Simplicity: because of its truly serverless architecture, there is very little that you have to do to manage your BigQuery setup. (You basically just run your queries and pay according to what you scan);
  • Scalability: you can scale up to 100TB queries very easily without scaling any infrastructure or anything else;
  • Breaking down silos and gaining insights into data;
  • A consistent data experience across the clouds: it doesn’t matter where your data sets are, you should be able to use standard SQL to write your queries in the BigQuery interface; and
  • Portability, powered by Anthos.

Disadvantages:

  • A relatively high pricing structure: to use BigQuery Omni, you have to use Anthos at the same time, so you pay for both services; and
  • A need for SQL knowledge: Google BigQuery requires SQL coding skills to leverage its data analysis capabilities.
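The "pay according to what you scan" point in the list above is simple arithmetic, sketched here in Python. The function name `estimate_scan_cost`, the decimal definition of a terabyte, and the price of 5.0 per TB are placeholders chosen for illustration, not quoted BigQuery or BigQuery Omni prices; check the current pricing page before relying on any figure.

```python
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabytes


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Return the cost of a query billed purely by the bytes it scans."""
    if bytes_scanned < 0 or price_per_tb < 0:
        raise ValueError("bytes_scanned and price_per_tb must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical example: a query scanning 250 GB at a placeholder price of 5.0 per TB.
cost = estimate_scan_cost(250 * 10 ** 9, price_per_tb=5.0)
print(f"Estimated cost: {cost:.2f}")
```

Under this model, a query that scans fewer columns or partitions costs proportionally less, which is why limiting what a query reads matters more than how long it runs.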

Summary

Companies using public clouds from more than one cloud provider need a centralized data warehouse to hold their data. BigQuery Omni is making a splash in the market by providing secure, serverless data warehousing, along with a host of other benefits.

Happy coding!

Learn More

CloudZone

CloudZone’s Expert-Led Blogriter
