oyelade dayo frank
Google Cloud - Community
2 min readFeb 2, 2019

--

BigQuery for Data Warehouse: PART ONE

From the world of AWS Redshift.

After 12 months of using redshift as a central data store for a project, I discovered that there are many performance issues related with it coupled with the fact that the cost is not friendly for the project(for startup) I was working on at the time. Some of the issues are :

  1. Slow queries
  2. Slow dashboards
  3. Incompatibility with new POSTGRESQL features
  4. Expensive nodes
  5. Sometimes routine auto maintenance breaks existing queries.

Discouraged by the limitation Redshift presents, I had to break the ego of using AWS tools for everything. Then Bigquery:

What is BigQuery?

BigQuery is Google’s Data warehouse Solution. It is super fast, highly scalable, cost-effective, and fully managed cloud data warehouse. Also known for high performance and scalable query engine on the cloud. The strength of BigQuery lies in its ability to handle large datasets. For example, querying millions of records might take only a few seconds to get the result.

It’s powered by Colossus [Google distributed file-system], each query is transformed into an execution tree by Dremel (Google query engine), the data is retrieved from Colossus and aggregated. Everything runs on Google Jupiter high-speed network.

Why use BigQuery?

A. Cost

They have 2 pricing tier

  1. Pay for Usage Only for small and medium scale customers. Loading, previewing and exporting data are free of charge. Google charge you only for data storage, querying and streaming inserts. Based on Dataset examined to get the result when a query is issued measured in Bytes, BigQuery Charge you. Currently, the first terabyte per month is free(excellent for startups).
  2. Flat Fee for Large Scale customers: BigQuery charges a fixed amount for a dedicated reservation of slots.

Considered BigQuery cheaper compared with other data warehouse solutions. Also, BigQuery provides cost control option to allow you to cap your daily costs.

B. Performance

BigQuery is super fast and works well with any data size. Because there is unlimited computing resource capacity you can easily and seamlessly scale your data warehouse from small data to big data (Petabytes) without having to buy more capacity. Query Processing speed is great. It’s good for real-time data analytics because it has high-speed streaming insertion API.

C. Maintenance

BigQuery is a serverless data warehouse solution enabling you to focus on working your products(data analysis e.t.c) correctly and quickly, rather than spending time managing infrastructure(operating and sizing computing resources). Backups and routine maintenance are handled by BigQuery.

Conclusion?

Warehousing and Analyzing massive datasets can be time-consuming and quite expensive without the right infrastructure. BigQuery is an enterprise data warehouse that solves this problem by enabling super-fast SQL queries using the processing power of Google’s infrastructure.

Next: We would look at the main components of BigQuery and How to Setup your Data Warehouse.

--

--