What Is a Data Warehouse?

Hayrullah Karaman
İstanbul Data Science Academy
5 min read · May 26, 2023

The differences between a data warehouse, a data lake, and a database, and BigQuery

Data Warehouse🎉🎉🎉

As Wikipedia describes them, data warehouses are systems with analysis and presentation techniques that deliver data-driven insights to companies working toward data-driven targets. The main purpose of a data warehouse is not simply to store data: data obtained from OLTP databases or other sources must first be cleaned, so that what ends up in the warehouse is high-quality, consistent, and verified. I will abbreviate Data Warehouse as DWH in the rest of the article.

Data Warehouse Structure

DWHs are more expensive structures than other databases and data lakes, so storing all of your data here will both cause performance tuning problems and increase costs. It is therefore necessary to pull in only as much data as we need and to move it to cheaper structures, such as a data lake, when we are done.

Because of their structure and purpose, DWHs are designed for data access queries such as Select, rather than for daily Insert, Update, and Delete queries. For this reason, data is kept in a columnar structure, not in the indexed, row-based structure of classical databases. For example, if you want the total sales from a structure that holds employees, customers, sales, and salaries, a DWH will process only the sales column. In a classical database, this operation visits every row and fetches the sales value from each one.

In databases, a separate table can be created for each department or business unit. In DWHs, this structure should be kept consolidated. Where necessary, 2–3 tables are combined in advance to answer a specific problem, so that the query runs on a single table without adding the cost of a join to the Select.

Data marts are created according to these problems. A data mart may serve only a single business unit, or it may serve a company-wide report. Generally, two types of design schemas are used in DWH systems, known as Star and Snowflake.
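To make the row-based versus columnar difference concrete, here is a minimal Python sketch. It is only a toy model: the records, the field names, and the "fields touched" counter are assumptions made up for illustration, and real engines work with disk blocks and compression rather than Python lists.

```python
# Toy comparison of a row-based layout and a columnar layout.
# All names and figures below are invented for illustration.

rows = [
    {"employee": "Ayse", "customer": "Acme", "sales": 120, "salary": 5000},
    {"employee": "Mehmet", "customer": "Globex", "sales": 80, "salary": 4500},
    {"employee": "Zeynep", "customer": "Initech", "sales": 200, "salary": 5200},
]


def total_sales_row_store(records):
    """Row layout: every record is read in full just to pick out 'sales'."""
    total = 0
    fields_touched = 0
    for record in records:
        fields_touched += len(record)  # the whole row is loaded
        total += record["sales"]
    return total, fields_touched


# Columnar layout: one list per column instead of one dict per row.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def total_sales_column_store(cols):
    """Column layout: only the 'sales' column is read."""
    sales = cols["sales"]
    return sum(sales), len(sales)


row_total, row_touched = total_sales_row_store(rows)
col_total, col_touched = total_sales_column_store(columns)

print(row_total, row_touched)  # 400 12
print(col_total, col_touched)  # 400 3
```

Both layouts give the same total, but the columnar version reads 3 values instead of 12. That gap grows with every extra column in the table, which is why analytical Select queries favor this layout.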

Differences Between DWHs and Other Structures

Data lakes can hold structured, semi-structured, and unstructured data, such as pictures and videos. They resemble the Drive or OneDrive storage you use as an individual. In DWHs, by contrast, only structured data is kept. Operational databases run the daily operational services, which are the most costly workloads, whereas the data in DWHs serves a specific purpose and must be kept for a certain period of time.

Modern DWH👩‍🍳

DWHs can be designed as infrastructure according to your needs. Although this approach is less favored today, many companies still run these structures in their own local data centers, on their preferred hardware and software, because of legal restrictions or other reasons. The problem is that when these companies start to grow, or when the need for new data in global structures increases, the hardware costs of their local structures increase too: disk capacity on the one hand, processors on the other. Managing these structures is becoming more difficult day by day, because every day different weaknesses of a solution emerge, and additional design and management costs, such as the representation of these structures, are added.

Today, in response to these management and structure challenges, technology giants have started to offer complete services by taking on these challenges themselves and adding many tools, such as ETL and reporting.
These companies, such as Google, Amazon, and IBM, offer pay-as-you-go structures where you can focus only on your business, without any server or operations costs.

BigQuery, the Google Cloud product we will examine in this article, is positioned as Google's DWH service.

Google Cloud BigQuery🤏

As we mentioned before, Google takes on many responsibilities itself and offers BigQuery as a DWH system so that you can focus only on your business. There is no infrastructure or management cost for you to handle here. It is just you and your data, and you don't have to wonder whether there is a backup if a server crashes😀😀😀

It has all the features that a modern DWH should have. For example, you can access the data with an ETL tool and clean it along the way, you can present the results of your analysis with visualization tools such as Looker, or you can build a machine learning model and make predictions.

You can create several datasets belonging to a project, and several tables, views, or functions belonging to each of these datasets. Because it is designed to support big data structures, BigQuery supports big data file types such as Avro, Parquet, and ORC.
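The project, dataset, and table hierarchy described above can be sketched in plain Python. This is only an in-memory model, not the BigQuery client library. The `Project` and `Dataset` classes, the project ID `my-demo-project`, the dataset and table names, and the `amount` column are all hypothetical. The one thing taken from BigQuery itself is the `project.dataset.table` form used to reference a table in a query.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """A dataset groups tables (views and functions are omitted here)."""
    name: str
    tables: List[str] = field(default_factory=list)


@dataclass
class Project:
    """A project owns datasets; this is a toy model, not a real client."""
    project_id: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)

    def add_table(self, dataset_name: str, table_name: str) -> str:
        """Register a table and return its fully qualified ID."""
        dataset = self.datasets.setdefault(dataset_name, Dataset(dataset_name))
        dataset.tables.append(table_name)
        return f"{self.project_id}.{dataset_name}.{table_name}"

    def table_ids(self) -> List[str]:
        """List every table as project.dataset.table."""
        return [
            f"{self.project_id}.{dataset.name}.{table}"
            for dataset in self.datasets.values()
            for table in dataset.tables
        ]


# Hypothetical names, chosen only for the example.
project = Project("my-demo-project")
orders_id = project.add_table("sales_mart", "orders")
project.add_table("sales_mart", "customers")
project.add_table("hr_mart", "salaries")

# The fully qualified ID is what a query would reference.
query = f"SELECT SUM(amount) FROM `{orders_id}`"
print(query)
```

Keeping one dataset per data mart, as in this sketch, is just one possible layout. The same hierarchy would work equally well with datasets split by environment or by source system.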

You can give role-based permissions on the structures you create, such as datasets or tables, or you can share them in formats such as JSON or CSV.

BigQuery's services cover data acquisition, transfer, querying, security, and presentation.

In this article, I talked about DWHs, how they differ from other systems, the features of a modern DWH, and BigQuery as an example. I hope it was useful. In my next article, I will talk about project-based dataset creation, table creation, and querying with BigQuery.
Take care of yourselves🖐
