Data Warehouse Training — Episode 3 — Data Warehouse VS Database

Published in Data Science Earth
7 min read · Mar 18, 2021

Database VS Data Warehouse: Key Differences

What is a Database?

A database is a collection of related data which represents some elements of the real world. It is designed to be built and populated with data for a specific task. It is also a building block of your data solution.

What is a Data Warehouse?

A data warehouse is an information system which stores historical and cumulative data from single or multiple sources. It is designed to analyze, report on, and integrate transaction data from different sources.

A data warehouse eases the analysis and reporting process of an organization. It also serves as a single version of truth for the organization's decision-making and forecasting processes.

KEY DIFFERENCE

  • A database is a collection of related data that represents some elements of the real world, whereas a data warehouse is an information system that stores historical and cumulative data from single or multiple sources.
  • A database is designed to record data, whereas a data warehouse is designed to analyze data.
  • A database is an application-oriented collection of data, whereas a data warehouse is a subject-oriented collection of data.
  • A database uses Online Transactional Processing (OLTP), whereas a data warehouse uses Online Analytical Processing (OLAP).
  • Database tables and joins are complicated because they are normalized, whereas data warehouse tables and joins are simpler because they are denormalized.
  • ER modeling techniques are used for designing a database, whereas dimensional modeling techniques are typically used for designing a data warehouse.

Why use a Database?

Here are the prime reasons for using a database system:

  • It offers security for data and control over access to it.
  • A database offers a variety of techniques to store and retrieve data.
  • A database acts as an efficient handler to balance the requirements of multiple applications using the same data.
  • A DBMS offers integrity constraints and access controls, which protect the data and prevent access to prohibited data.
  • A database allows concurrent access to data while ensuring that only a single user can modify the same piece of data at a time.

Why Use a Data Warehouse?

Here are the important reasons for using a data warehouse:

  • A data warehouse helps business users to access critical data from several sources all in one place.
  • It provides consistent information on various cross-functional activities.
  • It helps you to integrate many sources of data and reduce stress on the production system.
  • A data warehouse helps you to reduce TAT (total turnaround time) for analysis and reporting.
  • Because critical data from different sources sits in a single place, users save the time they would otherwise spend retrieving it from multiple sources. You can also access data from the cloud easily.
  • A data warehouse allows you to store a large amount of historical data, so you can analyze different periods and trends to make future predictions.
  • It enhances the value of operational business applications and customer relationship management systems.
  • It separates analytics processing from transactional databases, improving the performance of both systems.
  • Stakeholders and users may overestimate the quality of data in the source systems. Because a data warehouse integrates that data into a consistent form, it provides more accurate reports.

Characteristics of Database

  • Offers security and removes redundancy
  • Allows multiple views of the data
  • Database systems follow ACID compliance (Atomicity, Consistency, Isolation, and Durability)
  • Allows insulation between programs and data
  • Supports sharing of data and multi-user transaction processing
  • Relational databases support a multi-user environment

Characteristics of Data Warehouse

  • A data warehouse is subject-oriented, as it offers information related to a theme instead of a company's ongoing operations.
  • The data also needs to be stored in the data warehouse in a common and universally acceptable manner.
  • The time horizon for the data warehouse is relatively extensive compared with other operational systems.
  • A data warehouse is non-volatile which means the previous data is not erased when new information is entered in it.

Data Warehouse vs. Database

Let’s dive into the main differences between data warehouses and databases.

Processing Types: OLAP vs OLTP

The most significant difference between databases and data warehouses is how they process data.

Databases use OnLine Transactional Processing (OLTP) to delete, insert, replace, and update large numbers of short online transactions quickly. This type of processing immediately responds to user requests, and so is used to process the day-to-day operations of a business in real-time. For example, if a user wants to reserve a hotel room using an online booking form, the process is executed with OLTP.
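As a rough illustration of the hotel booking example, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `reservations` table, its columns, and the `reserve_room` helper are hypothetical names chosen for this example; a real booking system would be far more involved.

```python
import sqlite3

# In-memory database standing in for a hypothetical booking system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    " room_id INTEGER NOT NULL,"
    " night TEXT NOT NULL,"
    " guest TEXT NOT NULL,"
    " PRIMARY KEY (room_id, night))"  # at most one booking per room per night
)


def reserve_room(room_id, night, guest):
    """Try to book a room for one night; return True on success."""
    try:
        # 'with conn' commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "INSERT INTO reservations (room_id, night, guest) VALUES (?, ?, ?)",
                (room_id, night, guest),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a second booking for the same room and night.
        return False


first = reserve_room(101, "2021-03-18", "Alice")
second = reserve_room(101, "2021-03-18", "Bob")
print(first, second)
```

Each call is a short write that touches a single row and answers immediately, which is the kind of workload OLTP systems are tuned for.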

Data warehouses use OnLine Analytical Processing (OLAP) to analyze massive volumes of data rapidly. This process gives analysts the power to look at your data from different points of view. For example, even though your database records sales data for every minute of every day, you may just want to know the total amount sold each day. To do this, you need to collect and sum the sales data together for each day. OLAP is specifically designed to do this, and using it for data warehousing can be up to 1000x faster than using OLTP to perform the same calculation.
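The daily total described above boils down to a grouped aggregation. The sketch below shows only the shape of such a query, using Python's built-in sqlite3 module; the `sales` table and its sample rows are made up for illustration, and SQLite is not an OLAP engine, so this says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per individual sale, timestamped to the minute.
conn.execute("CREATE TABLE sales (sold_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2021-03-17 09:15", 20.0),
        ("2021-03-17 14:40", 35.5),
        ("2021-03-18 10:05", 12.0),
        ("2021-03-18 16:30", 8.0),
    ],
)

# Collapse the per-minute records into one total per calendar day.
daily_totals = conn.execute(
    "SELECT date(sold_at) AS day, SUM(amount) AS total"
    " FROM sales"
    " GROUP BY date(sold_at)"
    " ORDER BY day"
).fetchall()

for day, total in daily_totals:
    print(day, total)
```

A warehouse would run the same kind of `GROUP BY` over far more rows, typically on storage laid out to make such scans cheap.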

Optimization

A database is optimized to update (add, modify, or delete) data with maximum speed and efficiency. Response times from databases need to be extremely quick for efficient transaction processing. The most important aspect of a database is that it reliably records every write operation; a company won't be in business very long if its database doesn't make a record of every purchase!

Data warehouses are optimized to rapidly execute a low number of complex queries on large multi-dimensional datasets.

Data Structure

The data in databases are normalized. The goal of normalization is to reduce and even eliminate data redundancy, i.e., storing the same piece of data more than once. This reduction of duplicate data leads to increased consistency and, thus, more accurate data as the database stores it in only one place.

Normalizing data splits it into many different tables. Each table represents a separate entity of the data. For example, a database recording BOOK SALES may have three tables to denote BOOK information, the SUBJECT covered in the book, and the PUBLISHER.

Normalizing data ensures the database takes up minimal disk space, so it is storage efficient. However, it is not always query efficient: analytical queries that have to join many tables can be slow and cumbersome on a normalized database. Since businesses want to perform complex queries on the data in their data warehouse, that data is often denormalized and contains repeated data for easier access.
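To make the contrast concrete, here is a small sketch of the book sales example using Python's built-in sqlite3 module. The table and column names (`book`, `subject`, `publisher`, `book_wide`) and the sample rows are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized: each entity lives in its own table and is stored once.
    CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        book_id INTEGER PRIMARY KEY,
        title TEXT,
        subject_id INTEGER REFERENCES subject(subject_id),
        publisher_id INTEGER REFERENCES publisher(publisher_id)
    );
    INSERT INTO publisher VALUES (1, 'Example Press');
    INSERT INTO subject VALUES (1, 'Databases');
    INSERT INTO book VALUES (1, 'Intro to SQL', 1, 1);
    INSERT INTO book VALUES (2, 'Warehouse Basics', 1, 1);

    -- Denormalized: one wide table that repeats subject and publisher names.
    CREATE TABLE book_wide AS
    SELECT b.title AS title, s.name AS subject, p.name AS publisher
    FROM book b
    JOIN subject s ON s.subject_id = b.subject_id
    JOIN publisher p ON p.publisher_id = b.publisher_id;
    """
)

# Reading from the normalized schema needs two joins.
normalized_rows = conn.execute(
    "SELECT b.title, s.name, p.name FROM book b"
    " JOIN subject s ON s.subject_id = b.subject_id"
    " JOIN publisher p ON p.publisher_id = b.publisher_id"
    " ORDER BY b.title"
).fetchall()

# Reading from the denormalized table needs none.
denormalized_rows = conn.execute(
    "SELECT title, subject, publisher FROM book_wide ORDER BY title"
).fetchall()

print(normalized_rows == denormalized_rows)
```

Both queries return the same rows. The wide table stores 'Databases' and 'Example Press' once per book rather than once overall, which is the redundancy a warehouse accepts in exchange for simpler reads.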

Data Analysis

Databases usually just process transactions, but it is also possible to perform data analysis with them. However, in-depth exploration is challenging for both the user and the computer due to the normalized data structure and the large number of table joins you need to perform. It requires a skilled developer or analyst to create and execute complex queries on a Database Management System (DBMS), which takes up a lot of time and computing resources. Moreover, the analysis does not go deep; the best you can get is a one-time static report, as databases just give a snapshot of data at a specific time.

Data warehouses are designed to perform complex analytical queries on large multi-dimensional datasets in a straightforward manner. There is no need to learn advanced theory or how to use sophisticated DBMS software. Not only is the analysis simpler to perform, but the results are much more useful; you can dive deep and see how your data changes over time, rather than the snapshot that databases provide.

Data Timeline

Databases process the day-to-day transactions for one aspect of the business. Therefore, they typically contain current, rather than historical data about one business process.

Data warehouses are used for analytical purposes and business reporting. Data warehouses typically store historical data by integrating copies of transaction data from disparate sources. Data warehouses can also use real-time data feeds for reports that use the most current, integrated information.

Concurrent Users

Databases support thousands of concurrent users because they are updated in real-time to reflect the business’s transactions. Thus, many users need to interact with the database simultaneously without affecting its performance.

However, only one user can modify a piece of data at a time; it would be disastrous if two users overwrote the same information in different ways at the same time!

In contrast, data warehouses support a limited number of concurrent users. A data warehouse is separated from front-end applications, and using it involves writing and executing complex queries. These queries are computationally expensive, and so only a small number of people can use the system simultaneously.

ACID

Database transactions are usually executed in an ACID (Atomic, Consistent, Isolated, and Durable) compliant manner. This compliance ensures that data changes in a reliable and high-integrity way. Therefore, it can be trusted even in the event of errors or power failures. Since the database is a record of business transactions, it must record each one with the utmost integrity.
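The "atomic" part of ACID can be sketched with Python's built-in sqlite3 module: either every statement in a transaction takes effect or none does. The `accounts` table and the `transfer` helper below are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts allowed
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint when the source lacks funds.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer("alice", "bob", 30)       # both updates are committed
failed = transfer("alice", "bob", 500)  # the credit to bob is rolled back too
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to `bob` had already run when the debit was rejected, yet the rollback leaves no trace of it, so the balances still reflect only the first transfer.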

Since data warehouses focus on reading, rather than modifying, historical data from many different sources, ACID compliance is less strictly enforced. However, the top cloud data warehouse providers like Redshift and Panoply do ensure that their queries are ACID compliant where possible. For instance, this is typically the case when using MySQL and PostgreSQL.

Database vs. Data Warehouse SLAs

Most SLAs for databases state that they must meet 99.99% uptime because any system failure could result in lost revenue and lawsuits.
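To get a feel for how strict that target is, the short calculation below converts an uptime percentage into a downtime budget. The function name `allowed_downtime_minutes` is made up for this example, and a 365-day year is assumed.

```python
def allowed_downtime_minutes(uptime_percent, days=365):
    """Return the downtime budget in minutes for a given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


yearly = allowed_downtime_minutes(99.99)
monthly = allowed_downtime_minutes(99.99, days=30)
print(round(yearly, 2), round(monthly, 2))
```

At 99.99% uptime the budget works out to roughly 52.56 minutes per year, or a little over four minutes in a 30-day month.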

SLAs for some really large data warehouses often have downtime built in to accommodate periodic uploads of new data. This is less common for modern data warehousing.

Database Use Cases

Databases process the day-to-day transactions in an organization. Some examples of database applications include:

  • An ecommerce website creating an order for a product it has sold
  • An airline using an online booking system
  • A hospital registering a patient
  • A bank adding an ATM withdrawal transaction to an account

Data Warehouse Use Cases

Data warehouses provide high-level reporting and analysis that empower businesses to make more informed business decisions. Use cases include:

  • Segmenting customers into different groups based on their past purchases to provide them with more tailored content
  • Predicting customer churn using the last ten years of sales data
  • Creating demand and sales forecasts to decide which areas to focus on next quarter

Alperen Kezay
