OLTP vs OLAP: An Introduction to Database Systems

Stefano Meloccaro
5 min read · Jun 26, 2024

--

A relational database is a system for storing and managing data organized into tables. Within this framework, there are two primary types of database systems: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing); each is optimized for different types of workloads and purposes.

OLTP vs OLAP: differences and use cases

OLTP systems focus on handling a large number of short, atomic transactions efficiently, ensuring data integrity and quick response times. A typical example is placing, modifying, or deleting an order on an e-commerce platform. In this case, the challenge is not the amount of data involved in each transaction (which is typically very small) but keeping the response time of each transaction close to real time, given the potentially large number of transactions occurring simultaneously worldwide.

OLAP systems, on the other hand, focus on providing fast performance for complex analytical queries on large datasets, often involving historical data. This environment is ideal for analysts, and the main challenge is the opposite of OLTP's: far fewer queries, but each one touching high volumes of data, from which analysts extract insights that inform company decision-makers.
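To make the contrast concrete, here is a minimal sketch of the two workload styles, using Python's built-in sqlite3 module and a hypothetical orders table (the table and column names are illustrative, not from any specific system):

```python
import sqlite3

# Throwaway in-memory database, used only to illustrate the two workload styles.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)"
)

# OLTP-style work: many short, atomic transactions, each touching one or a few rows.
with conn:  # each "with" block commits (or rolls back) as a single transaction
    conn.execute(
        "INSERT INTO orders (customer, amount, status) VALUES (?, ?, ?)",
        ("Alice", 19.99, "placed"),
    )
with conn:
    conn.execute("UPDATE orders SET status = ? WHERE order_id = ?", ("shipped", 1))

# OLAP-style work: one complex query that scans and aggregates many rows at once.
revenue_by_customer = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(revenue_by_customer)  # [('Alice', 19.99)]
```

In a real OLTP system the insert/update pattern runs millions of times a day with strict latency requirements, while the aggregation pattern runs far less often but over far more rows.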

Normalization

But what are the main factors that differentiate the two? Returning to the e-commerce example, let’s say we want to track which books we sold, when we sold them, and to whom. One way to accomplish this task could be to store all this data in a single table, as shown in the image below.
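As a minimal, hypothetical sketch of what such a single table could contain (the column names and example rows here are purely illustrative):

```python
# One flat "book sales" table: every row repeats the book's and the customer's
# details, so the same values are stored over and over again.
book_sales = [
    {"BookID": 1, "Title": "Dune", "Author": "F. Herbert", "Customer": "Alice", "Country": "Italy",          "SoldOn": "2024-06-01"},
    {"BookID": 1, "Title": "Dune", "Author": "F. Herbert", "Customer": "Bob",   "Country": "United Kingdom", "SoldOn": "2024-06-02"},
    {"BookID": 2, "Title": "Emma", "Author": "J. Austen",  "Customer": "Alice", "Country": "Italy",          "SoldOn": "2024-06-03"},
]
# "Dune" / "F. Herbert" and Alice's country appear multiple times:
# this is exactly the redundancy that normalization is designed to remove.
```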

However, the repeated values (or redundancy) can create several issues, such as increased storage needs and costs, slower queries, and potential inconsistencies (which occur when, for example, a value is updated in one row but not in all the rows where it appears). To avoid them, we can use a technique called "database normalization", which is specifically designed to eliminate redundant data. By storing the data in multiple tables, after adding the appropriate "keys" where necessary so that the tables can reference each other (in this case, BookID), we can eliminate the redundant data (as shown in the image below) and avoid the problems mentioned above.

This process can be repeated multiple times until the database is fully normalized, meaning the data cannot be split into any further tables (continuing the normalization process, for example, we could add two more tables, "Country" and "Sales", to store the respective data, ending up with four tables instead of two).
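Continuing the hypothetical sketch from above, the fully normalized version might look like this: four tables instead of one, with BookID, CustomerID, and CountryID acting as the keys that let the tables reference each other (names and values are again illustrative):

```python
# The same data split across four tables, linked by keys (BookID, CustomerID,
# CountryID). Each fact is now stored exactly once.
books = [
    {"BookID": 1, "Title": "Dune", "Author": "F. Herbert"},
    {"BookID": 2, "Title": "Emma", "Author": "J. Austen"},
]
countries = [
    {"CountryID": 1, "Name": "Italy"},
    {"CountryID": 2, "Name": "United Kingdom"},
]
customers = [
    {"CustomerID": 10, "Name": "Alice", "CountryID": 1},
    {"CustomerID": 11, "Name": "Bob",   "CountryID": 2},
]
sales = [
    {"SaleID": 100, "BookID": 1, "CustomerID": 10, "SoldOn": "2024-06-01"},
    {"SaleID": 101, "BookID": 1, "CustomerID": 11, "SoldOn": "2024-06-02"},
    {"SaleID": 102, "BookID": 2, "CustomerID": 10, "SoldOn": "2024-06-03"},
]
```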

Even though the full process of database normalization is beyond the scope of this article, it is important to understand that the further we take the normalization of our database, the more tables are created. While this structure works very well for writing data into our OLTP database (which is why OLTP databases are usually fully normalized), it does not work as well when we want to read data from it for analysis. Analysis often involves large amounts of data spread across different tables, and the more tables we have, the more joins our queries need to perform, which is computationally expensive. So if the number of joins is the main thing slowing down our queries, what is the best way to address it?
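To see why, here is a sketch of the kind of query an analyst might run against the normalized schema above (again using sqlite3 and the hypothetical tables from the previous sketches): answering even a simple question already requires three joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books     (BookID INTEGER PRIMARY KEY, Title TEXT, Author TEXT);
CREATE TABLE countries (CountryID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, CountryID INTEGER);
CREATE TABLE sales     (SaleID INTEGER PRIMARY KEY, BookID INTEGER, CustomerID INTEGER, SoldOn TEXT);

INSERT INTO books     VALUES (1, 'Dune', 'F. Herbert'), (2, 'Emma', 'J. Austen');
INSERT INTO countries VALUES (1, 'Italy'), (2, 'United Kingdom');
INSERT INTO customers VALUES (10, 'Alice', 1), (11, 'Bob', 2);
INSERT INTO sales     VALUES (100, 1, 10, '2024-06-01'), (101, 1, 11, '2024-06-02'), (102, 2, 10, '2024-06-03');
""")

# A simple analytical question ("how many copies of each title did we sell in
# each country?") already requires three joins on the normalized schema.
query = """
SELECT b.Title, co.Name AS Country, COUNT(*) AS CopiesSold
FROM sales s
JOIN books b      ON b.BookID      = s.BookID
JOIN customers cu ON cu.CustomerID = s.CustomerID
JOIN countries co ON co.CountryID  = cu.CountryID
GROUP BY b.Title, co.Name
ORDER BY b.Title, Country
"""
print(conn.execute(query).fetchall())
# [('Dune', 'Italy', 1), ('Dune', 'United Kingdom', 1), ('Emma', 'Italy', 1)]
```

On three tiny tables this is instant, but on billions of sales rows spread across many tables, each extra join adds real cost.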

Denormalization

A very efficient way to solve this problem is to use "Star" or "Snowflake" data modelling techniques, which incorporate a certain degree of "denormalization". Denormalization is the reverse of normalization: it essentially involves combining tables to reduce the number of joins required by queries, thereby improving read performance. Star and snowflake data modelling are important concepts that will be discussed in a dedicated article; for now, it is enough to understand that techniques involving denormalization are also crucial: despite introducing some redundancy, they bring significant query-performance improvements, which are particularly valuable in OLAP systems.
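As a rough sketch of the idea (not the full star-schema design, which is deferred to the dedicated article), denormalization can be as simple as pre-joining the attributes an analyst needs into one wide table, so that the read-side query no longer needs any joins; the table below is hypothetical and reuses the earlier example data:

```python
from collections import Counter

# A denormalized "wide" sales table, as an OLAP-side model might store it:
# the book and country attributes are copied onto every sale row up front,
# so analytical queries can scan a single table with no joins at all.
sales_wide = [
    {"SaleID": 100, "Title": "Dune", "Author": "F. Herbert", "Country": "Italy",          "SoldOn": "2024-06-01"},
    {"SaleID": 101, "Title": "Dune", "Author": "F. Herbert", "Country": "United Kingdom", "SoldOn": "2024-06-02"},
    {"SaleID": 102, "Title": "Emma", "Author": "J. Austen",  "Country": "Italy",          "SoldOn": "2024-06-03"},
]

# The "copies sold per title per country" question from before now needs no joins.
copies_sold = Counter((row["Title"], row["Country"]) for row in sales_wide)
print(copies_sold)
```

The redundancy is back on purpose: we pay a little extra storage to make reads cheap.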

OLTP and OLAP as part of a bigger ecosystem

So, which one should you choose? Whether to use an OLTP or an OLAP system really depends on the business requirements; however, many businesses, such as e-commerce companies or banks, need both by their very nature. On one hand, they need to process customer transactions in real time (the OLTP system), while on the other, they need to analyze large amounts of data for analytical purposes (the OLAP system). One of the best ways to achieve this is an ETL process (Extract, Transform, Load), which involves extracting data from one environment (such as an OLTP database), transforming it for analysis, and then loading it into an OLAP database. This way, we get the best of both systems.
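A minimal sketch of that ETL flow, using sqlite3 as a stand-in for both the OLTP and OLAP sides and the hypothetical schema from earlier (a production pipeline would use a dedicated warehouse and ETL tooling):

```python
import sqlite3

oltp = sqlite3.connect(":memory:")  # stand-in for the transactional (OLTP) database
olap = sqlite3.connect(":memory:")  # stand-in for the analytical (OLAP) database

oltp.executescript("""
CREATE TABLE books (BookID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE sales (SaleID INTEGER PRIMARY KEY, BookID INTEGER, SoldOn TEXT);
INSERT INTO books VALUES (1, 'Dune'), (2, 'Emma');
INSERT INTO sales VALUES (100, 1, '2024-06-01'), (101, 1, '2024-06-02'), (102, 2, '2024-07-03');
""")

# Extract: read the normalized data out of the OLTP side.
rows = oltp.execute(
    "SELECT s.SaleID, b.Title, s.SoldOn FROM sales s JOIN books b ON b.BookID = s.BookID"
).fetchall()

# Transform: denormalize/reshape the rows for analysis (here, trivially: keep the sale month).
transformed = [(sale_id, title, sold_on[:7]) for sale_id, title, sold_on in rows]

# Load: write the result into a wide, query-friendly OLAP table.
olap.execute("CREATE TABLE sales_wide (SaleID INTEGER, Title TEXT, SoldMonth TEXT)")
olap.executemany("INSERT INTO sales_wide VALUES (?, ?, ?)", transformed)
olap.commit()

print(olap.execute("SELECT Title, COUNT(*) FROM sales_wide GROUP BY Title ORDER BY Title").fetchall())
# [('Dune', 2), ('Emma', 1)]
```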

Thanks for being a part of our community!

If you found this article helpful and would like to show your support, don’t hesitate to:

  1. Clap on this story
  2. Leave a comment below telling me what you think. This will help me with the next articles
  3. Highlight the parts of this story you were most interested in
  4. Support my work on Buy Me a Coffee ☕️

These actions really really help me out, and are much appreciated!

Follow me for more insights on LinkedIn | YouTube | Blog

