Different table types in Apache Hudi #Datalake #ApacheHudi #COW #MOR

Sivabalan Narayanan
6 min read · Oct 3, 2021


Apache Hudi offers different table types that users can choose from, depending on their needs and latency requirements.

There are two types of tables:

  • Copy On Write (COW)
  • Merge on Read (MOR)
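As a quick orientation before we define these, here is how the table type is typically chosen when writing with the Spark datasource. The option keys are Hudi's documented write configs; the table name and save path are made up for this sketch.

```python
# Spark datasource options selecting a Hudi table type.
# "my_table" is a hypothetical table name used only for illustration.
cow_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
}

mor_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
}

# Typical usage (sketch):
# df.write.format("hudi").options(**cow_options).mode("append").save(base_path)
```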

Terminology:

Before we dive into COW and MOR, let’s walk through some of the terminology used in Hudi, for a better understanding of the following sections.

Data files/Base files:

Hudi stores data in the columnar Parquet format; these files are called data files or base files. Parquet is known to be very performant and is widely used across the industry. The terms data file and base file are used interchangeably and mean the same thing.

Delta log files:

In the MOR table type, updates are written to delta log files, which are stored in Avro format. Each delta log file is always associated with a base file. Say you have a data file called data_file_1: any updates to records in data_file_1 will be added to a new delta log file. Hudi then merges records from the base file and its corresponding delta log files in real time while serving read queries.

File Groups:

In general, there could be many data files, depending on how much data is stored. Each data file and its corresponding delta log files form one file group. In the case of COW, it’s a lot simpler, as there are only base files.

File versions:

Let me explain in the context of COW. Whenever an update lands in a data file, a newer version of that data file is created, containing the merged records from the older data file and the incoming batch.

File slices:

For every file group, there can be multiple file versions. A file slice comprises a particular version of a data file along with its delta log files. For COW, the latest file slice refers to the latest data/base file in each file group. For MOR, the latest file slice refers to the latest data/base file and its associated delta log files in each file group.
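To make these definitions concrete, here is a toy model (not Hudi’s actual internal classes) of a file group holding file slices, where each slice is a base file version plus its delta logs. All file names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileSlice:
    base_file: str                      # e.g. "data_file_1_v1.parquet" (hypothetical name)
    log_files: list = field(default_factory=list)  # delta logs; always empty for COW


@dataclass
class FileGroup:
    slices: list = field(default_factory=list)     # ordered oldest -> newest

    def latest_slice(self) -> FileSlice:
        return self.slices[-1]


fg = FileGroup()
fg.slices.append(FileSlice("data_file_1_v1.parquet"))
fg.latest_slice().log_files.append("log_1.avro")            # a MOR update lands in a delta log
fg.slices.append(FileSlice("data_file_1_v2.parquet"))       # compaction/write creates a new version
```

Note how the “latest file slice” is simply the newest base file version together with whatever logs have accrued against it.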

With this context, let’s look at the COW and MOR table types.

Copy on Write table:

As the name suggests, every new batch of writes to Hudi results in a new copy of the affected data files: a newer version is created, containing the merged records from the older version and the incoming batch. Let me illustrate with an example.

Let’s say we have 3 file groups with data files as below.

We are getting a new batch of writes, and after indexing we find that some records match file group 1 and file group 2, while the rest are new inserts, for which we are going to create a new file group (file group 4).

So, both data_file1 and data_file2 will have newer versions created. Data file 1 V2 is nothing but the contents of data file 1 V1 merged with the records from the incoming batch that match records in data file 1.
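The merge described above can be sketched as follows. This is a toy illustration, not Hudi’s actual merge implementation: records are overlaid by record key, so matching keys are updated and the result becomes the next file version.

```python
# Toy sketch of a copy-on-write merge, keyed by record key.
def cow_merge(base_records, incoming):
    # Start from the old version's records, then overlay the incoming batch:
    # matching keys are replaced (updates), new keys are added (inserts).
    merged = {r["key"]: r for r in base_records}
    merged.update({r["key"]: r for r in incoming})
    return list(merged.values())


v1 = [{"key": 1, "val": "a"}, {"key": 2, "val": "b"}]       # data file 1, version 1
batch = [{"key": 2, "val": "b2"}, {"key": 3, "val": "c"}]   # incoming batch
v2 = cow_merge(v1, batch)                                   # contents of version 2
```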

COW does incur some write latency due to the merge cost paid during the write. But the beauty of COW is its simplicity, especially in terms of operationalizing: no other table services (like compaction) are required, and it is relatively easier to debug as well.

Merge on Read table:

Again, as the name suggests, the merge cost is moved from the writer side to the reader side. Hence, during the write we don’t do any merging or create newer data file versions. Once tagging/indexing is complete, for existing data files that have records to be updated, Hudi creates delta log files and names them appropriately so that they all belong to one file group.

The reader then merges the base files and their respective delta log files in real time. As you might have guessed by now, we can’t let this go on forever. So, Hudi has a compaction mechanism, with which the data files and log files are merged together and a newer version of the data file is created.
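The read-time merge can be sketched in the same toy style as before (again, not Hudi’s actual implementation): start from the base file’s records and replay the delta logs in order, so that the newest update per key wins.

```python
# Toy sketch of a MOR snapshot read: base records + ordered delta logs.
def mor_read(base_records, delta_logs):
    merged = {r["key"]: r for r in base_records}
    for log in delta_logs:          # logs are applied oldest -> newest
        for r in log:
            merged[r["key"]] = r    # later updates overwrite earlier ones
    return list(merged.values())


base = [{"key": 1, "val": "a"}, {"key": 2, "val": "b"}]
logs = [
    [{"key": 1, "val": "a2"}],      # first delta log
    [{"key": 1, "val": "a3"}],      # second delta log, newer update to key 1
]
snapshot = mor_read(base, logs)
```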

Users can choose to run compaction either inline or asynchronously. There are different compaction strategies to choose from; the most commonly used one is based on the number of commits. For example, you can configure the maximum number of delta logs before compaction as 4. This means that after 4 delta logs have been created for a data file, it will be compacted and a newer data file will be created. Once compaction completes, the reader just has to read the latest data file and does not care about the older files (file versions).
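The commit-count-based setup described above maps to Hudi’s compaction configs roughly as follows; the keys here are Hudi’s documented config names, shown as plain Spark write options.

```python
# Illustrative write options for a MOR table with inline compaction
# triggered after 4 delta commits accumulate against a file group.
mor_compaction_options = {
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "4",
}

# Sketch: df.write.format("hudi").options(**mor_compaction_options)...
```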

Let’s compare COW vs MOR on certain important criteria.

Write latency:

As we discussed earlier, COW has higher write latency compared to MOR, due to the synchronous merge happening during the write.

Read latency:

Since MOR has to do a real-time merge, it tends to have higher read latency compared to COW. But if you have configured an appropriate compaction strategy for your needs, MOR can play out well.

Update Cost:

I/O cost is going to be higher for COW, since we create newer data files for every batch of writes. MOR has very minimal I/O cost, since updates go into delta log files.

Write Amplification:

Again, this is going to be higher with COW, as we create newer versions of data files. Let’s say you have a data file of 100 MB, and you do 4 batches of writes, updating 10% of the records each time. After 4 writes, Hudi will have 5 data files, each 100 MB in size, with COW. You can configure your cleaner (to be discussed in a later blog) to be aggressive in cleaning up older file versions, but if your cleaner hasn’t kicked in, you will end up with 5 versions of the data file on storage. This is not the case with MOR. Since updates go into log files, write amplification is kept to a minimum. For the same example, say compaction hasn’t kicked in yet: after 4 writes we will have one 100 MB file plus 4 delta log files of ~10 MB each, ≈140 MB in total.
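The arithmetic above is simple enough to verify with a back-of-the-envelope calculation, under the stated assumptions (a 100 MB base file, 4 writes each touching ~10% of the records, and neither the cleaner nor compaction having run yet):

```python
# Storage footprint before any cleaning/compaction, per the example above.
base_mb = 100        # size of one base file version
num_writes = 4       # batches of writes
update_frac = 0.10   # fraction of records updated per write

# COW: each write rewrites a full new version of the file.
cow_storage_mb = base_mb * (num_writes + 1)

# MOR: one base file plus one ~10 MB delta log per write.
mor_storage_mb = base_mb + num_writes * base_mb * update_frac
```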

Conclusion

Even though MOR might seem to have some downsides, it offers different querying capabilities, like the read-optimized query (to be discussed in a later blog), which may not incur the additional merge cost. And if you have an async compaction job with appropriate configs, you can reap all the benefits of MOR without trading off much on latency.

Hope this blog was useful in understanding the different table types in Hudi. You can find more info on the official Apache Hudi website, which has a page that specifically covers file layouts and table types.
