Optimizing Oracle Performance: Leveraging Hardware Capabilities and Huge Pages for Enhanced Efficiency

Constantinus Satrio
Traveloka Engineering Blog
9 min read · Jul 3, 2023
Circuit board layout (Origin: Unsplash)

Editor’s Note: In the quest for optimizing database performance, leveraging hardware capabilities and implementing efficient memory management techniques are crucial.

This post replays the successful journey of improving the Oracle E-Business Suite system in Traveloka by utilizing HugePages and optimizing hardware configurations.

By reducing page table overhead, enhancing memory access speed, and minimizing TLB misses, significant performance enhancements were achieved, resulting in faster response times and increased productivity for users.

Constantinus Satrio is a Database Administrator with the Corporate Technology team, whose domains of expertise include PL/SQL, Python, Java, C++, and JavaScript.

Introduction

As the database administrator for our Oracle E-Business Suite, I would like to share a bit about a recent, significant performance improvement in Production, achieved through a patch to the Oracle GL module and memory tuning with HugePages, and made possible by the efforts of our managers, functional teams, and engineers.

We have been facing — and working very hard to address — performance issues of the Oracle E-Business Suite system related to slow response times and intermittent downtime that have caused frustration and inconvenience to our users, who expect to rely on the system to carry out their day-to-day tasks efficiently.

The root cause of the problem was a combination of factors, including undersized hardware and unoptimized configurations. In addition, the volume of data that the system had been handling had increased exponentially over time, straining the system and exacerbating the performance issues.

To address those challenges, we embarked on a comprehensive performance improvement plan that involved upgrading our hardware, optimizing our configurations, and fine-tuning our software settings. We also worked closely with our functional teams to identify areas for improvement, and collaborated with subject matter experts to leverage the latest technologies and best practices.

By addressing these issues head-on, we improved the overall user experience and ensured a streamlined system. Our users have also noticed a significant improvement in system speed and reliability.

We captured a baseline measurement of the execution time (in seconds) of a large program (Gather Schema Statistics) before and after implementing HugePages. The reduction in execution time was significant, up to 46%, resulting in faster system response times and increased productivity, allowing the business to operate more efficiently.

Performance improvement figures

Moving forward, we will continue to monitor our system’s performance and make any necessary adjustments to ensure that it keeps running smoothly. Efficiently translating our knowledge into effective problem-solving that delivers solutions is one of our key focuses.

How We Improved Performance

Imagine you want to cover a large room with tiles. You can either use small tiles or big tiles to cover the floor. Using small tiles would require more individual pieces to cover the same area, while using big tiles would require fewer, but larger pieces. Similarly, the use of huge pages in the Linux operating system can help to optimize system performance by reducing the number of individual pages that the system needs to manage.

In traditional Linux systems, the default page size is 4 KB, which means that the system needs to manage a very large number of individual pages to keep track of the memory used by applications. By using huge pages of 2,048 KB (2 MB), the system can reduce the number of individual pages it needs to manage, which can lead to improved performance.
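
For readers who want to check these values on their own machines, here is a minimal sketch (assuming a Linux host with /proc/meminfo available and Python 3 installed) that prints the default page size and the configured huge page settings:

```python
# Print the default (small) page size and the huge page settings on Linux.
# Minimal sketch: assumes /proc/meminfo exists (i.e., a Linux host).
import resource

# Default page size reported by the kernel, typically 4096 bytes (4 KB).
print("Default page size:", resource.getpagesize(), "bytes")

# Huge page size and how many huge pages are currently reserved/free.
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("Hugepagesize", "HugePages_Total", "HugePages_Free")):
            print(line.strip())
```

On a typical x86-64 host this reports a 4,096-byte default page and a 2,048 KB Hugepagesize; HugePages_Total reflects the pages reserved through the vm.nr_hugepages kernel setting.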

Small tile vs big tile

Huge memory pages are a feature of modern CPUs that allow the use of much larger page sizes for virtual memory. Standard memory pages are typically 4KB, whereas huge memory pages can be 2MB or even 1GB. Huge memory pages can reduce the number of page table entries required for a given amount of memory, which can improve performance in some workloads.

Using huge pages in Linux can be compared to using big tiles to cover a floor. By reducing the number of individual pieces that need to be managed, the system can operate more efficiently and provide better performance for applications that require large memory footprints, such as a database.

Using huge memory pages can benefit a database in four ways:

  • Reduced Page Table Overhead: Using huge memory pages can reduce the overhead of page tables, which are used by the operating system to manage virtual memory. This is because the larger pages mean that there are fewer page table entries needed to map a given amount of physical memory.
  • Increased Memory Access Speed: When a database accesses a region of memory backed by a huge page, a single address translation covers the entire region. With many small pages, the same access pattern requires far more translations and additional page table lookups, so memory access is effectively slower.
  • Reduced TLB Misses: The Translation Lookaside Buffer (TLB) is a hardware component that caches recently used memory mappings, so that they can be quickly re-accessed in the future. When a database uses huge memory pages, it can reduce the number of TLB misses, because fewer page table entries need to be stored in the TLB.
  • Improved Database Performance: Overall, using huge memory pages can lead to improved database performance, particularly for large databases with a high degree of memory usage. By reducing page table overhead, improving memory access speed, and reducing TLB misses, the database can operate more efficiently and respond more quickly to user queries and other operations.

To calculate the number of page table entries: Total number of page table entries = Total memory size / Page size

Assuming an SGA (System Global Area) of roughly 70 GB and the default page size of 4 KB, about 18.3 million page table entries would be required.

Using a huge page size of 2,048 KB instead, only about 35,840 page table entries would be required.
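
As a quick worked example (assuming a 70 GB SGA, i.e., 71,680 MB, purely for illustration), the arithmetic looks like this:

```python
# Back-of-the-envelope count of page table entries needed to map the SGA
# with 4 KB pages versus 2,048 KB (2 MB) huge pages.
# Assumption: a 70 GB (71,680 MB) SGA, chosen only for illustration.
SGA_KB = 70 * 1024 * 1024          # 70 GB expressed in KB
SMALL_PAGE_KB = 4                  # default Linux page size
HUGE_PAGE_KB = 2048                # typical x86-64 huge page size

small_entries = SGA_KB // SMALL_PAGE_KB   # 18,350,080 entries
huge_entries = SGA_KB // HUGE_PAGE_KB     # 35,840 entries

print(f"4 KB pages : {small_entries:,} entries")
print(f"2 MB pages : {huge_entries:,} entries")
print(f"Saved      : {small_entries - huge_entries:,} entries")
```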

By using huge pages of 2,048 KB, we can save approximately 18.3 million page table entries. And with far fewer entries to manage, the chance of the remaining entries being cached in the TLB increases. Details are provided in the hardware section below.

How Linux Handles Memory Pages

Now, you may wonder why all of this information matters in the cloud era. Knowing how Linux manages memory is still important because AWS EC2 instances run on top of a customized Linux KVM (Kernel-based Virtual Machine) hypervisor.

Linux, like many modern operating systems, uses a multi-level page table system to manage memory. The purpose of the page table is to keep track of which pages of memory are in use and where they are located in physical memory. When a program tries to access memory, the processor uses the page table to translate the virtual memory address used by the program into a physical memory address.

A multi-level page table system is used to efficiently manage large amounts of memory. The page table is divided into multiple levels, with each level pointing to the next level of the page table. The first level is called the Page Global Directory (PGD), and each subsequent level is called a Page Middle Directory (PMD) or Page Table (PT).

Visualization of multi-level page tables

When a program tries to access memory, the processor first looks in the TLB (Translation Lookaside Buffer), which is a cache of recently used page table entries. If the entry is found in the TLB, the translation is done quickly and the program can access the memory. Otherwise, the processor must walk the page tables to translate the virtual address into a physical address.

To do this, the processor starts with the PGD and uses the virtual memory address to index into the PGD to find the PMD or PT. It then uses the PMD or PT to index into the next level of the page table, and so on, until it finds the final entry that contains the physical address of the memory.

This multi-level page table system allows for efficient management of large amounts of memory, as each level of the page table only needs to be loaded into memory when it is needed. The TLB helps to speed up the translation process by caching frequently used page table entries.
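
To make the walk concrete, here is a simplified sketch (a teaching illustration, not Linux kernel code; the 9-bit indices and level names loosely mirror the common 4-level x86-64 layout) showing how a virtual address is split into an index for each level plus an offset:

```python
# Simplified illustration of a 4-level page table walk, loosely modeled on the
# x86-64 layout: four 9-bit indices plus a 12-bit offset into a 4 KB page.
def split_virtual_address(vaddr: int):
    offset = vaddr & 0xFFF           # bits 0-11 : offset inside the 4 KB page
    pt     = (vaddr >> 12) & 0x1FF   # bits 12-20: index into the Page Table
    pmd    = (vaddr >> 21) & 0x1FF   # bits 21-29: index into the Page Middle Directory
    pud    = (vaddr >> 30) & 0x1FF   # bits 30-38: index into the Page Upper Directory
    pgd    = (vaddr >> 39) & 0x1FF   # bits 39-47: index into the Page Global Directory
    return pgd, pud, pmd, pt, offset

vaddr = 0x00007F3B1A2C5D21           # an arbitrary example virtual address
pgd, pud, pmd, pt, offset = split_virtual_address(vaddr)
print(f"PGD={pgd} PUD={pud} PMD={pmd} PT={pt} offset={offset}")

# With 2 MB huge pages the walk stops one level earlier: the PMD entry maps
# the whole 2 MB region directly, so the final Page Table level (and its
# entries) is not needed at all.
```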

A Bit About Hardware

In Production, at the time of writing this blog, we are using a z1d EC2 instance with an Intel Xeon Platinum 8151 processor.

We can see from the figure below that the Intel Xeon Platinum 8151 processor has a fairly generous TLB. However, compared with the amount of main memory it has to map when using 4 KB pages, the TLB's coverage is still very small.

Intel Xeon 8151 processor memory subsystem (Origin: Wikipedia)

Cache associativity refers to the relationship between cache blocks and cache sets. In a direct-mapped cache, each block of memory can only be stored in one specific location in the cache, which is determined by its address. In a fully associative cache, each block of memory can be stored in any location in the cache. In a set-associative cache, each block of memory can be stored in a limited set of locations within the cache.

Effective size, on the other hand, refers to the usable size of the cache. It depends on both the physical size of the cache and its associativity. A 256 KB fully associative cache can always hold 256 KB of data, because any block can be placed anywhere. A 256 KB 4-way set-associative cache, however, divides its capacity into sets that each hold only four blocks; addresses that map to the same set compete for those four slots, so under an unfavorable access pattern its effective size can be far smaller than the nominal 256 KB. The same logic applies to the TLB: if the number of pages in active use exceeds the number of entries it can effectively hold, TLB misses become likely.

When a CPU needs to translate a virtual memory address into a physical memory address, it looks for the translation in the TLB. If the translation is not found in the TLB, it is called a TLB miss, and the CPU must perform a more expensive page table walk to find the translation. Signs of TLB misses can include increased memory access latencies, decreased performance of memory-bound applications, and increased CPU usage due to the extra work required to perform the page table walk.
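
To put the "generous but still small" observation into numbers, here is a rough TLB-reach estimate (the entry counts are illustrative values for a Skylake-class second-level TLB, not exact figures for our instance):

```python
# Rough "TLB reach": how much memory the TLB can map before misses start.
# The entry counts are assumptions, roughly in line with a Skylake-class
# second-level (unified) TLB, and are used here only for illustration.
STLB_ENTRIES_4K = 1536    # entries available for 4 KB pages (assumed)
STLB_ENTRIES_2M = 1536    # entries available for 2 MB pages (assumed)

reach_4k = STLB_ENTRIES_4K * 4 * 1024          # bytes covered with 4 KB pages
reach_2m = STLB_ENTRIES_2M * 2 * 1024 * 1024   # bytes covered with 2 MB pages

print(f"4 KB pages: TLB covers ~{reach_4k / 2**20:.0f} MB")   # ~6 MB
print(f"2 MB pages: TLB covers ~{reach_2m / 2**30:.0f} GB")   # ~3 GB
```

Against an SGA of roughly 70 GB, 4 KB pages overwhelm the TLB almost immediately, while 2 MB huge pages let it cover several gigabytes of the SGA at a time.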

Citing from an Intel document

Citation from an Intel document regarding ITLB misses and pipeline stalls

When the instruction TLB (ITLB) misses frequently, the result is ITLB pipeline stalls: the processor sits idle waiting for instructions to be fetched through an expensive page table walk, which can significantly slow down the system. We initially suspected that the issue was due to disk I/O, but further investigation revealed that a high rate of context switching, combined with ITLB pipeline stalls, was contributing to the performance degradation.
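
For readers who want to look for similar symptoms, one possible approach (a sketch only, assuming Linux perf is installed, the generic events below are exposed by the CPU, and the script runs with sufficient privileges) is to sample context-switch and TLB-miss counters for a suspect process:

```python
# Sketch: count context switches and TLB misses for a running process for
# 10 seconds using Linux `perf`. Assumes perf is installed and the generic
# events below are available; run with sufficient privileges.
import subprocess
import sys

pid = sys.argv[1]  # PID of the process to observe, e.g. an Oracle server process
events = "context-switches,iTLB-load-misses,dTLB-load-misses"

# perf prints the event totals to stderr once the bounded `sleep` finishes.
subprocess.run(["perf", "stat", "-e", events, "-p", pid, "--", "sleep", "10"])
```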

Closing Remarks

In conclusion, the successful improvement of the Oracle E-Business Suite system's performance can be attributed to thorough consideration of hardware capabilities and the strategic implementation of huge pages in the software setup. By recognizing the intrinsic relationship between hardware and memory management, we were able to optimize the system for peak performance.

Implementing huge pages in the Linux operating system played a crucial role in optimizing memory management. This technology significantly reduced page table overhead, increased memory access speed, and minimized TLB misses. The combination of hardware and software optimizations resulted in enhanced database performance, faster system response times, and increased productivity.

The system now operates with heightened efficiency, and our users are already seeing the benefits in their day-to-day work.
