Patching and Upgrading Oracle Applications Made Easy with Tessell

Published in Tessell DBaaS · 7 min read · 4 days ago

As enterprises migrate their Oracle applications to the cloud, ensuring high availability (HA), patching, and seamless infrastructure upgrades for the underlying databases is crucial to maintaining business continuity and performance. Tessell’s fully managed DBaaS platform provides robust tools and methodologies to manage these needs effectively. This article explores the techniques and best practices for seamless patching and infrastructure upgrades for Oracle applications in the cloud, covering the provisioning of multi-AZ HA systems, the switchover process, failover scenarios, and additional use cases.

Multi-AZ Highly Available System

Topology

A multi-availability zone (multi-AZ) high availability (HA) system is essential for maintaining the uptime and reliability of Oracle databases in the cloud. The core components of this topology include:

Database Nodes: These nodes consist of a primary and one or more standby databases configured using Oracle Data Guard. The primary database handles the application’s traffic, processes transactions, and serves queries. The standby database(s), placed in a different availability zone, are synchronized replicas that remain ready to take over seamlessly if the primary database fails. This setup keeps data continuously protected and available, reducing the risk of data loss and downtime.
Observer Nodes: Observer nodes monitor the health of the primary database. Each one runs a Data Guard broker observer process, which drives automatic failover. If the observer detects that the primary database is unavailable or compromised, it initiates a failover to the standby database, promoting it to the new primary role. This automated monitoring and failover mechanism improves the resilience of the overall system.

Maximum Availability

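Oracle Data Guard offers three protection modes (Maximum Protection, Maximum Availability, and Maximum Performance) to cater to different business requirements. Maximum Availability is a common choice for mission-critical applications because it targets zero data loss without compromising the availability of the primary database. In this mode: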

A transaction’s commit is acknowledged to the application only after its redo has been written on the primary and received by a synchronized standby database. This protects against data loss if the primary database fails. If the standby becomes unreachable, the primary keeps processing transactions rather than stalling, and the standby resynchronizes once connectivity returns, so the zero-data-loss protection holds only while the standby is synchronized.
Synchronous replication adds some commit latency, roughly the network round trip between availability zones, but the mode balances data protection and performance, making it a good fit for applications where data integrity is paramount.
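
To make the latency trade-off concrete, here is a minimal sketch of commit latency with and without synchronous redo transport. It is a simplified model, not a measurement: the timing fields and the numbers are hypothetical, and it assumes the local redo write and the shipment to the standby proceed in parallel, so the slower path determines when the commit is acknowledged.

```python
from dataclasses import dataclass


@dataclass
class CommitTiming:
    # All values are hypothetical inputs in milliseconds.
    local_redo_write_ms: float      # redo write on the primary
    network_round_trip_ms: float    # primary <-> standby round trip
    standby_redo_write_ms: float    # redo write on the standby


def commit_latency_ms(t: CommitTiming, synchronous: bool) -> float:
    """Approximate commit latency under a simplified model."""
    if not synchronous:
        # Asynchronous transport: the commit waits only for the local write.
        return t.local_redo_write_ms
    # Synchronous transport: the commit also waits for the standby's
    # acknowledgement; the two paths are assumed to run in parallel.
    remote_path_ms = t.network_round_trip_ms + t.standby_redo_write_ms
    return max(t.local_redo_write_ms, remote_path_ms)


timing = CommitTiming(local_redo_write_ms=1.0,
                      network_round_trip_ms=2.0,
                      standby_redo_write_ms=1.0)
print(commit_latency_ms(timing, synchronous=False))
print(commit_latency_ms(timing, synchronous=True))
```

In this model the synchronous commit costs one cross-AZ round trip plus the standby write, which is why placing the standby in a nearby availability zone keeps the overhead small.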

Provisioning Process

Provisioning an HA database system follows the steps below.

Create the primary database instance on Node 1.
Create the standby instance on Node 2 from the primary instance using the RMAN DUPLICATE command.
Configure Data Guard with Node 1 as the primary instance and Node 2 as the standby instance.
Set the Data Guard protection mode to MAXAVAILABILITY and enable Fast-Start Failover.
Run a Data Guard observer process on all three nodes, designating the observer on Node 3 as the Master Observer. Node 3 is referred to as the Controller Node.
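
For readers who want to see what steps 3 and 4 might look like at the Data Guard broker level, the sketch below renders a plausible sequence of DGMGRL commands. It is an illustration under assumptions, not Tessell’s actual automation: the database names and the configuration name are hypothetical, and the exact syntax should be checked against the DGMGRL reference for your Oracle release.

```python
def broker_setup_commands(primary: str, standby: str,
                          config: str = "dg_config") -> list[str]:
    """Render DGMGRL commands for a two-database broker configuration.

    `primary`, `standby`, and `config` are hypothetical names; here the
    connect identifier is assumed to match the database name.
    """
    return [
        # Step 3: create the configuration and register the standby.
        f"CREATE CONFIGURATION '{config}' AS PRIMARY DATABASE IS "
        f"'{primary}' CONNECT IDENTIFIER IS {primary};",
        f"ADD DATABASE '{standby}' AS CONNECT IDENTIFIER IS {standby};",
        # Synchronous redo transport is needed for Maximum Availability.
        f"EDIT DATABASE '{primary}' SET PROPERTY LogXptMode='SYNC';",
        f"EDIT DATABASE '{standby}' SET PROPERTY LogXptMode='SYNC';",
        "ENABLE CONFIGURATION;",
        # Step 4: raise the protection mode, then enable Fast-Start Failover.
        "EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;",
        "ENABLE FAST_START FAILOVER;",
    ]


for command in broker_setup_commands("node1_db", "node2_db"):
    print(command)
```

Step 5 happens outside this list: an observer is started from a DGMGRL session on each observer host with the START OBSERVER command.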

Role Reversals

Switchover

A switchover is a planned operation that reverses the roles of the primary and standby databases without any data loss. This process is typically used for maintenance activities, testing, or disaster recovery drills. The switchover process involves several steps:

Prepare the Standby Database: Ensure that the standby database is fully synchronized with the primary database. This involves checking the lag and ensuring all transactions have been applied to the standby database.
Execute Switchover Command: Initiate the switchover through Oracle Data Guard, for example with the broker’s SWITCHOVER command in DGMGRL. The current primary database transitions to a standby role and the standby database becomes the new primary.
Verify Role Reversal: After the switchover operation, it is essential to verify that the roles have been successfully reversed. This involves checking the status of both databases to confirm that the standby database is now the primary and vice versa.
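
The three steps above can be sketched as a small state model. This is only an illustration of the control flow (check lag, swap roles, verify); the Database class and its fields are hypothetical and do not talk to a real database.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    role: str                   # "PRIMARY" or "STANDBY"
    apply_lag_seconds: int = 0  # hypothetical lag reading


def switchover(primary: Database, standby: Database,
               max_lag_seconds: int = 0) -> None:
    """Swap roles after confirming the standby has caught up."""
    # Step 1: prepare. Refuse to proceed if the standby is behind.
    if primary.role != "PRIMARY" or standby.role != "STANDBY":
        raise ValueError("switchover needs one primary and one standby")
    if standby.apply_lag_seconds > max_lag_seconds:
        raise RuntimeError(
            f"{standby.name} lags by {standby.apply_lag_seconds}s")
    # Step 2: execute. Reverse the two roles.
    primary.role, standby.role = "STANDBY", "PRIMARY"
    # Step 3: verify the role reversal.
    if (primary.role, standby.role) != ("STANDBY", "PRIMARY"):
        raise RuntimeError("role reversal could not be verified")


node1 = Database("node1", "PRIMARY")
node2 = Database("node2", "STANDBY")
switchover(node1, node2)
print(node1.role, node2.role)
```

With the Data Guard broker, the corresponding checks are typically done with SHOW CONFIGURATION and SHOW DATABASE before and after the SWITCHOVER command.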

Failover

A failover is an unplanned operation triggered by an unexpected outage or failure of the primary database. Various failover scenarios include:

Fast-Start Failover: This is an automatic failover mechanism initiated by the observer node if the primary database becomes unavailable. The observer continuously monitors the primary database’s health and, upon detecting a failure, automatically promotes the standby database to the primary role. This ensures minimal downtime and maintains data integrity without manual intervention.
Manual Failover: When automatic failover is not configured or desired, a database administrator can manually initiate a failover operation. This is typically done after verifying the primary database’s unavailability and ensuring the standby database is ready to take over.

The standby database assumes the primary role in both cases, ensuring the application remains available with minimal disruption.
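
As a rough illustration of how an observer might decide when to trigger Fast-Start Failover, the sketch below counts how long the primary has been continuously unreachable and fires once a threshold is crossed. The function and its polling model are hypothetical simplifications; in the Data Guard broker, the comparable setting is the FastStartFailoverThreshold property, which defaults to 30 seconds.

```python
def should_fail_over(health_checks: list[bool],
                     interval_seconds: int,
                     threshold_seconds: int = 30) -> bool:
    """Return True once the primary has been unreachable for
    `threshold_seconds` in a row.

    `health_checks` is a hypothetical series of poll results taken every
    `interval_seconds` (True means the primary answered).
    """
    unreachable_for = 0
    for healthy in health_checks:
        if healthy:
            # Any successful check resets the outage timer.
            unreachable_for = 0
        else:
            unreachable_for += interval_seconds
            if unreachable_for >= threshold_seconds:
                return True
    return False


# A brief blip recovers before the threshold; a sustained outage does not.
print(should_fail_over([False, False, True, False, False], interval_seconds=10))
print(should_fail_over([True, False, False, False], interval_seconds=10))
```

The threshold is the main tuning knob: a lower value shortens downtime after a real failure, while a higher value avoids failing over on transient network glitches.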

Examples

Let’s walk through a few failover scenarios using the topology shown earlier (Figure 1).

Scenario 1: Node 1 Failure

If Node 1 (Primary) goes down, the Master Observer on Node 3 initiates an automatic failover.
Node 2 becomes the new Primary Instance.
When Node 1 recovers, it synchronizes with the new Primary (Node 2) and acts as a Failover Replica. Fast-Start Failover is automatically re-enabled.

Scenario 2: Node 2 Failure

If Node 2 (Standby) goes down, Node 1 continues as the Primary.
Data Guard reports Fast-Start Failover as DISABLED with errors.
Upon Node 2’s recovery, it syncs with Node 1 and continues as a Failover Replica, with Fast-Start Failover automatically re-enabled.

Scenario 3: Node 3 Failure

In the event of Node 3 (Controller) failure, one of the observers on Node 1 or Node 2 assumes the role of Master Observer.

Note: Users can also perform a manual switchover from the Tessell UI.
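
The three scenarios can be replayed with a toy model of the cluster. This is a sketch for reasoning about the state transitions described above, not a representation of how Data Guard or Tessell implements them; the Cluster class, its fields, and the node names are hypothetical.

```python
from dataclasses import dataclass, field

NODES = ("node1", "node2", "node3")


@dataclass
class Cluster:
    primary: str = "node1"
    standby: str = "node2"
    master_observer: str = "node3"
    down: set = field(default_factory=set)
    fsfo_ready: bool = True  # a healthy failover target is available

    def fail(self, node: str) -> None:
        self.down.add(node)
        if node == self.master_observer:
            # Scenario 3: a surviving observer takes over as master.
            survivors = [n for n in NODES if n not in self.down]
            if survivors:
                self.master_observer = survivors[0]
        if node == self.primary:
            # Scenario 1: automatic failover to the standby.
            if self.standby in self.down:
                raise RuntimeError("no healthy standby to fail over to")
            self.primary, self.standby = self.standby, self.primary
            self.fsfo_ready = False
        elif node == self.standby:
            # Scenario 2: the primary carries on without a failover target.
            self.fsfo_ready = False

    def recover(self, node: str) -> None:
        self.down.discard(node)
        if node == self.standby and self.primary not in self.down:
            # The recovered node resyncs and becomes a failover target again.
            self.fsfo_ready = True


cluster = Cluster()
cluster.fail("node1")
print(cluster.primary, cluster.standby, cluster.fsfo_ready)
cluster.recover("node1")
print(cluster.primary, cluster.standby, cluster.fsfo_ready)
```

Note that after Scenario 1 the roles stay reversed: Node 1 rejoins as the standby rather than reclaiming the primary role.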

Patching

Near-Zero Downtime Patching

Rolling patching allows organizations to apply updates to their Oracle databases with near-zero downtime by leveraging the switchover process. This method ensures that the database remains available during the patching operation. The rolling patching process involves several steps:

Prepare/Patch Standby Database: Apply the patch to the standby database. This involves updating its software and ensuring it is fully synchronized with the primary database before the patching process starts.
Switchover: Once the standby database is patched and ready, initiate a switchover to make it the new primary. This ensures the application runs on the patched database while the original primary database is taken offline for patching.
Patch Former Primary: Apply the patch to the new standby database (the former primary). This step involves updating the database software and verifying that the patch is applied successfully.
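
The rolling patching sequence can be sketched as follows. The Node class, the patch-level strings, and the log messages are hypothetical; the point is only to show that one node holds the primary role at every step while the other is being patched.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str         # "PRIMARY" or "STANDBY"
    patch_level: str  # illustrative version string


def rolling_patch(primary: Node, standby: Node, new_level: str) -> list[str]:
    """Patch the standby, switch over, then patch the former primary."""
    log = []
    # Step 1: patch the standby while the primary keeps serving traffic.
    standby.patch_level = new_level
    log.append(f"patched {standby.name}; {primary.name} is serving")
    # Step 2: switch over so the patched node becomes the primary.
    primary.role, standby.role = "STANDBY", "PRIMARY"
    log.append(f"switched over; {standby.name} is serving")
    # Step 3: patch the former primary, which is now the standby.
    primary.patch_level = new_level
    log.append(f"patched {primary.name}; {standby.name} is serving")
    return log


node1 = Node("node1", "PRIMARY", "19.22")
node2 = Node("node2", "STANDBY", "19.22")
for line in rolling_patch(node1, node2, "19.23"):
    print(line)
```

The only moment the application is affected is the switchover in step 2, which is where the near-zero downtime figure comes from.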

Standby-First Patching

Minimal Downtime: The rolling patching process ensures that the application remains available during the patching operation, minimizing disruption to business operations.
Reduced Risk: By patching the standby database first, you can verify the patch’s stability and functionality before it reaches the primary database. If issues appear on the patched database after the switchover, you can quickly switch back to the still-unpatched former primary, reducing the risk of prolonged downtime.
Operational Flexibility: Rolling patching allows for shorter and less disruptive maintenance windows, providing greater flexibility in scheduling patching activities. This approach is particularly beneficial for organizations with stringent availability requirements.

Infrastructure Upgrade

Seamless Compute Shape Change

Switchover mechanisms can also facilitate infrastructure upgrades, such as changing the compute shape of your database instances. This allows the databases to move from an older generation infrastructure to a newer generation. The seamless upgrade process involves several steps:

Upgrade the Standby Node: Change the compute shape of the standby database, which requires rebooting the instance. Ensure that the standby is synchronized with the primary before moving to the next step.
Switchover to Standby: Initiate a switchover to make the standby database the primary database. This ensures the application remains available while the old primary database is offline for the upgrade.
Upgrade Former Primary: Change the compute shape of the old primary database. This involves selecting a new compute shape that offers better performance, more resources, or other desired characteristics. The upgrade process may include resizing CPU, memory, or storage resources to meet new requirements.

This approach ensures that the database remains available throughout the upgrade process, minimizing downtime and disruption to business operations.
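
A quick back-of-the-envelope comparison shows why the switchover-based approach is attractive. The sketch below contrasts the application outage of a rolling shape change with changing both nodes in place; the durations are made-up inputs, and the model ignores synchronization waits between steps.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceEstimate:
    application_outage_s: int  # time the application cannot reach a primary
    total_elapsed_s: int       # wall-clock time for the whole procedure


def estimate_shape_change(reboot_s: int, switchover_s: int,
                          rolling: bool) -> MaintenanceEstimate:
    """Estimate outage and elapsed time for changing both nodes' shapes."""
    if rolling:
        # Both reboots happen on a standby; only the switchover is visible.
        return MaintenanceEstimate(switchover_s, 2 * reboot_s + switchover_s)
    # In place: the primary's own reboot is an outage for the application.
    return MaintenanceEstimate(reboot_s, 2 * reboot_s)


# Hypothetical durations: a 5-minute reboot and a 1-minute switchover.
print(estimate_shape_change(reboot_s=300, switchover_s=60, rolling=True))
print(estimate_shape_change(reboot_s=300, switchover_s=60, rolling=False))
```

Under these assumptions the rolling approach takes slightly longer end to end but cuts the application-visible outage from the reboot time to the switchover time.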

Additional Use Cases

The same principles can be applied to several other lifecycle management processes.

Resizing

Resize Standby: Move the standby to a new compute shape with more or less CPU and memory as needed, which requires rebooting the instance. Ensure the standby is synchronized with the primary before moving to the next step.
Switchover to Standby: Initiate a switchover to make the standby database the primary database. This ensures the application remains available while the old primary database is resized.
Resize Former Primary: Adjust the old primary database’s resources (CPU, memory, storage) to meet new performance or capacity requirements. This step involves selecting the appropriate resource configuration and applying the changes.

Parameter Updates

Apply Updates to Standby: First, implement the parameter changes on the standby database. This step involves updating database parameters, configuration settings, or other operational characteristics.
Switchover: Initiate a switchover to make the standby database the primary, activating the parameter changes. This ensures that the updated configuration is now in effect.
Update Former Primary: Apply the same parameter changes to the new standby database (the former primary). This ensures consistency and synchronization between the primary and standby databases.
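
To keep the two databases consistent, it helps to generate the parameter changes once and apply the identical statements to each node in turn. The helper below is a minimal sketch that renders ALTER SYSTEM statements from a dictionary; the helper itself and the example values are hypothetical, and which scope is appropriate depends on whether a given parameter is static or dynamic.

```python
def alter_system_statements(params: dict[str, str],
                            scope: str = "BOTH") -> list[str]:
    """Render one ALTER SYSTEM statement per parameter, sorted by name."""
    if scope not in {"MEMORY", "SPFILE", "BOTH"}:
        raise ValueError(f"unsupported scope: {scope}")
    return [
        f"ALTER SYSTEM SET {name}={value} SCOPE={scope};"
        for name, value in sorted(params.items())
    ]


# Static parameters such as `processes` take effect only after a restart,
# so they are written to the SPFILE on the standby before the switchover.
changes = {"processes": "500", "open_cursors": "1000"}
for statement in alter_system_statements(changes, scope="SPFILE"):
    print(statement)
```

Running the same generated list against the standby first and the former primary afterwards avoids drift between the two configurations.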

Database Upgrades

Upgrade Standby: First, apply the database upgrade to the standby database. This step involves updating the database software to a new version, applying patches, or making other significant changes.
Switchover: Once the standby database is upgraded and ready, initiate a switchover to make the upgraded standby the primary database. This ensures that the application runs on the upgraded database.
Upgrade Former Primary: Apply the upgrade to the new standby database (the former primary). This ensures that both databases are running the latest version and are synchronized.

Conclusion

Seamless patching and infrastructure upgrades for Oracle applications in the cloud are achievable with a well-architected multi-AZ HA system that builds on Oracle Data Guard and its switchover process. These strategies keep downtime minimal, maintain high availability, and provide operational flexibility. By adopting these best practices, organizations can improve the resilience, performance, and scalability of their Oracle cloud environments, supporting continuous business operations and a better user experience.