IBM API Connect v10 | DC and DR Approaches

The DC-DR approach for any application is chosen based on its RTO and RPO.

RTO: Recovery Time Objective: the length of time for which it is acceptable for a system to be unavailable during a disaster.

RPO: Recovery Point Objective: because DR solutions are usually based on some sort of data copy or backup, a system might be recovered to a state prior to the disaster rather than the state it was in at the actual instant of the disaster. The RPO measures how far back in time the recovery point will be, and therefore how much data might be lost. An RPO of zero asserts that no data will be lost, but such a solution is often a compromise against the cost and performance of the system.
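As a rough illustration of how an RPO target relates to a backup schedule, the sketch below estimates the worst-case data loss window for periodic backups. It assumes a simplified model (not taken from IBM documentation) in which a backup only becomes usable at the DR site once it has fully completed. The function names and sample numbers are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Worst-case gap between the disaster and the newest usable backup.

    Simplified model: a backup starts every `backup_interval` and becomes
    usable `backup_duration` after it starts. If the disaster hits just
    before the running backup completes, the newest usable copy is the
    previous one, so the loss window is interval + duration.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, backup_duration: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO target."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Hypothetical numbers: hourly backups that take 10 minutes to land at DR.
loss = worst_case_data_loss(timedelta(hours=1), timedelta(minutes=10))
print(loss)  # 1:10:00
print(meets_rpo(timedelta(hours=1), timedelta(minutes=10), rpo=timedelta(hours=4)))  # True
```

A continuously synchronising standby shrinks this window towards zero, which is why the approaches that follow differ mainly in how current the DR copy of each component is.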

For APIC v10, there are several options for two-datacenter DC-DR. This article covers the top three approaches. Each has its pros and cons, and the decision should be made after careful evaluation of each.

There are four components in APIC v10: Management (Mgmt), Portal, Gateway and Analytics.

Gateway is the key runtime. It is important to ensure that it has the lowest RTO/RPO for business continuity.

Portal is the other component used by consumers. However, a few hours of downtime for this component doesn’t usually impact the business.

Management and Analytics are for internal use. A few hours of downtime shouldn’t have any major business impact.

Let’s look at some general DC-DR terminology:

Primary/Active: the site/system processing LIVE traffic.

Passive/Warm Standby: the site/system that continuously synchronises data from the Active system. Unless it is manually forced to, this system will NOT process LIVE traffic.

Hot Standby: the site/system that continuously synchronises data from the Active system. If the network routes traffic to it, it will process LIVE traffic.

Cold Standby: the site/system that is at the same software version as Active but does not have the latest data. It is mostly kept powered off.
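The definitions above can be summarised as a small lookup table. This is only a sketch of the terminology as described here; the `SiteRole` class and its field names are made up for illustration and are not part of any API Connect tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRole:
    name: str
    continuous_sync: bool  # continuously synchronises data from the Active system
    live_traffic: str      # when this site/system processes live traffic


ROLES = {
    # Active is the source of the data, so it does not sync from anywhere.
    "active": SiteRole("Primary/Active", False, "always"),
    "warm": SiteRole("Passive/Warm Standby", True, "only after a manual failover"),
    "hot": SiteRole("Hot Standby", True, "whenever the network routes traffic to it"),
    "cold": SiteRole("Cold Standby", False, "only after being started and restored"),
}

for role in ROLES.values():
    print(f"{role.name:22} sync={role.continuous_sync!s:5} traffic: {role.live_traffic}")
```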

Note: The approaches below only cover Gateway, Portal and Management. Analytics is omitted because DR for it is complex and mostly unnecessary. If required, analytics data from each datacenter can be offloaded to external ELK/Splunk/Syslog destinations.

Approach 1: DC Active | DR Gwy Hot Standby, Mgmt/Portal Cold Standby

Approach 2: DC Active | DR Cold Standby

Approach 3: DC Active | DR Gwy Hot Standby, Mgmt/Portal Warm Standby

Approach 3 is not available in CP4I at the time of this writing.
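To compare the three approaches at a glance, the DR-side role of each component can be written down as a mapping. This is a sketch of the list above only. The `APPROACHES` dictionary and `components_in_mode` helper are hypothetical names, and Approach 2 is read as "every DR component is Cold Standby".

```python
# DR-side role of each component under the three approaches.
APPROACHES = {
    1: {"Gateway": "Hot Standby", "Management": "Cold Standby", "Portal": "Cold Standby"},
    2: {"Gateway": "Cold Standby", "Management": "Cold Standby", "Portal": "Cold Standby"},
    3: {"Gateway": "Hot Standby", "Management": "Warm Standby", "Portal": "Warm Standby"},
}


def components_in_mode(approach: int, mode: str) -> list[str]:
    """Return the components that are in the given DR mode under an approach."""
    return [component for component, m in APPROACHES[approach].items() if m == mode]


# Components that lack the latest data at DR and would have to be brought up to date.
print(components_in_mode(1, "Cold Standby"))  # ['Management', 'Portal']
print(components_in_mode(3, "Cold Standby"))  # []
```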

Recovery from DR to DC: RTO

For Approach 1 and Approach 2, the steps are similar to those for DC to DR.

For Approach 3, it is quite complicated. Below are the high-level steps:

  • Change the DNS entries for the Mgmt and Portal endpoints
  • Disable the network between DC and DR (in case of an unexpected failover)
  • Run the manual failover “apicup” commands for Mgmt, followed by Portal
  • Once DC is active, enable the network between DC and DR
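Because the order of these steps matters, it can help to keep them in a checklist that is executed strictly in sequence and stops at the first failure. The sketch below is a hypothetical runbook skeleton. It does not contain the actual apicup commands (take those from the IBM documentation), and `run_in_order` and `dry_run` are illustrative names.

```python
from typing import Callable

# High-level failback steps for Approach 3, in the order listed above.
FAILBACK_STEPS = [
    "Change DNS for the Mgmt and Portal endpoints",
    "Disable the network between DC and DR",
    "Run the manual failover apicup commands for Mgmt",
    "Run the manual failover apicup commands for Portal",
    "Confirm DC is active, then enable the network between DC and DR",
]


def run_in_order(steps: list[str], execute: Callable[[str], bool]) -> list[str]:
    """Run steps one at a time and stop at the first one that reports failure.

    Returns the steps that completed successfully.
    """
    completed = []
    for step in steps:
        if not execute(step):
            break
        completed.append(step)
    return completed


def dry_run(step: str) -> bool:
    """Placeholder executor: only prints the step. Replace with real actions."""
    print(f"[dry-run] {step}")
    return True


done = run_in_order(FAILBACK_STEPS, dry_run)
print(f"{len(done)} of {len(FAILBACK_STEPS)} steps completed")
```

Stopping at the first failed step keeps Portal from being failed over before Mgmt has succeeded, which matches the ordering given above.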

Disclaimer: This article only introduces the reader to a few DC-DR approaches. Each needs to be analysed further using the references below for more details and implementation steps.

References:

--

--