1 hour migrations #2: Adapting your highly available DB2 primary-standby deployment to work in GCP

Ron Pantofaro
Google Cloud - Community
Aug 23, 2018

Also known as: HA DB2 with TSA, HADR, and ACR. Or, in English:

  • HA: Highly available (or high availability)
  • TSA: Tivoli System Automation
  • HADR: High Availability Disaster Recovery (by IBM)
  • ACR: Automatic client reroute

Who will find this useful

Whoever is thinking of deploying a DB2 instance with high availability on GCP.

The Challenge

So let’s say you have highly available, HADR-configured DB2 instances on-premises that rely on TSA for failover, and you want to move them to the cloud. Easy! Or is it? Here are a couple of things to think about:

  1. TSA offers three kinds of tiebreaker (storage, cloud, and network) for deciding which instance should become the primary when the cluster has an even number of servers. A storage tiebreaker can be problematic in the cloud because all instances require read-write access to the shared storage. IBM also supplies a cloud tiebreaker, but it does not support GCP (and, as of this writing, is limited to 2 instances). That leaves the network tiebreaker as the remaining option.
  2. In the cloud, virtual IP addresses are virtually… meaningless. If you configured TSA to take over a virtual IP when a failover occurs, that might work great on-premises, but cloud networking infrastructure moves in mysterious ways and will not accommodate it.
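
To make the tiebreaker idea in point 1 concrete, here is a minimal Python sketch of the general quorum rule: a strict majority of nodes wins outright, and an exact split is settled by whether this side can still reach the tiebreaker (for a network tiebreaker, typically an external IP address such as a gateway). This illustrates the concept only and is not TSA's actual implementation; the function name and parameters are hypothetical.

```python
def has_quorum(reachable_nodes: int, total_nodes: int,
               tiebreaker_reachable: bool) -> bool:
    """Decide whether this side of a split cluster may hold the primary role.

    reachable_nodes counts the nodes this side can still see, itself included.
    All names here are illustrative assumptions, not part of any IBM API.
    """
    if reachable_nodes * 2 > total_nodes:
        # Strict majority: no tiebreaker needed.
        return True
    if reachable_nodes * 2 == total_nodes:
        # Exact tie (only possible with an even node count):
        # the side that can reach the tiebreaker wins.
        return tiebreaker_reachable
    # Minority side never gets quorum.
    return False


if __name__ == "__main__":
    # Two-node cluster split in half: only the tiebreaker decides.
    print(has_quorum(1, 2, tiebreaker_reachable=True))
    print(has_quorum(1, 2, tiebreaker_reachable=False))
```

In a two-node primary-standby pair, every network split is an exact tie, which is why the choice of tiebreaker matters so much here.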

Solution?

This diagram depicts an example solution architecture on GCP:

  1. Configure TSA to use the network tiebreaker, since the cloud and storage tiebreakers are not applicable.
  2. Cloud alternatives to a virtual IP: You could think of plugging interesting scripts into HADR, scripts that invoke the GCP API when a failover happens instead of taking over a virtual IP. Perhaps you could use an alias IP and move it to the primary instance? Sounds good on paper, but there is no guarantee it will work in certain disaster cases (which is exactly what we are trying to overcome here). Perhaps some way to control a load balancer? That opens up an entire range of unpredictable behaviors and race conditions, plus the open question of how to signal to the load balancer that an instance is ready (i.e. primary) vs. not ready (i.e. standby). Also, and most importantly, as far as I know, IBM might not take responsibility for the behavior and problems you might encounter.
  3. Luckily, IBM offers ACR (automatic client reroute). With ACR, DB2 instances can “tell” clients about the other instances in the cluster, and the clients can connect to those instances automatically when a failover happens. This is configured on the DB2 servers.
  4. We can go beyond the server-side ACR settings. Why? Let’s imagine the following scenario: all your clients know how to connect to the primary server address, and they also receive the address of the standby server upon connecting. Now suppose your primary fails over to the standby and then you spin up a new client. This client is likely to be configured to connect to the original primary address, so it will be unable to connect, since that address is unreachable. Instead, you can configure the client itself to be aware of all the servers, so that it does not need to get the addresses from the primary. (See an example: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.apdv.cli.doc/doc/c0056196.html )
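
The reasoning in point 4 can be sketched generically in Python. The idea is that a freshly started client carries its own ordered list of servers and walks it until one answers, rather than depending on the primary to hand out the standby address. This is only an illustration of the concept, not how the DB2 client driver implements ACR; the host names, the port, and the function names below are hypothetical.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Server = Tuple[str, int]  # (host, port)

# Hypothetical addresses; a real deployment would use its own instances.
KNOWN_SERVERS: Sequence[Server] = (
    ("db2-primary.example.internal", 50000),
    ("db2-standby.example.internal", 50000),
)


def tcp_reachable(server: Server, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection(server, timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(
    servers: Sequence[Server],
    is_reachable: Callable[[Server], bool] = tcp_reachable,
) -> Optional[Server]:
    """Return the first server in the list that responds, or None."""
    for server in servers:
        if is_reachable(server):
            return server
    return None
```

A new client started after a failover would call `pick_server(KNOWN_SERVERS)`, skip the unreachable original primary, and land on the standby. A TCP check only shows that something is listening on the port; it says nothing about which HADR role the instance holds, which is part of why the driver's own client-side ACR configuration is the better tool in practice.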

Conclusion

I hope this will be useful to professionals who manage DB2 deployments and are thinking of moving to the cloud (specifically GCP). As always, consider your options, requirements, and constraints to understand what solution might work for you and your organization. I plan to publish more detailed, official solution documentation on cloud.google.com/solutions in the near future.

Ron Pantofaro
Solutions Architect, Google Cloud (my opinions are my own). Food, distributed systems, coffee, containers, music, devops, travel, data pipelines, fatherhood.