Global Load Balancing and Resilience for MuleSoft Private Spaces in CloudHub 2.0 utilizing Amazon Networking Services

Michael McDonnell
Another Integration Blog
May 30, 2023

Co-Author — https://medium.com/@ruebenjimenez

Disaster recovery is essential for any organization: it ensures that critical business operations can resume as quickly as possible after unexpected disruptions such as natural disasters, cyber-attacks, or pandemics. Without a comprehensive business continuity plan, companies risk significant financial losses, reputational damage, and potentially going out of business altogether. CloudHub 1.0 and CloudHub 2.0 handle single availability zone outages automatically through their built-in high availability. However, businesses still need to plan for connectivity or regional disruptions in AWS to minimize the impact of unexpected events.

Setting up a DR strategy for MuleSoft is incredibly easy with AWS. In this article, we’ll assume you already have a primary private space set up. You’ll learn how to set up your DR environment, attach a transit gateway, choose between active/passive and active/active, and configure Route 53 as your global load balancing solution.

Diagram 1.0: Complete Architectural Diagram

Private Spaces

MuleSoft’s CloudHub 2.0 provides a virtual and secure private space for running your applications. You can create multiple private spaces in different regions. You can connect your private network to your private space so that the two function as a single private network. In each private space, you define a virtual cloud where your apps are deployed, one or more connections to your external network, TLS contexts for domain availability and security, firewall rules for traffic control, and the environments and business groups authorized to deploy to the private space. You can learn more about Private Spaces in the MuleSoft documentation.

The following steps, carried out on both the MuleSoft and AWS platforms, set up global load balancing and resilience for MuleSoft Private Spaces in CloudHub 2.0 using Amazon networking services:

  • Pre-Planning
  • Active/Active vs Active/Passive
  • Choosing a CIDR Block for your DR Environment
  • Transit Gateway (TGW)
  • Route53 Health Check

Pre-Planning

The first step in any major project is pre-planning. Pre-planning saves time and cost and improves the overall quality of your implementation. For a Business Continuity / Disaster Recovery environment, you will need to collect data on your primary environment and prepare information on your future Disaster Recovery environment.

Primary Environment (Things you already have)

  • DNS entries for Public DNS target
  • DNS entries for Private DNS Target
  • Inbound Public Static IPs
  • Outbound Public Static IPs
  • Private Space Internal CIDR block (IP Address Range)
  • Route CIDRs to and from the extended network (AWS, On-Prem, Other Services)

Disaster Recovery Environment (Things you need to collect / generate)

  • DNS entries for Public DNS target
  • DNS entries for Private DNS Target
  • Inbound Public Static IPs
  • Outbound Public Static IPs
  • Private Space Internal CIDR block (IP Address Range)
  • Route CIDRs to and from the extended network (AWS, On-Prem, Other Services)
  • Names and deployable archives of all business-critical applications
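
As a pre-planning aid, a short Python sketch using the standard `ipaddress` module can confirm that a candidate DR CIDR does not collide with ranges it must route to. This is only an illustration: the primary private space CIDR below (172.29.0.0/24) is a made-up placeholder, while 172.30.0.0/24 and 10.0.0.0/8 are the values used in the example later in this article. The 172.16.0.0/12 cloud range is left out on purpose, since the example DR CIDR falls inside it.

```python
import ipaddress

def find_overlaps(candidate, existing):
    """Return the names of existing ranges that overlap the candidate CIDR."""
    cand = ipaddress.ip_network(candidate)
    return [name for name, cidr in existing.items()
            if cand.overlaps(ipaddress.ip_network(cidr))]

existing_ranges = {
    "data-center": "10.0.0.0/8",                # from the article's example
    "primary-private-space": "172.29.0.0/24",   # hypothetical placeholder
}

dr_cidr = "172.30.0.0/24"  # DR private space CIDR from the article's example
overlaps = find_overlaps(dr_cidr, existing_ranges)
print(overlaps or "no overlaps")
```

Swapping in your own primary CIDR and route CIDRs gives a quick sanity check before you commit to a block that cannot be changed later.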

Active/Active vs Active/Passive

In an active/active configuration, both private spaces are active and handling traffic at the same time. Generally speaking, you can get better utilization of resources in normal scenarios, improved scalability, and increased availability but will have to adjust for higher costs and potential complexity based on data availability. Many of the benefits of Active/Active are contingent on being able to scale out your application footprint during an event.

In an active/passive configuration, one private space is active and handling traffic, while the other hosts applications with no load. If the active private space fails, the passive private space takes over and becomes active. Generally speaking, you will have an overall lower cost of ownership with an active/passive configuration due to the lower complexity and lower core utilization. However, you will have to plan for potential delays in cutover during an outage, and for added cost and complexity when scaling out the solution as utilization grows.
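
One way to reason about this trade-off is headroom: after a fail-over, the surviving private space has to carry the full load on its own. The following is a rough sketch with invented numbers (the load and capacity figures are illustrative assumptions, not measurements from any real deployment).

```python
def normal_utilization(total_load, capacity_per_space, active_spaces):
    """Fraction of each active space's capacity used during normal operation."""
    return total_load / (active_spaces * capacity_per_space)

def utilization_after_failover(total_load, capacity_per_space):
    """Fraction of one space's capacity used when it carries all traffic alone."""
    return total_load / capacity_per_space

total_load = 800   # hypothetical requests per second across the platform
capacity = 500     # hypothetical capacity of a single private space

# Active/active: both spaces share the load in normal operation.
print(normal_utilization(total_load, capacity, active_spaces=2))

# After losing one space, the survivor is over capacity unless it scales out.
print(utilization_after_failover(total_load, capacity))
```

With these made-up figures, each space runs comfortably while both are up, but a single surviving space would need to scale out to absorb the full load, which is the dependency described for active/active above.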

Choosing a CIDR Block for your Disaster Recovery Space

Choose the CIDR for a Private Space carefully, because it determines the number of workers you can deploy. The smallest size you can deploy today is a /24, which provides 254 usable addresses for workers and your ingress. The largest size you can deploy today is a /16, which provides 65,534 usable addresses. Once a CIDR block is assigned to a Private Space, it cannot be changed. This means that if you choose a block that is too small, you will have to tear down the entire private space and rebuild it (reconfiguring the routes and firewall rules, reattaching the transit gateway, and so on) before you can have a functional DR environment.
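
The address counts above follow from the prefix length. A minimal sketch with Python's standard `ipaddress` module reproduces them, assuming the usual convention of subtracting the network and broadcast addresses; any further addresses the platform itself may reserve are not modeled here.

```python
import ipaddress

def usable_addresses(cidr):
    """Total addresses in the block minus the network and broadcast addresses."""
    return ipaddress.ip_network(cidr).num_addresses - 2

# Smallest and largest block sizes mentioned in the article.
for cidr in ("172.30.0.0/24", "172.30.0.0/16"):
    print(cidr, usable_addresses(cidr))
```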

Configuring Disaster Recovery Private Space

Deploying a private space is relatively straightforward. After determining your CIDR and the connection points to your backend, it’s just a matter of following the wizard. You can find more details in the MuleSoft documentation.

  • Name your private space. We will name ours DisasterRecoveryPrivateSpace.
  • Click Create Private Network.
  • Set the CIDR and Region for your Private Network. In our example, we’ll select US-West-1 as our DR Region and set our CIDR to 172.30.0.0/24.
  • Get some coffee while we spin up your private space! (About 20–30 minutes)
  • In our example, we are connecting to our data center backbone via the AWS internal network, using a Transit Gateway Attachment. Create one by clicking Create Connection.
  • Select Transit Gateway and give the attachment a name. In this example, we’ll name ours tgw-us-w1-attach (for Transit Gateway US West 1 Attachment). Then click Next.
  • Enter your static routes (if necessary). As shown above in Diagram 1.0, our data center is 10.0.0.0/8 and our cloud space is in the 172.16.0.0/12 subnet.
  • Follow the instructions and click the Create Resource Share link.
  • In the AWS Console, fill in your specific Transit Gateway and the MuleSoft AWS Account ID. We named ours tgw-ch2-us-w1-rs to designate that it is a resource share of a transit gateway to CloudHub 2.0 in US-West-1. Click Create resource share when everything is filled out.
  • Insert your Resource Share ID and Owner ID.
  • Click on the Transit Gateway Attachments link to accept the attachment request, and click Done.
  • Select the appropriate Transit Gateway Attachment and select Actions -> Accept transit gateway attachment.
  • Accept the Transit Gateway Attachment by clicking Accept.
  • After about 2 minutes, you’ll see the accepted transit gateway attachment in your private space details screen.
  • Configure CNAMEs for the ingress controllers in Route 53. Take the Public (or Private!) DNS Target and point your appropriate CNAME at it. In our case, we will create disaster-recovery.mulesoftplatform.com against our public target.
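
The resource share and attachment acceptance steps above can also be scripted rather than clicked through. The sketch below only builds the request parameters, in the shape that boto3's `create_resource_share` (RAM client) and `accept_transit_gateway_vpc_attachment` (EC2 client) calls take, so it runs without AWS access. It assumes the private space appears in your account as a VPC attachment; every account ID, ARN, and attachment ID shown is a placeholder, not a real value.

```python
def resource_share_request(name, tgw_arn, mulesoft_account_id):
    """Keyword arguments for the RAM client's create_resource_share call."""
    return {
        "name": name,
        "resourceArns": [tgw_arn],
        "principals": [mulesoft_account_id],
        # Allow sharing with an account outside your own AWS Organization.
        "allowExternalPrincipals": True,
    }

def accept_attachment_request(attachment_id):
    """Keyword arguments for the EC2 client's accept_transit_gateway_vpc_attachment call."""
    return {"TransitGatewayAttachmentId": attachment_id}

# All identifiers below are placeholders; substitute your own values.
share = resource_share_request(
    name="tgw-ch2-us-w1-rs",  # share name used in the article
    tgw_arn="arn:aws:ec2:us-west-1:111111111111:transit-gateway/tgw-EXAMPLE",
    mulesoft_account_id="222222222222",
)
accept = accept_attachment_request("tgw-attach-EXAMPLE")
print(share["name"], accept)

# With boto3 installed and credentials configured, the calls would be roughly:
#   boto3.client("ram", region_name="us-west-1").create_resource_share(**share)
#   boto3.client("ec2", region_name="us-west-1").accept_transit_gateway_vpc_attachment(**accept)
```

Keeping the parameters in plain dictionaries makes them easy to review or store alongside your pre-planning notes before anything is created in AWS.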

Configure Route53 health checks and fail-over

In this section, we will associate resources with the primary record, the secondary record, or both. With the configuration below, Route 53 will consider the primary fail-over record healthy as long as at least one of the associated resources is healthy.

  • Start creating a health check by clicking on Health Check on the left and clicking Create Health Check.
  • In this example we will be using our hello-sapi API in each environment as our health check URL. So we’ll be setting the following:
    Name: globalmulesoftplatform
    What to monitor: Endpoint
    Specify endpoint by: Domain name
    Protocol: HTTPS
    Domain name: primary.mulesoftplatform.com
    Port: 443
    Path: hello-sapi/api/message
  • In our example, configuring an SNS notification is not necessary.
  • After the health check is created successfully, copy the health check ID and set it aside for a moment.
  • Now that we’ve configured the health check, we can configure our top-level CNAME global.mulesoftplatform.com, which will dynamically fail over between our primary and disaster-recovery Private Spaces.
  • Click into your hosted zone and click Create Record
  • Record name: global
    Record type: CNAME
    Alias: On
    Route traffic to: Alias to another record in this hosted zone
    Search for: primary.mulesoftplatform.com
    Routing policy: Failover
    Failover record type: Primary
    Health check ID: Use the one created earlier.
    Evaluate target health: Yes
    Record ID: primary
  • Create another record — this time for the Disaster Recovery Private Space.
    Record name: global
    Record type: CNAME
    Alias: On
    Route traffic to: Alias to another record in this hosted zone
    Search for: disaster-recovery.mulesoftplatform.com
    Routing policy: Failover
    Failover record type: Secondary
    Health check ID: Not needed; this example does not use a secondary health check.
    Evaluate target health: Yes
    Record ID: secondary
  • Finally, verify the Route 53 health check configuration and the primary/secondary fail-over routing.
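
For teams that prefer to script these console steps, the same settings can be expressed as request payloads. The sketch below builds a health check configuration and the two fail-over alias records in the shape that boto3's Route 53 `create_health_check` and `change_resource_record_sets` calls take. It does not call AWS, and the hosted zone ID and health check ID are placeholders. `EvaluateTargetHealth` is the API field behind the console's evaluate-target toggle, and the leading slash on the path is added here as an assumption about what the API expects.

```python
def health_check_config(domain, path, port=443):
    """HealthCheckConfig dict for the Route 53 create_health_check call."""
    return {
        "Type": "HTTPS",
        "FullyQualifiedDomainName": domain,
        "Port": port,
        "ResourcePath": path,
    }

def failover_record(zone_id, name, target, role, set_id, health_check_id=None):
    """One fail-over alias record change for change_resource_record_sets."""
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": set_id,
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "AliasTarget": {
            "HostedZoneId": zone_id,  # alias target lives in the same hosted zone
            "DNSName": target,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

ZONE_ID = "ZEXAMPLE"            # placeholder hosted zone ID
HEALTH_CHECK_ID = "hc-EXAMPLE"  # placeholder; use the ID you copied earlier

check = health_check_config("primary.mulesoftplatform.com", "/hello-sapi/api/message")
change_batch = {"Changes": [
    failover_record(ZONE_ID, "global.mulesoftplatform.com",
                    "primary.mulesoftplatform.com", "PRIMARY", "primary",
                    HEALTH_CHECK_ID),
    failover_record(ZONE_ID, "global.mulesoftplatform.com",
                    "disaster-recovery.mulesoftplatform.com", "SECONDARY", "secondary"),
]}
print(check)
print([c["ResourceRecordSet"]["Failover"] for c in change_batch["Changes"]])
```

As in the console walkthrough, only the primary record carries a health check ID; the secondary record is created without one.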

Conclusion

In this blog, we demonstrated how easy it is to set up business continuity through a strong partnership between MuleSoft and AWS. If an outage occurs in the future, your business will be prepared to fail over seamlessly from US-East-1 to US-West-1.

By utilizing the MuleSoft Anypoint Platform we are able to quickly spin up a CloudHub 2.0 Private Space for our Disaster Recovery region, connect it via AWS Transit Gateway back to our corporate network, and intelligently load balance across our primary and disaster recovery zones utilizing health checks built into Route53.

Going through this process, we pre-planned our deployment to ensure seamless integration with our backend systems by confirming we did not have overlapping IPs, and we chose an Active/Passive fail-over plan that intelligently uses the capacity we have with minimal overhead.

Hopefully this has shown you how easy it is to configure a disaster recovery strategy for your own environment.

Michael McDonnell is a Principal Platform Solutions Engineer at MuleSoft, working with customers to drive innovation in their integrations.