Migrate VMs behind Standard Load Balancer to another region with Azure Site Recovery

Akihiro Nishikawa
Published in Microsoft Azure
7 min read, Aug 31, 2021

[As of August 30, 2021]

The original article in English is here.

Japanese edition is here.

Inquiry from a customer

My customer asked me about the following topic.

We have a system that consists of an Azure Load Balancer and two VMs behind it. To meet our BCDR (business continuity & disaster recovery) rules, we would like to migrate this system with Azure Site Recovery (ASR), but the error “Site Recovery configuration failed (151196)” occurs and prevents us from configuring ASR. What is the root cause? Are there any workarounds or solutions?

As the inquiry was not clear to me, I asked them to elaborate on their environment and the issue.

  • They use a Standard Load Balancer.
  • ExpressRoute connects their on-premises environment and Azure, and forced tunneling is enabled.
  • The application running on the VMs uses Table storage as its data source, and a Service Endpoint for Table storage has already been configured.
  • As the VMs do not share state, each VM simply needs to be migrated as-is.

The following diagram roughly reflects the customer’s environment.

The VNet connected to ExpressRoute is not a hub network, so the integration between ExpressRoute and Site Recovery described in the following document is not required in this case.

Cause

If you are familiar with Azure, you may have already spotted the root cause.

By default, a Standard Load Balancer prevents the VMs behind it from reaching destinations outside the VNet. Therefore, outbound access to the ASR-related resources outside the VNet must be configured explicitly. Forced tunneling is indeed configured, but it does not provide this access for VMs behind a Standard Load Balancer.

This is mentioned in the document.

If the VMs are behind a Standard internal load balancer, by default, it wouldn’t have access to the Microsoft 365 IPs such as login.microsoftonline.com. Either change it to Basic internal load balancer type or create outbound access as mentioned in the article Configure load balancing and outbound rules in Standard Load Balancer using Azure CLI.

Issue 2: Site Recovery configuration failed (151196)

ASR needs access to Azure Active Directory endpoints such as login.microsoftonline.com, but access to them had not been configured. Forced tunneling lets you redirect, or “force”, all Internet-bound traffic back to your on-premises location, with the default route advertised from the on-premises side. However, forced tunneling does not work for VMs behind a Standard Load Balancer.

Outbound connectivity

The following outbound connectivity from the VMs is required when replicating them with Azure Site Recovery.

Storage: *.blob.core.windows.net
Azure Active Directory: login.microsoftonline.com
Replication: *.hypervrecoverymanager.windowsazure.com
Service Bus: *.servicebus.windows.net
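If you need to allow these destinations on a firewall or proxy, it can help to check candidate hostnames against the list. The following Python sketch does this with the standard library's fnmatch. Note that fnmatch's `*` also matches dots, so it is looser than a DNS wildcard. The function name `required_service` is hypothetical and is not part of any Azure tooling.

```python
from fnmatch import fnmatchcase
from typing import Optional

# FQDN patterns that ASR-replicated VMs must reach, as listed above.
REQUIRED_ENDPOINTS = {
    "Storage": "*.blob.core.windows.net",
    "Azure Active Directory": "login.microsoftonline.com",
    "Replication": "*.hypervrecoverymanager.windowsazure.com",
    "Service Bus": "*.servicebus.windows.net",
}


def required_service(hostname: str) -> Optional[str]:
    """Return the ASR dependency a hostname belongs to, or None if it is not on the list."""
    # Normalize: DNS names are case-insensitive and may carry a trailing root dot.
    host = hostname.lower().rstrip(".")
    for service, pattern in REQUIRED_ENDPOINTS.items():
        # fnmatchcase does not touch case itself; we lower-cased the host above.
        # Caveat: fnmatch's "*" also matches dots, unlike a DNS wildcard.
        if fnmatchcase(host, pattern):
            return service
    return None
```

For example, `required_service("mycache.blob.core.windows.net")` returns `"Storage"`, and an unrelated host such as `example.com` returns `None`.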

Solutions

We have the following options to establish the outbound connectivity required for replicating VMs with Azure Site Recovery.

  1. Replace the Standard Load Balancer with a Basic Load Balancer.
  2. Assign public IPs to the VMs behind the Standard Load Balancer.
  3. Assign a NAT gateway to the subnet the VMs are connected to.
  4. Add a public load balancer and configure an outbound rule for the VMs.
  5. Add Azure Firewall, create a UDR (user-defined route) that routes 0.0.0.0/0 to Azure Firewall, and associate the UDR with the subnet the VMs are connected to.
  6. Use Service Endpoints and Private Endpoints to open routes to the required services.
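The following Python sketch is a deliberately simplified model of the reasoning behind these options, not an Azure API. Behind a Standard Load Balancer, forced tunneling alone yields no outbound path, and each of options 1 to 5 adds one. The class and field names are hypothetical. Option 6 is left out because it opens private routes to specific services rather than general outbound access.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmNetworkConfig:
    """Hypothetical, simplified description of the VMs' network setup."""
    lb_sku: str = "Standard"               # "Standard" or "Basic" (option 1)
    vm_public_ips: bool = False            # option 2
    subnet_nat_gateway: bool = False       # option 3
    public_lb_outbound_rule: bool = False  # option 4
    udr_default_to_firewall: bool = False  # option 5
    # Intentionally ignored below: per this article, forced tunneling does not
    # give VMs behind a Standard Load Balancer an outbound path.
    forced_tunneling: bool = False


def outbound_paths(cfg: VmNetworkConfig) -> List[str]:
    """List the options (numbered as in this article) that give the VMs outbound access."""
    paths = []
    if cfg.lb_sku == "Basic":
        paths.append("1: Basic Load Balancer")
    if cfg.vm_public_ips:
        paths.append("2: public IPs on VMs")
    if cfg.subnet_nat_gateway:
        paths.append("3: NAT gateway")
    if cfg.public_lb_outbound_rule:
        paths.append("4: public LB outbound rule")
    if cfg.udr_default_to_firewall:
        paths.append("5: UDR to Azure Firewall")
    return paths
```

The customer's original setup, `VmNetworkConfig(forced_tunneling=True)`, yields an empty list, which matches the 151196 failure.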

1. Replace the Standard Load Balancer with a Basic Load Balancer.

A Basic Load Balancer permits the VMs behind it to connect to destinations outside the VNet, while a Standard Load Balancer does not.

When forced tunneling is enabled, replication traffic leaves the Azure boundary (i.e., it goes out to the Internet). As the following document states, this configuration is not recommended. This option is fine if forced tunneling is disabled.

2. Assign public IPs to the VMs behind the Standard Load Balancer.

Public IPs are assigned to both VMs so that they can directly access destinations outside the VNet.

With this solution, not only can outbound traffic leave the VMs, but inbound traffic from outside the VNet can also reach them. Therefore, the following configuration is mandatory.

  • An NSG (Network Security Group) should be configured to control inbound/outbound traffic.
  • It is simpler to assign an NSG to the subnet the VMs are connected to than to each VM’s NIC.
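NSG rules are evaluated in priority order, with lower numbers first, and processing stops at the first matching rule. The Python sketch below models that evaluation for inbound traffic. It assumes rules that filter only by source prefix and destination port, and it collapses the built-in default rules into a single final deny. `NsgRule` and `evaluate_inbound` are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NsgRule:
    priority: int               # lower number = evaluated first
    source_prefix: str          # CIDR, e.g. "203.0.113.0/24"; "0.0.0.0/0" for any
    dest_port: Optional[int]    # None means any port
    allow: bool


def evaluate_inbound(rules: List[NsgRule], source_ip: str, dest_port: int) -> bool:
    """Return True if allowed: the first matching rule by priority wins, otherwise deny."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if src in ipaddress.ip_network(rule.source_prefix) and rule.dest_port in (None, dest_port):
            return rule.allow
    # Simplification: stands in for the default DenyAllInBound rule
    # (other built-in default rules are not modeled here).
    return False
```

With a priority-100 rule denying port 22 from anywhere and a priority-200 rule allowing a trusted range, SSH is blocked even from the trusted range, while other ports from that range are allowed.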

If you choose Microsoft network routing, none of the traffic between the VMs and Azure services leaves the Azure boundary.

3. Assign a NAT gateway to the subnet the VMs are connected to.

Instead of assigning public IP addresses to the VMs, a NAT gateway is assigned to the subnet the VMs are connected to.

A NAT gateway handles outbound traffic only, and inbound traffic cannot use the public IP address(es) assigned to it. Therefore, the NAT gateway does not allow the VMs to be accessed from outside the VNet.

If you choose Microsoft network routing, none of the traffic between the VMs and Azure services leaves the Azure boundary.

4. Add a public load balancer and configure an outbound rule for the VMs.

A public load balancer with an outbound rule permits outbound traffic from the VMs behind it.

This solution is similar to the 2nd and 3rd solutions, but it is more expensive than either of them. If you choose Microsoft network routing, none of the traffic between the VMs and Azure services leaves the Azure boundary.

5. Add Azure Firewall, create a UDR (user-defined route) that routes 0.0.0.0/0 to Azure Firewall, and associate the UDR with the subnet the VMs are connected to.

Azure Firewall allows us to manage inbound/outbound traffic to/from the VMs. A UDR (user-defined route) changes the default route of the subnet the VMs are connected to so that it points to Azure Firewall.
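Azure selects a route by longest prefix match, and a user-defined route replaces the system route that has the same prefix. As a result, a 0.0.0.0/0 UDR sends everything that is not routed more specifically, such as traffic within the VNet, to the firewall. The Python sketch below models that selection with the standard ipaddress module. The address ranges and the firewall's private IP 10.0.1.4 are hypothetical.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical effective routes for the VMs' subnet (illustrative values only).
ROUTES: List[Tuple[str, str]] = [
    ("10.0.0.0/16", "VnetLocal"),                # system route for the VNet address space
    ("0.0.0.0/0", "VirtualAppliance 10.0.1.4"),  # UDR replacing the system 0.0.0.0/0 -> Internet route
]


def next_hop(destination: str, routes: List[Tuple[str, str]] = ROUTES) -> str:
    """Pick the next hop by longest-prefix match over the given routes."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return "None"  # Azure's "None" next hop type: the traffic is dropped
    # The most specific prefix (largest prefix length) wins.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop
```

Traffic to another VM in 10.0.0.0/16 stays on `VnetLocal`, while traffic to a public address, such as an ASR endpoint, goes to the firewall.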

Azure Firewall can filter inbound/outbound traffic not only by IP address(es) but also by FQDN, while an NSG cannot filter by FQDN. If you choose Microsoft network routing, none of the traffic between the VMs and Azure services leaves the Azure boundary.

Azure Firewall is indeed powerful, but this option is the most expensive of all the options in this entry.

6. Use Service Endpoints and Private Endpoints to open routes to the required services.

Instead of assigning public IP address(es) to either the VMs or the subnet, routes to the services required for ASR replication are opened with Service Endpoints and Private Endpoints.

The following document describes how to enable replication with private endpoints.

The services required for ASR replication and the acceptable option(s) for each are listed below.

  • Azure Active Directory: Service Endpoint or NAT Gateway (as of now, NAT Gateway is the best way to permit access to Azure Active Directory).
  • Service Bus: Service Endpoint only (as the exact destination is not known, Service Endpoint is the only option).
  • Storage Service: Service Endpoint or Private Endpoint.
  • Recovery Services vault: Private Endpoint only.
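The list above can be expressed as data and used to sanity-check a connectivity plan. This Python sketch encodes the options exactly as listed in this article, with Storage Service shortened to Storage and the vault named by its Azure resource type, Recovery Services vault. `plan_problems` is a hypothetical helper, not an Azure API.

```python
from typing import Dict, List, Set

# Acceptable connectivity options per ASR dependency, as listed in this article
# for VMs behind a Standard Load Balancer.
ACCEPTABLE: Dict[str, Set[str]] = {
    "Azure Active Directory": {"Service Endpoint", "NAT Gateway"},
    "Service Bus": {"Service Endpoint"},
    "Storage": {"Service Endpoint", "Private Endpoint"},
    "Recovery Services vault": {"Private Endpoint"},
}


def plan_problems(plan: Dict[str, str]) -> List[str]:
    """Return a message for every dependency that is missing or uses an unlisted option."""
    problems = []
    for service, allowed in ACCEPTABLE.items():
        chosen = plan.get(service)
        if chosen is None:
            problems.append(f"{service}: no connectivity configured")
        elif chosen not in allowed:
            problems.append(f"{service}: '{chosen}' is not an accepted option")
    return problems
```

A plan of NAT Gateway for Azure Active Directory, Service Endpoint for Service Bus, and Private Endpoints for Storage and the vault produces no problems.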

This solution is ideal for the following reasons.

  • No traffic leaves the Azure boundary, so it is kept secure.
  • Public IP addresses are required only when a NAT gateway is used.
  • It is cost-effective.

Note the following points when configuring this solution.

  • The roles to be granted on the cache storage account to the managed identity of the Recovery Services vault vary depending on the SKU (standard or premium) of that storage account. Specifically:
    - Standard SKU: Contributor and Storage Blob Data Contributor
    - Premium SKU: Contributor and Storage Blob Data Owner
  • According to the document above, configuring a private endpoint for the cache storage account is optional. In this case, however, a Private Endpoint or Service Endpoint for the cache storage account is required because the VMs are behind a Standard Load Balancer.
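The role assignments can be scripted with the Azure CLI command `az role assignment create` (`--assignee`, `--role`, `--scope`). The Python sketch below only builds those command lines for each SKU, using the roles listed above. The principal ID of the vault's managed identity and the storage account resource ID in the usage example are placeholders, not real values.

```python
import shlex
from typing import Dict, List

# Roles for the vault's managed identity on the cache storage account, per SKU (as listed above).
CACHE_STORAGE_ROLES: Dict[str, List[str]] = {
    "standard": ["Contributor", "Storage Blob Data Contributor"],
    "premium": ["Contributor", "Storage Blob Data Owner"],
}


def role_assignment_commands(sku_tier: str, vault_principal_id: str, storage_account_id: str) -> List[str]:
    """Build one `az role assignment create` command per required role."""
    roles = CACHE_STORAGE_ROLES.get(sku_tier.strip().lower())
    if roles is None:
        raise ValueError(f"unknown storage SKU tier: {sku_tier!r}")
    # shlex.quote protects role names containing spaces when pasted into a POSIX shell.
    return [
        "az role assignment create"
        f" --assignee {shlex.quote(vault_principal_id)}"
        f" --role {shlex.quote(role)}"
        f" --scope {shlex.quote(storage_account_id)}"
        for role in roles
    ]
```

Calling it with `"Premium"`, a placeholder principal ID, and the cache storage account's resource ID yields two commands: one for Contributor and one for `'Storage Blob Data Owner'`.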

Conclusion

There are several options to resolve this situation, and each has its pros and cons. After I explained these options, the customer decided to choose option #6.

Akihiro Nishikawa
Microsoft Azure

Cloud Solution Architect @ Microsoft, and JJUG (Japan Java Users Group) board member. ♥Java (JVM/GraalVM) and open-source technologies. All views are my own.