EVPN has become one of the most popular and widely adopted technologies for managing overlay networks in the data center over the past few years. One recurring topic I keep encountering with this technology is how to connect multiple data centers together in a secure, resilient and scalable manner, commonly known as Data Center Interconnect (DCI).
I’m currently working on a project with a customer that involves a redesign of the data centers they use to provide compute and storage services to their customers. The design needs to securely segment multiple customers within a single data center as well as stretch workloads across data centers for high availability, failover and scaling.
The design will use a folded 3-stage Clos, also known as a spine and leaf topology. The underlay will use eBGP, and the overlay networks will use a VXLAN data plane with an EVPN control plane. Using BGP for both the underlay and overlay simplifies the configuration and reduces the number of routing protocols that need to be troubleshot.
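To make this a little more concrete, below is a minimal sketch of what a single leaf could look like using Arista EOS-style syntax. All of the names, ASNs, addresses and VNIs (Loopback0/1, AS 65101, VNI 10010 and so on) are made up for illustration and are not taken from the actual project.

```
vlan 10
   name Customer-A
!
interface Loopback0
   ip address 192.0.2.11/32
!
interface Loopback1
   ip address 198.51.100.11/32
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10 vni 10010
!
router bgp 65101
   router-id 192.0.2.11
   ! eBGP underlay towards a spine, advertising loopbacks so VTEPs are reachable
   neighbor 10.0.1.0 remote-as 65001
   network 192.0.2.11/32
   network 198.51.100.11/32
   ! eBGP EVPN overlay session to the spine loopback
   neighbor 192.0.2.1 remote-as 65001
   neighbor 192.0.2.1 update-source Loopback0
   neighbor 192.0.2.1 ebgp-multihop 3
   neighbor 192.0.2.1 send-community extended
   !
   vlan 10
      rd 192.0.2.11:10010
      route-target import 10:10010
      route-target export 10:10010
      redistribute learned
   !
   address-family evpn
      neighbor 192.0.2.1 activate
```

In a real fabric this would be repeated per VLAN/VNI and per spine, but the shape of the configuration stays the same.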
There are three main DCI design goals for this project:
- Limit the blast radius of any failures to a single data center.
- Control which overlay networks are advertised to other data centers.
- Support stretching of networks across different network vendors. (Single vendor per DC).
There are a few DCI approaches to connect EVPN data centers together, and in my view they can be summarised into one of two designs.
- Single Control Plane
- Distributed Control Plane
Single Control Plane
Also known as Over The Top (OTT) or single domain.
This is the most straightforward design and is supported by all network vendors that implement EVPN. It is essentially just stretching the same fabric to your remote data centers by peering the EVPN address family between border leaf nodes. This requires the underlay routes to be advertised to all other data centers so that leaf nodes can reach remote VTEP loopbacks.
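As a rough sketch, on a border leaf an OTT DCI often just means one extra underlay session over the DCI link plus one EVPN session towards the remote border leaf, again using made-up EOS-style values:

```
router bgp 65101
   ! underlay eBGP over the DCI link so remote VTEP loopbacks become reachable
   neighbor 172.16.0.1 remote-as 65201
   !
   ! EVPN session stretched over the top to the remote border leaf (same EVPN domain)
   neighbor 192.0.2.21 remote-as 65201
   neighbor 192.0.2.21 update-source Loopback0
   neighbor 192.0.2.21 ebgp-multihop 5
   neighbor 192.0.2.21 send-community extended
   !
   address-family evpn
      neighbor 192.0.2.21 activate
```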
This option is likely to be fine for most environments and lets you easily stretch overlay networks between data centers, as the EVPN routes are advertised to all leaf nodes across your data centers. It is the simplest, best documented and most widely supported approach to EVPN DCI.
However, as all nodes in the EVPN domain receive updates for hosts and networks from all other nodes, regardless of whether they participate in those networks, this option may not be suitable for very large scale environments: the number of EVPN routes will keep growing and can eventually exhaust leaf nodes’ routing tables.
Distributed Control Plane
Also known as EVPN stitching or multi-domain.
This design involves breaking up the EVPN control planes and explicitly defining which overlay networks should be interconnected between domains.
Using this approach we can reduce the number of EVPN routes that are advertised outside of a data center, which reduces the volume of control plane updates and allows us to scale further. By breaking up the EVPN control plane into many smaller domains we also limit the blast radius of any possible control plane failure. In this design the border leaf nodes act as gateways for all overlay networks in a data center and remove the need for all VTEPs to be globally reachable.
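As an example of controlling what leaves a data center, EVPN routes can be filtered on the border leaf by route target, so only the overlays you actually want to stretch are advertised over the DCI. The route targets, policy names and neighbor address below are purely illustrative:

```
! match only the route targets of overlays we want to stretch between sites
ip extcommunity-list STRETCHED-OVERLAYS permit rt 10:10010
ip extcommunity-list STRETCHED-OVERLAYS permit rt 10:10020
!
route-map DCI-EVPN-OUT permit 10
   match extcommunity STRETCHED-OVERLAYS
!
route-map DCI-EVPN-OUT deny 99
!
router bgp 65101
   address-family evpn
      ! only advertise the selected overlays to the remote border leaf
      neighbor 192.0.2.21 route-map DCI-EVPN-OUT out
```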
This option, although appealing on paper, is regarded as more complicated and is still not supported by all network vendors. It is more commonly documented between different data plane technologies, such as VXLAN within the DC stitched to MPLS in the DCI, rather than VXLAN to VXLAN, which has traditionally not been possible due to split horizon rules. The VXLAN to VXLAN case is usually referred to as VXLAN stitching, and at the time of writing Arista is the one vendor I know of that supports it (Multi-Domain EVPN using VTEP to VTEP bridging).
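To give a flavour of that Arista feature, a multi-domain border leaf looks roughly like the sketch below. I am writing this from memory rather than from a lab, so treat the exact keywords (in particular vxlan bridging vtep-to-vtep and domain remote) as assumptions to verify against the EOS documentation; the addresses and ASNs are made up:

```
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan vlan 10 vni 10010
   ! assumed keyword: allows traffic to be re-encapsulated between VXLAN domains
   vxlan bridging vtep-to-vtep
!
router bgp 65101
   neighbor 192.0.2.21 remote-as 65201
   neighbor 192.0.2.21 update-source Loopback0
   neighbor 192.0.2.21 ebgp-multihop 5
   neighbor 192.0.2.21 send-community extended
   !
   address-family evpn
      ! fabric-facing EVPN session stays in the local domain
      neighbor 192.0.2.1 activate
      ! DCI-facing session is marked as a remote domain, making this node the gateway
      neighbor 192.0.2.21 activate
      neighbor 192.0.2.21 domain remote
```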
Using MPLS in the DCI may not be something organisations already have, or are willing to introduce, due to cost and expertise, so environments with IP-only connectivity between sites have been forced to stick with a single EVPN control plane design. Environments with VPLS or other Ethernet WAN services do have the option to trunk decapsulated traffic across the DCI, but this adds configuration and complexity, as any inter-VLAN traffic requires both the source and destination VLANs to be present at both data centers, or the VRFs to be peered over a separate VLAN.
Now that we have discussed the two most common DCI design options, we can refer back to the design goals for this customer project to decide the most appropriate design. Based on the three design goals, the DCI design which best matches these requirements is a distributed control plane.
A distributed control plane increases the number of failure domains, which meets the goal of limiting the blast radius. It also allows granular control of which overlay networks are advertised out of each DC, so that only the desired networks are stretched and unnecessary routing updates are reduced. Finally, a distributed control plane facilitates integration with mixed network vendors by allowing each vendor environment to operate as its own EVPN domain, seeing announcements from remote data centers only via the border nodes, without needing to be aware of the vendors used at remote sites.
I hope this has been useful to understand the different DCI deployment designs and which one might be right for your environment.
If you are interested in the configuration of an EVPN distributed control plane deployment, which I have built in a lab using Arista vEOS, have a read of this article: https://medium.com/@adamkirchberger/evpn-distributed-control-plane-using-arista-veos-44a4d211a881
Please feel free to reach out if you have any questions.