The Network Overlap Problem

Clashing CIDR Ranges In Peering Groups

Published in

DevOops Discourse

6 min read · Mar 11, 2024

The Problem

At a previous client, I was on a team tasked with moving all on-premises assets (including the network) to a popular cloud provider. The specific client team was part of a wholly owned subsidiary of the actual contracted client, and the project involved moving the contents of their data center into the parent company’s cloud. The network ranges were extremely segmented, and there was little obvious rhyme or reason for address packing or range banding. During the project, we found that we had to change some ranges to consolidate, and we had to be careful not to overlap with the parent company’s ranges at common interfaces, all while trying to change as few IPs (in many cases, hardcoded) as possible.

A problem that plagued us for some time was something like the following scenario:

We would identify a new range we needed that was atypically large or small as compared to the network ranges they maintained in their data center (and thus not simply a contiguous selection from a well-known subnet size, raising the risk of the new range falling within or including an existing range),
A member of the client team would come up with a new range that, unbeknownst to them, collided with some other range,
We would unwittingly encode this new range, expecting it to be unallocated across existing ranges,
We would run the plans (we were using Terraform) and nothing would be reported as wrong,
Then we would run the apply jobs and discover, much to our chagrin, the overlaps in network ranges within a peering group.

Without it being a feature of some tool we were using, how could we ensure there will be no overlapping ranges across peering groups (which were already pushing the default limits in terms of cardinality of included subnets)? In the absence of a simple solution handed to us, and due to a sequence of events and circumstances that could only be referred to as a ‘comedy of errors’ (albeit in no way humorous), it was a protracted and frustrating experience for all involved.

More generally, there are many ways an organization can find itself tangled in so many network ranges that they become tedious to track and maintain. Decentralized teams managing product-specific network components, and disparate networks joined together through mergers and acquisitions, are all subject to this risk. How do we easily determine which ranges overlap?
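To make “overlap” concrete, here is a minimal sketch of a pairwise check, assuming IPv4 only and light validation. The function names (ipv4ToInt, maskFor, parseCidr, cidrsOverlap) are made up for illustration and do not come from any particular tool. It relies on one property of CIDR blocks: two blocks can only overlap if one contains the other, so it is enough to compare both network addresses under the shorter of the two prefixes.

```typescript
// Hypothetical helper names for illustration; IPv4 only.

// Convert dotted-quad text to an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.trim().split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error(`Invalid IPv4 address: ${ip}`);
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// Netmask for a prefix length (0..32) as an unsigned 32-bit number.
// Prefix 0 is special-cased because JavaScript shift counts wrap at 32.
function maskFor(prefix: number): number {
  return prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
}

// Split "a.b.c.d/n" into its network address and prefix length.
function parseCidr(cidr: string): { network: number; prefix: number } {
  const slash = cidr.indexOf("/");
  const prefixText = slash < 0 ? "" : cidr.slice(slash + 1).trim();
  if (!/^\d{1,2}$/.test(prefixText) || Number(prefixText) > 32) {
    throw new Error(`Invalid CIDR range: ${cidr}`);
  }
  const prefix = Number(prefixText);
  const network = (ipv4ToInt(cidr.slice(0, slash)) & maskFor(prefix)) >>> 0;
  return { network, prefix };
}

// Two blocks overlap exactly when they agree on the bits covered by
// the shorter (less specific) prefix.
function cidrsOverlap(a: string, b: string): boolean {
  const x = parseCidr(a);
  const y = parseCidr(b);
  const mask = maskFor(Math.min(x.prefix, y.prefix));
  return ((x.network & mask) >>> 0) === ((y.network & mask) >>> 0);
}
```

With this sketch, cidrsOverlap("10.0.0.0/24", "10.0.0.128/25") is true because the /25 sits inside the /24, while cidrsOverlap("10.0.0.0/24", "10.0.1.0/24") is false.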

While the story at the opening is a rather complicated example, and perhaps seems almost contrived, it is surely not an isolated incident. There has been an upswell of acquisition activity over the last couple of decades, and not every acquisition includes a “world-class, ‘leading-edge’ digital infrastructure” (a simple proof of which might be established by combining basic statistics, e.g. normal distributions, with basic game theory, e.g. the concept of “zero-sum”).

Common Scenarios

In any acquisition or merger there is a danger of overlap in internal/private subnetworks. Sometimes, with acquisitions, the proud parent company may decide to swaddle their newbought with promises of status quo: phrases like “no changes”, “operations as usual”, “same game, just a different coach” are thrown around with big, sincere grins. At some point, however, nearly all of them find themselves dealing with a fussy child when they break that promise and implement the dreaded change management (aka organizational shakeup), which nearly always includes changes to someone’s technology standards — naming conventions, development tools, and certain infrastructure — to ensure everyone fits into the new home.

The likelihood of a more massive (in terms of people, processes, policies, and technologies/licenses) parent company being the one to make widespread changes approaches zero, regardless of whose strategy is “right”. In terms of infrastructure, the networks are among the most likely to be affected. What’s the likelihood that environments at two different companies both use the 10.0.0.0/24 range? I’d say likely enough not to want to bet against it.

In the case of a merger, especially among comparably sized companies, the impact of collisions may be even more acute. Large companies sometimes use the RFC 1918 ranges in strategic ways (e.g. to segment networks intended for internal vs. external applications), so there are more chances of overlap (or even mismatching schemas). Certainly, there are some matches where one side brings the superior solution and infrastructure in some capability space, and the question of who aligns to whom is at least clear (sore feelings notwithstanding). In many other cases, the determination is not so straightforward, and both sides’ claims to strategic superiority may have some merit.

Finally, it is not uncommon for a centralized team to manage the network boundary and allow product teams to manage their own internal network structure. This is common with very large organizations or highly segmented services and product lines. The idea is an architecture in which the boundary networks and their interfaces are managed by a single team and are designed to interface with product-specific environment networks managed by the product teams themselves. Even in hub-and-spoke scenarios, care must be taken in either the hub architecture or the spoke configuration.

Image generated with Midjourney (and edited)

On Premises vs. The Cloud

On premises, there is typically a single team that manages issuing the IPs and ranges. Even in the case of a self-service solution for requesting network ranges from blocks of larger pre-allocated ranges, someone at the organization had to pre-allocate those larger blocks. Once a range is carved out of the block, the expectation is that the system issuing the range would know what ranges are available and only allow them.

In the cloud, maintaining subnetworks is the joint responsibility of the provider and the consumer under the shared responsibility model, and in this case the consumer’s share is choosing non-overlapping ranges. The cloud provider doesn’t have a way of knowing the expectations behind a particular subnet (Is this for an isolated sandbox? Part of a hub/spoke configuration? Can we leverage NATing to reuse ranges?), and for its own business to work, it must support any range duplicated by any customer. It is therefore up to the cloud service consumer to manage whether their networks overlap (or not).

Moreover, however #theFutureIsNow we seem to be, there’s still an awful lot of weight carried by spreadsheets when it comes to planning, tracking, and reporting network ranges. The more subnets, the more chance of collision, and the more effort and frustration dedicated to determining where collisions may occur. This can be a time-consuming exercise if there is a battery of bureaucratic processes between choosing a new network IP range and actually discovering an errant overlap.

The “Solution”

“Ok ok, Robbie, I get it, this is a super-dee-duper important problem. So how do we solve it?” Well, just like any ‘senior engineer’, I skipped my way over to my preferred search engine and asked the internet, of course. I regularly use CIDR-to-IP conversion tools, so I was expecting there to be something readily available that fixed all of my problems. Instead, the most meaningful results were Stack Overflow answers with snippets of Python code and meandering Reddit threads that occasionally spiraled into a coherent resolution.

I cut my professional teeth on JavaScript so I’m more apt to use such code as a guide than to copy and paste directly, which is what I did: I created a module in TypeScript (don’t judge me if that’s uncool these days) and tested it in a node shell. Now I had a way to quickly print out colliding network ranges that were written in CIDR notation! But what about the ‘next guy’? What about the countless others, drowning in a sea of /26s and /18s, for god knows why? What about altruism?

Check Out CIDR Clash Website

In case you don’t inhabit shells and text editors but still want to benefit from a solution like this (or, I don’t know, maybe you’re not one of those sickos who codes for pleasure despite a 40+ hour a week software job), head on over to cidrclash.com to check out a website-hosted solution. The initial version takes as input a comma-delimited list of IPv4 ranges in CIDR notation and prints the collisions. It can obviously use some additional features and validation/flexibility, but it’s functional and (hopefully) relatively straightforward to use. I’ll follow up with a separate post that goes into more detail on the experience of building the website and the site itself.
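For the shell-dwellers, the behavior described above (comma-delimited list in, collisions out) can be sketched roughly as follows. To be clear, this is not the code behind the site. It is a minimal illustration under my own assumptions: IPv4 only, every pair compared, and the names Range, toRange, and findCollisions invented for the example. Each block is turned into a numeric start and end address, and two blocks collide when those intervals intersect.

```typescript
// Hypothetical names for illustration; not the site's implementation.
interface Range {
  cidr: string;
  start: number; // first address in the block, as a number
  end: number;   // last address in the block, as a number
}

// Parse "a.b.c.d/n" into a numeric interval using plain arithmetic.
function toRange(cidr: string): Range {
  const text = cidr.trim();
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(text);
  if (!match) throw new Error(`Invalid CIDR range: ${cidr}`);
  const octets = match.slice(1, 5).map(Number);
  const prefix = Number(match[5]);
  if (octets.some((o) => o > 255) || prefix > 32) {
    throw new Error(`Invalid CIDR range: ${cidr}`);
  }
  const address = octets.reduce((acc, o) => acc * 256 + o, 0);
  const size = 2 ** (32 - prefix);                 // addresses in the block
  const start = Math.floor(address / size) * size; // align down to the block boundary
  return { cidr: text, start, end: start + size - 1 };
}

// Compare every pair of ranges and collect the ones whose intervals intersect.
function findCollisions(input: string): Array<[string, string]> {
  const ranges = input
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map(toRange);
  const collisions: Array<[string, string]> = [];
  ranges.forEach((a, i) => {
    ranges.slice(i + 1).forEach((b) => {
      if (a.start <= b.end && b.start <= a.end) {
        collisions.push([a.cidr, b.cidr]);
      }
    });
  });
  return collisions;
}

// Example: only the /22 that sits inside the /16 is reported.
for (const [a, b] of findCollisions("10.0.0.0/16, 10.0.4.0/22, 10.1.0.0/24, 192.168.0.0/26")) {
  console.log(`${a} overlaps ${b}`); // prints "10.0.0.0/16 overlaps 10.0.4.0/22"
}
```

Comparing every pair is quadratic in the number of ranges, which is fine for a spreadsheet’s worth of subnets. For very large lists, sorting by start address and scanning neighbors would be the usual next step.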

In the meantime, good luck, and be wary out there in the digital wilderness!