Direct Server Return In The Real World

Mahesh Paolini-Subramanya
4 min read · May 14, 2018


________________________________________________________________

This is part of a series on Direct Server Return:
1. A Quick Primer
2. SYN Floods
3. The Real World
________________________________________________________________

DSR can be remarkably useful in asymmetric traffic environments, where the traffic to you is a lot less than the traffic from you (think Netflix, YouTube, and, well, pretty much any streaming service you can think of).

Packet Rewrites
In a previous article, I used the above diagram to show how, with DSR, the response traffic went directly back to the Client. Mind you, there is a little bit of chicanery going on here: even though the response traffic actually originates from Server B IP:Server B Port, the network packets the Client receives show them originating from Router IP:Router Port. Where did that happen?

The answer is, well, it depends. The thing is, the diagram above is a ridiculous oversimplification of what an actual network will probably look like. In the real world, the network probably contains all sorts of proxies, reverse-proxies, load-balancers, firewalls, and whatnot. The actual location of the rewriting will depend on a combination of the actual setup, security policies, network administration, and more. That said, since we are in OversimplificationWorld™, the rewriting probably happens in one of the two places shown below.

Egress Gateways
See the new Egress Gateway shown in the picture? The reality is that all the outbound traffic from the network is probably going through something like that: basically, some type of router(s), or gateway(s), or whatever, that all the Servers point at (it's just easier to manage traffic, firewalls, and networks this way).

Which leads to the two most common places to do the rewriting, viz., the Egress Gateway and the Servers. The nice thing about having all the outbound traffic go through the Egress Gateways is that it gives a single point of control and/or co-ordination for all network management activities, as opposed to doing it on your (very large number of) Servers.
The thing to keep in mind is that the actual load of doing the packet rewriting is not that significant. In most cases this is a stateless activity: all that has to happen is that the Router's info (IP and port) gets swapped in for the Server's info in the source fields of each outgoing packet.
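To make the "stateless" part concrete, here is a minimal sketch of that swap, assuming a simplified, hypothetical `Packet` type that only carries the four address fields (the addresses come from the documentation-only ranges, and `ROUTER_IP`/`ROUTER_PORT` are made-up values). A real rewrite also has to fix up the IP and TCP/UDP checksums, which this sketch leaves out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified view of the header fields that matter here.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes = b""


# Hypothetical externally visible address of the Router (the one Clients talk to).
ROUTER_IP = "203.0.113.10"
ROUTER_PORT = 443


def rewrite_source(pkt: Packet, router_ip: str = ROUTER_IP,
                   router_port: int = ROUTER_PORT) -> Packet:
    """Swap the Server's source address for the Router's.

    Stateless: the result depends only on this one packet and a fixed
    configuration, so no per-flow memory is needed.
    (Checksum updates are deliberately omitted in this sketch.)
    """
    return replace(pkt, src_ip=router_ip, src_port=router_port)


# A response leaving Server B (10.0.0.2:8080) for a Client.
out = Packet("10.0.0.2", 8080, "198.51.100.7", 51515, b"video bytes")
print(rewrite_source(out))
```

Because nothing is remembered between packets, any Egress Gateway (or the Server itself) can apply this rule to any packet, which is why the load stays small.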

State Management
Then again, "most cases" above is doing a little bit of heavy lifting. The reality is that the network probably looks more like this:

Inbound traffic hits a Border Router, which forwards it to one of the Multiplexers (MUX), which then picks a particular Server to send the traffic to.
The fun part is that the Border Router tends to remember which MUX it sent a given Client's traffic to, and the MUX remembers which Server it sent that traffic to, and so forth. This ensures that all the traffic from a given Client always ends up on the same Server. (Note: This isn't strictly necessary, but it is useful for any application that needs to maintain State, or a Cache, or some such on a specific Server.)

With this kind of setup, though, rewriting gets a wee bit complicated. Consider what happens when externally visible IP addresses are tied to the MUX. In that case, Server D would need to know that its traffic came from MUX 2 to rewrite outgoing packets correctly, while Server A needs to know that its traffic came from MUX 1 to do the same thing.
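One way to picture the extra state is a per-Server table keyed by Client flow. This is a hypothetical sketch that assumes the Server can see which externally visible address the Client originally targeted (for example, the inner destination of a packet the MUX encapsulated), and that each MUX owns its own address (203.0.113.10 for MUX 1 and 203.0.113.20 for MUX 2, both made up).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ServerRewriter:
    """Hypothetical per-Server table: which MUX address each Client flow used."""

    def __init__(self):
        # (client_ip, client_port) -> (mux_ip, mux_port) the Client talked to
        self.vip_for_flow = {}

    def on_inbound(self, pkt: Packet) -> None:
        # Remember the externally visible address this Client targeted.
        self.vip_for_flow[(pkt.src_ip, pkt.src_port)] = (pkt.dst_ip, pkt.dst_port)

    def on_outbound(self, pkt: Packet) -> Packet:
        # Look up the flow by the reply's destination (the Client). An unknown
        # flow raises KeyError; a real system would need a policy for that.
        vip, vip_port = self.vip_for_flow[(pkt.dst_ip, pkt.dst_port)]
        return replace(pkt, src_ip=vip, src_port=vip_port)


server_d = ServerRewriter()
# This Client reached Server D through MUX 2.
server_d.on_inbound(Packet("198.51.100.7", 51515, "203.0.113.20", 443))
reply = server_d.on_outbound(Packet("10.0.0.4", 8080, "198.51.100.7", 51515))
print(reply.src_ip, reply.src_port)  # 203.0.113.20 443
```

Unlike the single-Router case, the rewrite now depends on per-flow memory, so whatever holds this table has to stay in sync with how the MUXes are actually routing traffic.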

Network Changes
The question then is, "What happens when the network changes?" This can include stuff like adding (or removing!) a MUX or Server.
Why?
Well, it might be a Scale-Out event (Customer-base has doubled…MOAR SERVERS!), or failures (S*****T! ONE OF THE MUXES JUST DIED!), or something less fun (or gruesome). In any case, the reshuffling shouldn't affect existing traffic flows.
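Why is this hard? If Servers were picked by a naive `hash(flow) % number_of_servers`, adding a fifth Server would move most existing flows. The sketch below compares that with consistent hashing, a common technique (not necessarily what any particular load balancer uses) that keeps most flows in place. The flow IDs, Server names, and the 100 virtual nodes per Server are all made-up parameters.

```python
import bisect
import hashlib


def h(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def modulo_pick(flow, servers):
    return servers[h(flow) % len(servers)]


class HashRing:
    """Consistent hashing with virtual nodes on a sorted ring."""

    def __init__(self, servers, vnodes=100):
        self.ring = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def pick(self, flow):
        # First virtual node clockwise from the flow's hash (wrapping around).
        i = bisect.bisect(self.keys, h(flow)) % len(self.ring)
        return self.ring[i][1]


def moved_fraction(pick_before, pick_after, flows):
    return sum(pick_before(f) != pick_after(f) for f in flows) / len(flows)


flows = [f"198.51.100.{i % 250}:{10000 + i}" for i in range(10000)]
old = ["Server A", "Server B", "Server C", "Server D"]
new = old + ["Server E"]  # the Scale-Out event

mod = moved_fraction(lambda f: modulo_pick(f, old), lambda f: modulo_pick(f, new), flows)
ring_old, ring_new = HashRing(old), HashRing(new)
ring = moved_fraction(ring_old.pick, ring_new.pick, flows)
print(f"modulo: {mod:.0%} of flows moved, ring: {ring:.0%} moved")
```

In expectation, going from 4 to 5 Servers moves about 4/5 of flows with the modulo scheme and about 1/5 with the ring. Even 1/5 is not zero, though, so something still has to handle the flows that do move.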

And that gets us back to "most cases" from earlier. When the network topology changes, rewrite rules might change ("Server D is now associated with MUX 3!").
For that to happen, everybody needs to be aware of the new network configuration.
And for that to happen, you need some form of distributed consensus (like Paxos, or a Paxos-based service such as Chubby, or whatever).
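For a feel of what "agreeing on the new configuration" involves, here is a minimal in-memory sketch of textbook single-decree Paxos, deciding one value such as a new rewrite rule. It is a teaching sketch under loud assumptions: no real network, no persistence, no leader election, and "reachable" is a crude stand-in for failures. Production systems (Chubby included) use far more elaborate machinery.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Acceptor:
    promised: int = -1      # highest proposal number promised so far
    accepted_n: int = -1    # number of the last accepted proposal (-1: none)
    accepted_v: Any = None  # value of the last accepted proposal

    def prepare(self, n):
        # Phase 1: promise to ignore proposals numbered below n, and report
        # whatever this acceptor has already accepted.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, v):
        # Phase 2: accept unless a higher-numbered prepare was promised.
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors, reachable, n, value) -> Optional[Any]:
    """Run one proposal; return the chosen value, or None if it failed.

    `reachable` is the subset of acceptors this proposer can contact.
    """
    majority = len(acceptors) // 2 + 1
    granted = [(an, av) for ok, an, av in (a.prepare(n) for a in reachable) if ok]
    if len(granted) < majority:
        return None
    # Safety rule: if any promiser already accepted a value, adopt the one
    # with the highest proposal number instead of our own.
    best_n, best_v = max(granted, key=lambda p: p[0])
    if best_n >= 0:
        value = best_v
    acks = sum(a.accept(n, value) for a in reachable)
    return value if acks >= majority else None


nodes = [Acceptor() for _ in range(5)]
# Proposer 1 reaches 3 of 5 nodes (a majority): its rule is chosen.
print(propose(nodes, nodes[:3], 1, "Server D -> MUX 3"))
# Proposer 2 overlaps on node 2, learns the earlier choice, and must keep it.
print(propose(nodes, nodes[2:], 2, "Server D -> MUX 1"))
```

The second proposal "wins" but still returns `Server D -> MUX 3`: once a majority has accepted a value, later proposals cannot change it, which is the property that keeps every MUX and Server agreeing on the same rewrite rules.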
Then again, even if you know what the new configuration is, you can’t just switch over immediately, because, well, what do you do with the existing traffic flows to things that have moved? (•)
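One simple (and deliberately stateful) way to handle those in-flight flows is to pin each flow to the configuration that was current when it started, and retire an old configuration only after its last flow ends. This is a hypothetical sketch, not the approach from the paper cited below, which tackles the same problem while aiming to avoid this kind of per-flow state in the load balancer. `VersionedBalancer`, `is_new`, and the flow IDs are all made up.

```python
import zlib


def stable_hash(key: str) -> int:
    return zlib.crc32(key.encode())


class VersionedBalancer:
    """Pins each flow to the server configuration it started under."""

    def __init__(self, servers):
        self.current = 0
        self.configs = {0: list(servers)}  # version -> server list
        self.flow_version = {}             # flow id -> version it started under

    def reconfigure(self, servers):
        # New flows will use this configuration; existing flows are unaffected.
        self.current += 1
        self.configs[self.current] = list(servers)

    def pick(self, flow, is_new):
        if is_new:  # e.g. the packet is a TCP SYN
            self.flow_version[flow] = self.current
        # A mid-flow packet for an unknown flow raises KeyError here; a real
        # system would need a policy for that case.
        servers = self.configs[self.flow_version[flow]]
        return servers[stable_hash(flow) % len(servers)]

    def end_flow(self, flow):
        del self.flow_version[flow]
        # Drop configurations that no live flow (and not the current one) needs.
        live = set(self.flow_version.values()) | {self.current}
        for version in list(self.configs):
            if version not in live:
                del self.configs[version]


lb = VersionedBalancer(["Server A", "Server B", "Server C", "Server D"])
flows = [f"198.51.100.7:{port}" for port in range(50000, 50100)]
before = {f: lb.pick(f, is_new=True) for f in flows}
lb.reconfigure(["Server A", "Server B", "Server C", "Server D", "Server E"])
print(all(lb.pick(f, is_new=False) == before[f] for f in flows))  # True
for f in flows:
    lb.end_flow(f)
print(sorted(lb.configs))  # [1]
```

The cost is obvious: the balancer now holds an entry for every live flow, and that table itself has to survive failures and be shared correctly, which is precisely why this is still an active area.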

The bottom line is that there is a lot happening under the hood here; it's still an evolving field, with some fairly fun developments. I hope all this gave you just a little glimpse into what's going on behind the scenes. For more, just start Googling 😆

(•) See "Stateless Datacenter Load-balancing with Beamer" by Olteanu et al. for just one example of how to deal with this.
