What I was saying with that "combining" is that your router is now essentially t...

phil21 · on May 28, 2014

Ok, I understand what you meant now. I do disagree - it's simply doing what routers do, and has no specific knowledge or configuration for the VIP. It's simply forwarding traffic based on a destination table just like any other packet. If this was problematic in any way, your average backbone would implode - ECMP is utilized extensively to balance busy peers. Also routers already do redundancy (at least via L3) extremely robustly - so it's basically a "free" way to load balance your load balancers. You simply are not going to get the same level of performance out of a LVS/DR solution, as it's competing with very mature implementations done in silicon. We'll have to agree to disagree here.

Of course in ECMP all paths are the same - I don't see this as a downside though. Most router vendors do support ECMP weights if really needed, but there are better ways to architect things. I've run this setup with over 1500gbps of Internet-facing traffic, and never ran into a full 10g line because it was engineered properly. An in-house app that lowers my hashing inputs would probably require a different setup though, I agree.

16 ECMP is a decent number, but these days most routers I work with support 32. Some are supporting 64 now. But that's almost irrelevant, unless you're stuffing all your load balancers on a single switch. It's per-device, so you have 8 load balancers connected (and peering via BGP) to one switch, 8 another, and so on. Those then forward those routes up to the router(s) which then ECMP from there (up to 16/32 downstream switches per VIP). I've never needed more than "two levels" of this so I haven't really played with a sane configuration for more than 1024 load balancers for a single VIP (or 512 in your 16-way case). It scales more than perhaps a dozen companies in the world would need it to. Note that this explanation may sound complicated, but in a well engineered (aka not a giant L2 broadcast domain that spans the entire DC) network it just happens without you even specifically configuring for it.

Since my knowledge is dated - how do you "stop accepting new connections" with the LVS/DR model? I'm sure you can, just can't mentally model it at the moment. You need to have the VIP bound to the host in question for the current connections to complete, how do you re-route new connections to a different physical piece of gear at the same time utilizing the same VIP?

There are certainly downsides to this model as well, I don't want to pretend it's the ultimate solution. But, it's generally leaps and bounds better than any vendor trying to sell you a few million dollars of gear to do the same job. The biggest downside to ECMP based load balancing is the hash redistribution after a load balancer enters/leaves the pool. I know some router vendors support persistent hashing, but my use case didn't make this a huge problem. There are of course ways to mitigate this as well, but they get complicated.

In the end, for the scale you can achieve with this the simplicity is absolutely wonderful. It's one of those implementations you look at when you're done and say "this is beautiful" since there are no horrible-to-troubleshoot things that do ARP spoofing and other fuckery on the network to make it work. ECMP+BGP is what you get, you can traceroute, look at route tables, etc. and that displays reality with no room for confusion. No STP debugging to be found anywhere :)