Rudder: An etcd backed overlay network for containers

wmf · on Aug 28, 2014

I'm glad to see this since an easy overlay for Docker is badly needed. But ugh, userspace encapsulation. This would be a lot better if it used OVS + VXLAN.

philips · on Aug 28, 2014

The plan is to add more backends; we started with userspace encapsulation because it works everywhere and is easy to setup and control.

Initially we wanted to use an existing in-Kernel encapsulation format like a simple ip-ip encapsulation. However, IP-IP doesn't work on AWS. Then we looked at VXLAN but it relies on multicast which doesn't work on most cloud networks either. Most recently we started looking at the VXLAN DOVE extensions and are getting a prototype together for this.

tl;dr the initial goal is to show that something generic is needed and can work, we will get something that is performant and/or has encryption next.

jpgvm · on Aug 30, 2014

The kernel VXLAN implementation actually supports manual endpoint configuration via NETLINK (or newer versions of the iproute2 package).

derefr · on Aug 28, 2014

Would this allow you to mesh together containers in separate datacenters? Or mesh together, say, the containers on your home PC with containers in the cloud? I'm guessing not.

What I'm really excited for are the possibilities of docker containers with public-routable IPv6 addresses. It would move the world away from "one host: many services on different arbitrary ports", and back to the "one host: one service, possibly speaking a few protocols with ports being used for OSI-layer-5/6 protocol discovery" model of the 1970s (and eliminate the madness of SRV records, besides.)

Imagine if, say, bitcoind (which normally speaks "JSON-RPC" to clients -- a specific layer-6 encoding over HTTP) sat on "bitcoind.host:80" instead of "host:8332". Suddenly, it'd be immediately clear to protocol clients (e.g. web browsers) which hosts they could or couldn't speak to, based on the port alone! The whole redundancy between schema and port in URLs could go away: they'd be synonymous. And so on.

shykes · on Aug 28, 2014

I totally agree that containers in general, and Docker in particular, could play a big role in moving the status quo towards IPV6 and a more sane approach to service-oriented networking. I would love to turn on IPV6 by default on every Docker runtime everywhere - the question is, how do we deal with 1) existing host systems, 2) existing networks and 3) existing applications which may not be IPv6-ready? We are already upgrading the guts of Docker for more powerful networking and clustering in general, so if you give me a solid answer we can get this out the door pretty quickly :)

eyakubovich · on Aug 29, 2014

If your hosts in different data centers share IP space (you're able to route between them) then you can use Rudder. Please note that the traffic is not currently encrypted so if the interconnect between the data centers is over open pipes, it would be vulnerable. We'll be looking into encrypting the traffic in the future.

If there's NAT between the hosts (typically this is the situation at home) then Rudder will not work. We may add limited support for NAT, e.g. ability to specify a public IP to use instead of the one assigned to the NIC. However to make it easy to run from home would require doing NAT punching like STUN. We're focusing on making this useful for running in the datacenters with Kubernetes clusters.

Oculus · on Aug 29, 2014

Only recently did I realize what a power house the team at CoreOS is. They're building some really cool shit. I can spend hours on their blog just right-clicking and searching on Google. Definitely a good way to learn tons about distributed computing and that whole subject area.

MartinMond · on Aug 28, 2014

This is interesting, it's pretty similar to http://tinc-vpn.org which is a mesh VPN.

eyakubovich · on Aug 29, 2014

Correct, tinc is another example of a mesh overlay network. However tinc requires configuration files to be created on each host and then distributed to others. If machines are part of an etcd cluster, you can use Rudder to create a mesh without the need to create and distribute configuration files.

contingencies · on Aug 28, 2014

Sorry, what problem does this solve?

Things are not as easy on other cloud providers where a host cannot get an entire subnet to itself. Rudder aims to solve this problem by creating an overlay mesh network that provisions a subnet to each server. ... is unclear.

What host for virtualized infrastructure needs an entire, fake, non-internet-routable subnet that it cannot provision itself?

I believe there's a broken one size fits all network architectural assumption or provisioning methodology at the root of all this.

(Edit as reply to child as rate-limited: Sounds like I was right, and it's docker's fault. How is this not better solved with the standard approach of applying network namespaces and/or unique interfaces to containers?)

wmf · on Aug 28, 2014

It solves port conflicts caused by running multiple copies of the same service on the same host. Kubernetes likes to have a few sidecar containers hanging off each service instance (e.g. memcached might have an sshd sidecar that wants to be on port 22 and nginx might want to have its own sshd sidecar also on port 22), and if your host only has one IP address then Docker has to do dynamic port mapping and your service discovery system has to track port numbers and such.

jbeda · on Aug 29, 2014

sshd is kind of a poor example.

Kubernetes has an idea of a pod -- a group of containers that share an netns and have an IP.

Reasons you might want a pod: * A thick client or client side proxy that follows the ambassador pattern for service discovery and access. * A data-loader and data-server pair. The loader would grab data from some persistent source and write it to disk or a shared memory segment. The data-server would then use that data and serve it up. You'd could run the data-loader at a lower QoS so it doesn't stall the data-server. * Some sort of server and a log saver. The log saver could periodically batch up and compress structured log data and upload it to a persistent store (such as BigQuery in GCP). You want to build/configure/restart/upgrade the log saver separately from the server. You'd also run the log saver at a lower QoS.

Inside of Google we have all sorts of examples where we have sets of containers/tasks/processes that are co-scheduled onto a machine and work together.

vquemener · on Aug 28, 2014

FYI there's already an open source software going by the name of Rudder : http://en.wikipedia.org/wiki/Rudder_(software)

smcleod · on Aug 29, 2014

"... it has almost no affect on the bandwidth." - looking at those numbers it's not the case at all, those numbers are really low to start with (as AWS isn't exactly the fastest) but obviously this would be much more noticeable at the higher end of the scale when we're talking about 100-200MB/s transfer rates, not to mention nearly doubling the latency!

kapilvt · on Sept 1, 2014

also works great with lxc, i pushed a juju charm which automates the config for lxc http://bazaar.launchpad.net/~hazmat/charms/trusty/rudder/tru...