Line rate 10Gbit/s packet processing on FreeBSD with netmap

muppetman · on June 15, 2011

This is really very interesting.

I work with Juniper routers and switches all day, every day, and the thing I love about them is that they're built on top of FreeBSD (the OS is now called JUNOS, but it still harks back to FreeBSD when you drop to the shell).

Something small, light and fast, running FreeBSD with this added, would be perfect for customer premises equipment (CPE) or for non-essential routing tasks.

Very impressive, I look forward to seeing where this goes.

tptacek · on June 15, 2011

The data path on a Juniper isn't passing mbufs around the FreeBSD IP stack, is it? It's FreeBSD for command and control, but my understanding is that the packet processing itself might as well be its own RTOS.

tryp · on June 15, 2011

I don't know about Juniper specifically, but most multiport routers are built on packet-processing ASICs with content-addressable memory (CAM) for lookups and built-in MACs for each port. When a packet arrives that requires more complex handling than the ASIC can do, it's kicked up to a CPU for processing.
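The fast-path/slow-path split described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not how any vendor's ASIC works: a plain dict stands in for the CAM (exact match on a destination MAC), and the `needs_cpu` flag, `Packet`, and `LineCard` are hypothetical names for "anything the hardware can't handle".

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dst_mac: str
    # Hypothetical flag for exception traffic (e.g. control-plane packets,
    # unusual options) that the forwarding hardware would not handle itself.
    needs_cpu: bool = False


class LineCard:
    """Toy model: an exact-match dict stands in for the ASIC's CAM lookup."""

    def __init__(self, table):
        self.table = dict(table)   # dst_mac -> egress port
        self.cpu_queue = deque()   # packets punted to the general-purpose CPU

    def forward(self, pkt):
        """Return the egress port, or None if the packet was punted to the CPU."""
        if not pkt.needs_cpu:
            port = self.table.get(pkt.dst_mac)
            if port is not None:
                return port        # fast path: handled entirely "in hardware"
        self.cpu_queue.append(pkt)  # slow path: lookup miss or exception
        return None
```

The design point is that the CPU only sees the packets that miss or are flagged, so the expensive general-purpose path stays off the common case.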

Nrsolis · on June 15, 2011

This.

Juniper has always used a (heavily hacked) FreeBSD core for JUNOS, running on an embedded PC platform (the Intel-based Routing Engine). It did all (or most) of the protocol-related work and left the actual forwarding of packets to a specialized set of ASICs (the Packet Forwarding Engine).

This has changed in recent years with some lower-level protocols being handled on the line-cards themselves on newer platforms. The vast majority of the protocols still run on the Routing Engine though.

tptacek · on June 15, 2011

Yeah, sorry, I was talking about the control plane stuff. Does Juniper use any of the original FreeBSD IP stack for its control plane or exception path stuff?

Nrsolis · on June 15, 2011

I'm not really sure.

TBH, it's not really the limiting factor. The interconnect between the RE and PFE is something like 100 Mbps.

jr299 · on June 15, 2011

If one were interested in building something like this oneself, or in getting into the market, where would be a good place to start as far as acquiring hardware and learning about ASIC-based systems and the software behind them?

I've been building my own managed firewall and router systems for a while on commodity hardware, FreeBSD, and good NICs, and would like to take it to the next level and perhaps start a business in this area.

smutticus · on June 15, 2011

The networking world is quickly moving away from custom silicon. Thanks to companies like Broadcom and Marvell, it's getting easier and easier to buy ASICs off the shelf instead of designing them. Cisco announced this year, I think, that all their lower-end switches will be using Broadcom.

So go check out what these two companies have in terms of switch-on-a-chip ASICs. AFAIK no one has come out with a multiport PCIx card that can do L3 switching between its ports. That would be neat, and I think it's just a matter of time until someone does. I would buy such a card just to play with, if it worked under Linux or FreeBSD.

nschrenk · on June 16, 2011

NetFPGA might fit what you're looking for: http://netfpga.org/

sc68cal · on June 15, 2011

As many have said on -CURRENT: Thanks for the great work. Do you plan on getting this merged into the FreeBSD tree, and maintaining it?

lrizzo · on June 15, 2011

Yes to both questions. Merging should start very soon.

kwis · on June 15, 2011

Thanks again for the great work.

smutticus · on June 15, 2011

This is impressive. How is the performance on different packet sizes? Most real-world loads would require far fewer than 14.88 Mpps, since most packets would be MTU-sized. So I'm wondering whether performance would increase or decrease with packet size.
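The 14.88 Mpps figure is what 10 Gbit/s works out to with minimum-size (64-byte) Ethernet frames, once the 8-byte preamble/start-frame delimiter and the 12-byte minimum inter-frame gap are counted as time on the wire. A quick sketch of that arithmetic, comparing it with full-size frames for a 1500-byte MTU (1518 bytes including the 14-byte header and 4-byte CRC):

```python
LINE_RATE_BPS = 10_000_000_000  # 10 Gbit/s
PREAMBLE_SFD = 8                # bytes: 7-byte preamble + 1-byte start frame delimiter
INTER_FRAME_GAP = 12            # bytes: minimum idle time between frames


def max_pps(frame_bytes, line_rate_bps=LINE_RATE_BPS):
    """Maximum frames per second at line rate for a given Ethernet frame size.

    frame_bytes includes the 14-byte Ethernet header and the 4-byte CRC.
    """
    wire_bits = (frame_bytes + PREAMBLE_SFD + INTER_FRAME_GAP) * 8
    return line_rate_bps / wire_bits


if __name__ == "__main__":
    print(f"64-byte frames:   {max_pps(64) / 1e6:.2f} Mpps")    # minimum-size frames
    print(f"1518-byte frames: {max_pps(1518) / 1e6:.2f} Mpps")  # 1500-byte MTU payload
```

With MTU-sized frames the packet rate is roughly 18 times lower (about 0.81 Mpps), so the per-packet work that a system must sustain at line rate shrinks accordingly.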

lrizzo · on June 15, 2011

Please look at the paper linked from the URL, which defines the problem and the metrics in more detail.

smutticus · on June 15, 2011

Thanks. This is what I was looking for.

From the linked paper: "The most challenging situation is usually with the shortest packet sizes, where per-packet costs are almost unavoidably dominant over the memory copy costs. This justifies why we will run most of our tests with 64 byte packets (60 + 4 for the Ethernet CRC). In netmap, in particular, the per-byte system CPU cost is exactly zero because all data transfers are performed by the NIC, and it is up to applications to access memory, if it really needs to."
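The quoted reasoning can be made concrete with a simple linear cost model: each packet costs a fixed per-packet amount plus a per-byte amount (for example, a copy). The constants below are purely hypothetical, chosen only to show the shape of the trade-off; they are not measurements of netmap or of any stack. The time budget per frame comes from the 10 Gbit/s line rate, counting preamble and inter-frame gap.

```python
LINE_RATE_BPS = 10_000_000_000  # 10 Gbit/s
PREAMBLE_SFD = 8                # bytes on the wire before each frame
INTER_FRAME_GAP = 12            # minimum idle bytes between frames


def budget_ns(frame_bytes):
    """Nanoseconds available per frame if one core must keep up with line rate."""
    wire_bits = (frame_bytes + PREAMBLE_SFD + INTER_FRAME_GAP) * 8
    return wire_bits / LINE_RATE_BPS * 1e9


def cost_ns(frame_bytes, per_packet_ns, per_byte_ns):
    """Linear cost model: fixed per-packet work plus per-byte work (e.g. a copy)."""
    return per_packet_ns + per_byte_ns * frame_bytes


# Hypothetical constants, for illustration only.
PER_PACKET_NS = 60.0
COPY_NS_PER_BYTE = 0.5

if __name__ == "__main__":
    for size in (64, 1518):
        with_copy = cost_ns(size, PER_PACKET_NS, COPY_NS_PER_BYTE)
        zero_copy = cost_ns(size, PER_PACKET_NS, 0.0)  # no per-byte system CPU cost
        print(f"{size:5d} B: budget {budget_ns(size):7.1f} ns, "
              f"copy {with_copy:6.1f} ns, zero-copy {zero_copy:5.1f} ns")
```

Under these made-up numbers, large frames fit the budget even with a copy, while 64-byte frames only fit once the per-byte term is gone. That is why short packets are the stress case, and why removing per-byte system cost matters most there.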

codepoet · on June 15, 2011

Great! Maybe you could compare this with http://www.ntop.org/PF_RING.html for Linux?

lrizzo · on June 15, 2011

The "old" pf_ring is only meant for packet capture and involves packet copies, so it is several times slower than netmap. There is a newer "Direct Network Access" (DNA) version of pf_ring which avoids copies and has the same performance as netmap, but it is much more fragile, because in DNA the userspace program writes directly into the NIC registers and rings (so it can crash the entire OS), whereas in netmap the NIC programming is filtered by system calls.
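The difference between writing NIC state directly and having it "filtered by system calls" can be sketched with a toy transmit ring. This is a hypothetical model, not netmap's actual data structures or API: `TxRing`, `txsync`, and the ring and buffer sizes are invented for illustration. User space fills slots and advances `cur` in shared memory, but only the `txsync` function (standing in for the system call) validates those values and then updates the simulated NIC register. A bad index is rejected instead of reaching the hardware.

```python
from dataclasses import dataclass, field

NUM_SLOTS = 8     # hypothetical ring size
NUM_BUFFERS = 16  # hypothetical number of packet buffers in the shared region


@dataclass
class TxRing:
    """Toy TX ring shared between user space and the 'kernel'.

    User space writes slots[] and cur; only txsync() touches nic_tail.
    """
    slots: list = field(default_factory=lambda: [0] * NUM_SLOTS)  # buffer index per slot
    cur: int = 0       # next slot user space will fill (user-writable)
    nic_tail: int = 0  # value last written to the simulated NIC register (kernel-owned)


def txsync(ring):
    """Stand-in for the system call: validate user-written state, then program the NIC.

    Returns True if the simulated NIC register was updated, False if rejected.
    """
    if not 0 <= ring.cur < NUM_SLOTS:
        return False                  # bogus ring index: refuse rather than crash
    i = ring.nic_tail
    while i != ring.cur:              # check every newly submitted slot
        if not 0 <= ring.slots[i] < NUM_BUFFERS:
            return False              # buffer outside the shared region
        i = (i + 1) % NUM_SLOTS
    ring.nic_tail = ring.cur          # only now is the "hardware" told
    return True
```

In a DNA-style design there is no equivalent of `txsync`: whatever user space writes is what the NIC sees, which is where the fragility comes from.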

mateuszb · on June 15, 2011

Very clever solution on stack bridging. +1