
Don't forget the DragonFly BSD version of PF.

They took an old version of OpenBSD's PF and updated it to be multithreaded.

I could be mistaken, but I believe both FreeBSD PF & OpenBSD PF are still single threaded.

https://man.dragonflybsd.org/?command=pf&section=4




DragonFly and FreeBSD have radically different ideas on how multithreading should work. (Understatement of the century right there!)

FWIW, the threading in the network stack is one of the original major divergences between OpenBSD's and FreeBSD's PF. FreeBSD's PF operates within the context of FreeBSD's threaded network stack, so it is not "single threaded" in the way it once was.

I would really like to have the modern PF syntax/structure from OpenBSD in FreeBSD though. The unified rdr/nat/pass/block mechanism in OpenBSD is so much cleaner and nicer to work with.
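To illustrate (a rough sketch; $ext_if and 10.0.0.5 are placeholder values): FreeBSD's PF still uses the older grammar, where translation rules live in their own section apart from the filter rules:

    # FreeBSD-style pf.conf: translation and filtering are separate
    rdr on $ext_if proto tcp from any to any port 80 -> 10.0.0.5
    pass in on $ext_if proto tcp from any to 10.0.0.5 port 80

With the unified syntax OpenBSD adopted in 4.7, rdr-to/nat-to are just options on ordinary match/pass rules, so a single rule can translate and filter at once:

    # OpenBSD-style pf.conf: one rule redirects and passes
    pass in on $ext_if proto tcp from any to any port 80 rdr-to 10.0.0.5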


> DragonFly and FreeBSD have radically different ideas on how multithreading should work. (Understatement of the century right there!)

To elaborate for people who don't know the history: disagreements about how multithreading should work are why Matt got booted from the FreeBSD project and started DragonFly.


It's not that simple. Disagreements are one thing, but the real reason was the conflict and drama that happened when things didn't go his way. The threading approach just happened to be a really good trigger for those conflicts.

Conflicting personalities make resolving technical disagreements so much harder.


I was being euphemistic. It was a "doesn't play well with others" issue, but it came to the fore with SMP.


Playing well is obviously important (and I wasn't involved, so I have no other insight), but do you know if the poor showing for FreeBSD 5.x corresponded with the disagreements?


FreeBSD 5.x was a very stressful time -- bringing full SMP to the entire kernel was a massive technical challenge, and it didn't help that a lot of the people who were expecting to work on it lost their jobs during the dot com crash.

5.x was a poor vintage because making the kernel SMP was hard, and social issues became problematic because SMP was hard, but Matt's departure was neither the result nor the cause of FreeBSD 5.x having issues.


In hindsight, with it being now 20 years later, which approach to SMP do you think was the better technical choice?

The FreeBSD model or DragonflyBSD approach?


Assuming "the DragonflyBSD approach" is the same as when it forked 20 years ago: FreeBSD's approach was definitely the right one, for two reasons.

First, adding locking is fundamentally easier to get right than rewriting everything into a message-passing model. It took ~5 years to get SMP right in FreeBSD (from when 4.0 was released to when 6.0 was released) but it would have taken double that to get things equally stable and performant with the more intrusive redesign Matt wanted to go with.

Second, pretty much everyone else went with the same locking approach in the end -- which has resulted in CPUs being optimized for that. Lock latencies dropped dramatically with microarchitectural changes between ~2005 and ~2015.

As a theoretical concept, I love the idea behind Dragonfly, but it simply wasn't practical in the real world.


FreeBSD pf has been multithreaded for a long time. The FreeBSD 11 pf(4) man page from 2013 specifically says so[1].

[1] https://man.freebsd.org/cgi/man.cgi?query=pf&apropos=0&sekti...


"SMP" might be a more useful search term. You're correct on the timeframe — merged into HEAD in 2012 https://lists.freebsd.org/pipermail/freebsd-pf/2012-Septembe...

I'm not an OpenBSD user but it sounds like this changed recently in OpenBSD-land as well, because the "Will multiple processors help?" FAQ entry disappeared some time between January and March 2023: https://web.archive.org/web/20230112170731/https://www.openb...


I didn’t know this, but it makes sense. Either way, for most use cases (homelabs on residential internet speeds), either FreeBSD's or OpenBSD's PF will perform just swimmingly.


The only advantage I can see to MT PF would be multiple NICs. And even then they'd have to be really beefy NICs.


From what I understand, modern NICs have multiple RX/TX queues, so the work of sending and receiving can be assigned to different kernel threads.

The number of queues will vary according to the specific NIC chip.
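For a rough sense of this on a FreeBSD box (assuming an Intel NIC attached as ix0; adjust the name for your driver): each RX/TX queue typically gets its own MSI-X interrupt vector, so the per-queue vectors are visible in vmstat -i, and you can compare their count against your core count:

    # per-queue MSI-X interrupt lines show up per device
    vmstat -i | grep ix0
    # number of cores available to service those queues
    sysctl hw.ncpu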


I was going to say something similar.

Multiqueue NICs do great work on networking loads when you have one core per NIC queue, since you can eliminate, or at least reduce, cross-core communication for most of the work.

It's not complex firewalling, but I did some HAProxy work in TCP mode, and the throughput available without queue alignment was minuscule compared to the throughput when properly aligned. Firewalling has the benefit that queue alignment should happen automagically, because of where it runs. If you're doing a lot of processing in userspace it makes much less of a difference, but for a very lightweight application there was no point in using more cores than NIC queues, because cross-core communication was too slow.
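For the curious, here's a hedged sketch of what that alignment can look like on Linux (eth0 and the IRQ numbers are assumptions; the real queue-to-IRQ mapping is in /proc/interrupts):

    # give the NIC one queue per core, then pin each queue's
    # IRQ to its own core with a hex CPU mask
    ethtool -L eth0 combined 4
    echo 1 > /proc/irq/120/smp_affinity   # queue 0 -> CPU0
    echo 2 > /proc/irq/121/smp_affinity   # queue 1 -> CPU1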



