Multiqueue NICs do great work on networking loads when you have one core per NIC queue and can eliminate, or at least reduce, cross-core communication for most of the work.
It's not complex firewalling, but I did some HAProxy work in TCP mode, and the throughput without queue alignment was minuscule compared to the throughput when properly aligned. Firewalling has the benefit that queue alignment should happen automagically, because it runs in the kernel's packet path rather than in a separate userspace process. If you're doing a lot of processing in userspace, alignment makes a lot less of a difference, but on a very lightweight application there was no point in using more cores than NIC queues, because cross-core communication was too slow.
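To make "queue alignment" concrete, here is a rough sketch of the one-core-per-queue layout, assuming Linux and Python. It is not what I actually ran. The names `plan_queue_alignment` and `pin_current_process` are made up for illustration, and the mask is the single-CPU hex value you would typically write to `/proc/irq/<n>/smp_affinity` for that queue's interrupt (plain form, so only good for core numbers below 32).

```python
import os


def plan_queue_alignment(num_queues, cores):
    """Pair each NIC queue with exactly one core (hypothetical helper).

    Returns one entry per queue: the queue index, the core its worker
    should be pinned to, and a hex CPU mask for that same core.
    """
    cores = sorted(cores)
    if num_queues > len(cores):
        raise ValueError("fewer cores than NIC queues; cannot align 1:1")
    plan = []
    # zip() stops at num_queues, so extra cores are simply left unused,
    # matching the point that cores beyond the queue count did not help.
    for queue, core in zip(range(num_queues), cores):
        plan.append({
            "queue": queue,
            "core": core,
            # Single-bit mask: bit N set means CPU N. Assumes core < 32.
            "irq_mask": format(1 << core, "x"),
        })
    return plan


def pin_current_process(core):
    """Pin the calling process to one core (Linux only)."""
    os.sched_setaffinity(0, {core})


if __name__ == "__main__":
    for entry in plan_queue_alignment(4, range(8)):
        print(entry)
```

The idea would be that the worker handling queue N calls `pin_current_process` with the core from its entry, and the IRQ for queue N gets the matching mask, so the packet stays on one core from interrupt to application.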