Oh hey, I worked on this a while back. At the end of 2021 I also did work for Teleport extending this same approach and applying it to Role-Based Access Control rules. This was interesting because it exercised Z3's string analysis capabilities, even checking regular expressions for equivalence: https://ahelwer.ca/post/2022-01-19-z3-rbac/
People get intimidated by the terms SMT or theorem prover but I want to emphasize this is actually a really simple process. Any time you have two sets of logical rules (often found in access control systems) you want to compare, translating them into Z3 is really quite easy. There are Z3 bindings in lots of languages, I used Python for the Teleport work: https://github.com/gravitational/rbac-linter
My friends and I have also used Z3 to solve various math puzzles. It's useful for checking whether a solution exists when you start modifying the puzzle to see how far you can push it. We used it for this puzzle, for example: https://www.nsa.gov/Press-Room/News-Highlights/Article/Artic...
Stupid question: isn’t this actually a “simple” problem for which something like Z3 is in some sense overkill?
Firewall rules are just a binary colouring of the 4D volume defined by the four-touples of all possible flows. (Two addresses, two ports)
Because of the way rules are typically defined this is a very sparse set with huge consistent rectilinear areas.
A brute-force solution might be to simply build the 4D equivalent of a quad tree, optionally with splits at the optimal boundaries instead of at “halves”. Then the equivalence check is just a set “xor” operation. That will also reveal examples that are treated inconsistently.
Such a tree could represent any reasonable set of rules in surprisingly small amounts of memory. Also, it’s a good way to find “equivalent but more efficient” rules by reversing the process. Adjacent areas that make larger contiguous rectilinear areas can be merged fairly easily in the tree and then written out as the equivalent rule.
I’m not entirely pulling this out of a hat, I did something vaguely similar (for a related problem) when analysing complex AWS security groups a few years ago…
Similarly I wrote a tool to take a set of arbitrary subnets, union them, and then compute the inverse to “exclude the whole internet except some countries and networks.” I did this with a four-gigabit array in the naive way and it worked great!
The post mentions this, and says that the Z3 implementation is significantly smaller and easier to understand than a bespoke implementation like you are proposing.
Yes, I did evaluate an algorithmic approach similar to the grandparent's approach, although was thinking about it in terms of ruleset normalization (which I think is equivalent to the the algorithm described). I even put a good amount of effort into specifying the algorithm in TLA+! However, the SMT approach really was significantly easier. It also had wonderful second-order effects because learning how to effectively use SMT solvers has been very valuable for my career.
IPv6 would work the same way, because it is a finite space and the number of distinct ranges is low, because firewall rule tables are generally quite small (megabytes!). The ranges are not some "noisy" set, but large blocks.
Thinking about it, any tree-like data structure that includes a compression designed for handling long run-lengths would work just fine.
For example, a minor variation of roaring bitmaps would suffice.
However, another reply mentioned layer 7 firewalls, which can potentially understand DNS, HTTP headers, and apply regular expressions. Those would not be possible using this kind of naive data structure and would definitely require something like Z3.
Same kind of four-tuples, just with 128 bit numbers instead of 32 bit ones for the addresses.
If you wanted to do this for a gazillion rules and micro-optimize memory use, you could use indexed ip references in the memory representation, same way as vertices are addressed in GPU programming - would help for v4 as well.
I built a slack bot that uses this and network manager analyzer for tgw. So when someone asks why some network access doesn't work I can let my bot tell them why.
People get intimidated by the terms SMT or theorem prover but I want to emphasize this is actually a really simple process. Any time you have two sets of logical rules (often found in access control systems) you want to compare, translating them into Z3 is really quite easy. There are Z3 bindings in lots of languages, I used Python for the Teleport work: https://github.com/gravitational/rbac-linter
My friends and I have also used Z3 to solve various math puzzles. It's useful for checking whether a solution exists when you start modifying the puzzle to see how far you can push it. We used it for this puzzle, for example: https://www.nsa.gov/Press-Room/News-Highlights/Article/Artic...