Creating an autonomous system for fun and profit (2017)

davisr · on June 24, 2023

Don't use Cisco equipment; they put (quite sloppy) backdoors in their products. Absolutely zero trust with them.

Snowden: The NSA planted backdoors in Cisco products --- https://www.infoworld.com/article/2608141/snowden--the-nsa-p...

Backdoors Keep Appearing In Cisco's Routers --- https://www.tomshardware.com/news/cisco-backdoor-hardcoded-a...

Sinister secret backdoor found in networking gear perfect for government espionage: The Chinese are – oh no, wait, it's Cisco again --- https://www.theregister.com/2019/05/02/cisco_vulnerabilities...

sneak · on June 25, 2023

The idea that the network is trustworthy is outdated. Assume all network connections, including those through your own equipment, are under surveillance. You shouldn't be trusting your network equipment to be not-backdoored in the first place. (Never mind that the router on the other end of the cable isn't yours and is untrusted also, whether Cisco or not.)

CPUs have AES key schedule accelerator instructions for a reason now. Encrypt and authenticate your traffic.

davisr · on June 25, 2023

I respectfully disagree. Are you suggesting everyone just rolls over and forgets about making decisions to the best of their ability to ensure security? The switches on the outside of your network do not matter. The traffic doesn't matter.

If you lose trust in your own switching equipment then it's all over. Management network? Compromised. Segmented traffic? Compromised. IPMI/BMC interfaces? Compromised. Anybody else's malformed traffic could breach your defense, and breach the very sanctity of the traffic your network is spitting out. It doesn't matter if your computer encrypts its traffic because a breached switch can just silence it.

A company selling switches/routers/firewalls should _not_ have these liabilities, and as these liabilities are known, nobody should buy Cisco equipment, ever. Buy the equipment that you know is the safest. Don't just give up and roll over!

sneak · on June 25, 2023

> If you lose trust in your own switching equipment then it's all over.

What is all over? The security boundary is when packets leave the machine with private information. VLANs and management networks and whatever aren't it. Proper security assumes an insecure network and uses cryptography for privacy and authentication.

teleforce · on June 25, 2023

Internet packet switching architecture is an attempt to provide reliable networks over unreliable (low cost) components, and very successful on that.

Similarly, network or cyber security should be able to provide secure networks over insecure (untrusted) components.

dang · on June 24, 2023

Discussed at the time:

Creating an Autonomous System for Fun and Profit - https://news.ycombinator.com/item?id=15727115 - Nov 2017 (16 comments)

(p.s. reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers)

hamandcheese · on June 24, 2023

> and its biggest downsides are it's size and power, which are both not that big of issues since I've got a whole 44U rack for just a few servers and I don't get billed for my power usage.

I was surprised to read this. I was looking in to colocation services (for less than a rack) and everywhere I spoke to, including Hurricane Electric, included a set number of amps (which I assume is at 120V?).

Specially, HE offered me 2 amps with 7U of rack space. That seemed really low to me, just one of my 2U servers with a lot of hard drives idles at around 100W or just under 1A and easily exceeds 2A when it's really working (which admittedly is rare, it mostly idles).

I didn't follow up to see how that is actually metered. I'd love to hear about other folks experiences with collocating - is this common?

phirephly · on June 24, 2023

The base power with any colo space is going to be minimal. You typically come back and spec out what additional power you want with the rack a la carte.

phirephly · on June 24, 2023

This article is also essentially available as a podcast. https://oxide.computer/podcasts/on-the-metal/kenneth-finnega...

cantaloupe · on June 24, 2023

The article mentions that the Cisco router used is limited to a million addresses, which would be exceeded in “2-3 years.” Looks like the author got at least double the life out of the router, because the internet is just approaching one million BGP entries now!

https://bgp.potaroo.net/bgprpts/rva-index.html

phirephly · on June 24, 2023

I only got a few years out of it. I'm running an Arista 7280SR-48C8 now.

The problem is that the million tcam entries are split between IPv4 and IPv6, so I really ran out of space.

cantaloupe · on June 24, 2023

Bummer! Didn’t realize the graph was not for not IPv4 and IPv6. Have you done anything fun with the AS or had an opportunity to say “Luckily, I do have an AS!” in a time of need?

phirephly · on June 24, 2023

I started an Internet Exchange Point adjacent to it, use it to host mirror.fcix.net, and got a second ASN to build the anycast ns-global.zone service

rlaager · on June 25, 2023

Is the map on https://ns-global.zone/ actually the MicroMirror map? There seem to be a lot more dots on the map than in the traffic graph. If that map is accurate, I’m curious where the Minneapolis-ish node is (network wise). I’m not seeing 23.128.97.0/24 via MICE.

phirephly · on June 25, 2023

The map is a latency heat map. I should probably replace it with a POP map

derefr · on June 24, 2023

Can someone explain why loading a 1MM-route BGP table onto a network switch is a "hard problem" that requires fancy hardware to solve, rather than something that even commodity hardware today is capable of?

Presuming you do your IPv4 and IPv6 routing separately...

For IPv4, an interval-treemap from uint32-pair intervals to uint8 output ports fits into the default memory config of a PC from 1994; and each lookup into said tree resolves in nanoseconds, even on a machine of the era — esp. for tree-node pages that are hot in CPU cache.

And for IPv6, the tree could grow a lot larger, since the intervals are, per se, "uint128"-pairs... but there just aren't that many extant IPv6 routes yet, so the table is actually small in practice.

What are the constraints on the problem that I'm missing?

dfox · on June 24, 2023

6500/7600 is weird beast where the same HW architecture (and well 4 generations of it that are to some extent compatible although internally very different) are used as both high port density ethernet switch (6500) and as an router (7600, the only meaningful difference in the HW is that 7600 is painted white). In almost all configurations all the forwarding decisions are done in hardware, which involves somewhat expensive memory chip that Cisco calls TCAM (Ternary Content Addressable Memory), essentially an SRAM array interspersed with comparators that can find a FIB record matching the destination address with most specific mask in one cycle. In the actual implementation it is not a full TCAM and involves some kind of hashing and lookup takes multiple cycles, to make it not ridiculously expensive, but still it is not priced like your typical off the shelf 6T-SRAM (which by itself is orders of magnitude more expensive than random high volume DDRx SDRAM).

Another thing is that the thing is somewhat old and designed by Cisco who will not go out of their way to produce something that makes it obvious that their product can be replaced by a x86 box running Linux And good luck making a x86 box that has 720Gbps bandwidth and 144 ethernet ports. There is a question of exactly what is the real world practical application for 48 port gigabit linecard (there even is a PoE option, 6500 in right config can prowide kilowatts of PoE) in a router that can speak BGP, but well, you can build such a thing from 6500.

derefr · on June 24, 2023

The thing I was imagining replacing here is the supervision card alone (the sup720-XL), not the entire box. The supervision card doesn't need 720Gbps bandwidth; it's just spitting routing decisions, not entire packets. (And I would bet that established L3 flows cache their routing decisions in per-ASIC channel descriptors for short TTLs; so it's probably not even being prompted for a routing decision for every single packet, either.)

I assume the supervision card is prompted over some wire protocol by the ASICs in the switches for routing decisions, and responds back to them with a predictable delay. To achieve parity with the existing supervision card, it "only" needs to emit 30MM one-byte(!) decisions per second. I.e. a top-line input rate of 3840Mbps (for IPv6), and a top-line output rate of 240Mbps.

Basically, it confuses me why you can't slap such a "supervision card" together by taking a modern 8-core single-board computer that can fit the entire routing table into L2 cache on each core, and has a PCI-e socket; plugging an Infiniband card or whatever into it; and then running an RTOS on it.

Heck, when you think about it, SBCs are so cheap compared to a single used sup720-XL, that you could cluster them inside your router, with each supervision shard taking routing-decision load from 1/Nth of the ASICs.

dfox · on June 24, 2023

There are different modes in which 6500 can operate. The original 15Mpps mode (30Mpps is architecturally the same thing at higher clock speed) involves the line card arbitrating for a shared parallel bus on the backplane, transfering relevant data through this bus to the sup and after getting the result it transmits the whole packet over another shared parallel bus to the egress line card. In this mode the actual data do not traverse the sup card. In the newer dCEF modes there is a switching fabric of point-to-point source synchronous links and the crossbar switch of this fabric is physically located on the supervisor cards, so the “720” in “sup720” really means that there is 720Gbps of switching bandwidth on the card (although it is not exactly tightly coupled to the rest of the card that implements the “supervisor” itself, to large extent it is colocated on the same card due to physical convenience of that design, IIRC it is actually implemented as a mezzanine board).

At ~400Mpps doing the routing decisions in software is going to be a significant bottleneck even with some kind of modern SoC. But OTOH, today you can design the whole thing in bunch of other ways, as it is significantly easier to implement something with large amount of separate RAM blocks connected through some random arrangement of buses and crossbars controlled by few tightly coupled CPU cores as a one big ASIC (or even FPGA).

But all that assumes that the thing you are building is functionally an L2 switch, maybe designed with ridiculous amount of L3 acceleration (which is exactly what 6500/7600 is). As a point of reference C7200 is a router roughly contemporary to 6500/7600 and architecturally it is just a MIPS-based SBC with six PCI slots.

phirephly · on June 24, 2023

256 next hops isn't enough. Typical ASICs support 20,000 to 160,000 next hops FECs.

Cisco tried caching routing decisions from non-line rate routing engines in the 90s, and the industry learned the lesson that it's a bad idea. Caching works until you overflow the cache for some reason, and then the box completely falls over as it thrashes.

crote · on June 24, 2023

A Cisco Catalyst 6506 can handle 330 Mpps, so you need to have a guaranteed lookup time of about 3ns.

This router was released in 2005, back when a CPU was lucky to have 2MB of L2 cache which had a 10ns access time. So no, you can't just have the routing table in cache. Considering a random read from memory takes in the order of 200 nanoseconds, you're not going to be able to handle that with commodity hardware.

It might be doable in 2023 - but 2023 routers are able to handle way more packages too.

dfox · on June 24, 2023

Cisco specifies Sup720 as being capable of “up to 400Mpps*” routed IPv4 with the footnote implying that this is in dCEF mode. With dCEF, each linecard has its own forwarding logic, so the lookup time deadline has to be multiplied by number of linecards (ie. 4 for 6506 chassis). One implication of dCEF is that you end up replicating somewhat expensive ASICs across the system to gain more performance, which gets real expensive real fast.

Not that it changes much about your point.

phirephly · on June 24, 2023

When you want line rate forwarding across several Tbps of front panel ports, you need the packet pipeline to be able to make all the routing decisions without involvement from the OS. 8Bpps just doesn't give you time to be able to walk any kind of data structure in memory.

Running full internet tables on a x86 server where you can only get a few Gbps up to maybe a few dozen Gbps is much easier.

cookiengineer · on June 24, 2023

Isn't this what BPF and XDP try to target with their SmartNIC offloading, like the ones supported from Metronome?

phirephly · on June 25, 2023

Sure. And switch ASICs hang off a CPU as a PCIe device, so the line between smartNICs and low end ASICs start to blur. The key is hardware acceleration of some sort.

alfons_foobar · on June 24, 2023

Speed.

If your want to forward traffic at line rate (think 10/40/100 gbps per port), having the OS handle packets becomes a bottleneck.

rigidbus · on June 25, 2023

I'd like to assign a unique IPv6 address for each user of my service. Since I'm in Australia I looked to APNIC but their pricing is a bit intimidating for a side project. I'm primarily after stable addresses so that my users never have to reconfigure anything if the underlying infrastructure (Vultr to start with) changes. What options should I be looking at?

xfer · on June 25, 2023

Well, no such thing as stable addresses since you don't really own IP addresses, you are leasing from RIR. But, if you are fine with APNIC allocation, see if RIPE PI Allocation is sufficient for you, they are much cheaper(RIPE charges $50/yr), add some LIR cost, total would be <$100/yr.

rigidbus · on June 25, 2023

Yeah I'm aware you don't really own IP addresses just as you don't really own domain names. In both cases there is degrees of stability you can expect though. I'll look take a look at RIPE, thanks.