As a (former) hardware engineer, I've worked on many projects where bugs have physical effects. Those effects can range from amusing to seriously dangerous.
One such bug involved a mistake in the assembly diagram and silkscreen for a circuit board. The result was that a tantalum capacitor was installed backwards on a 12V supply rail.
Tantalum capacitors are polarized, and they fail in a spectacular way when reverse-biased. In this case, the supply rail could source upwards of 20A, so the fireworks were loud and impressive. Luckily the cap was easily replaced and the only permanent damage was cosmetic.
Hardest-to-troubleshoot bug:
In my subsequent return to the world of software, I worked on device drivers for network interfaces (among other things).
NICs frequently operate through a circularly-linked list of packet descriptors, which contain pointers to buffers in RAM where the NIC can DMA packet data. The hardware fills the DMA buffers and marks the descriptor as "used," and the driver chases the NIC around the ring, processing the packet data and marking the descriptors as free.
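To make the ring mechanics concrete, here is a toy model in Python. It is only a sketch: real descriptors live in DMA-visible memory and hold physical addresses, whereas here `next_idx` is a list index, and the names `Descriptor`, `Nic`, `Driver`, and `make_ring` are invented for illustration, not taken from any real driver.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    buf: bytearray        # stands in for the DMA buffer in RAM
    length: int = 0       # bytes the "NIC" wrote into buf
    used: bool = False    # set by the NIC, cleared by the driver
    next_idx: int = 0     # index of the next descriptor (a pointer in real hardware)


def make_ring(n, buf_size=2048):
    # Link descriptor i to i + 1, and the last one back to 0 to close the ring.
    return [Descriptor(bytearray(buf_size), next_idx=(i + 1) % n) for i in range(n)]


class Nic:
    """Hypothetical hardware side: fills buffers and marks descriptors used."""

    def __init__(self, ring):
        self.ring, self.head = ring, 0

    def receive(self, packet):
        d = self.ring[self.head]
        if d.used:
            return False              # ring is full, so the packet is dropped
        d.buf[:len(packet)] = packet  # the "DMA" write
        d.length = len(packet)
        d.used = True
        self.head = d.next_idx        # follow the link to the next descriptor
        return True


class Driver:
    """Hypothetical software side: chases the NIC and frees descriptors."""

    def __init__(self, ring):
        self.ring, self.tail = ring, 0

    def poll(self):
        packets = []
        while self.ring[self.tail].used:
            d = self.ring[self.tail]
            packets.append(bytes(d.buf[:d.length]))
            d.used = False            # hand the descriptor back to the NIC
            self.tail = d.next_idx
        return packets
```

Note that both sides trust `next_idx` blindly, which is exactly what makes a corrupted link so destructive.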
In testing, we discovered that after long periods (hours, usually) of heavy load, the system would occasionally freak out and stop processing packets. Some time later, various software modules would crash.
Working backwards through the post-mortem data, I saw that the NIC would get "lost" and dump packet data all over system memory. I dumped the descriptor ring (tens of thousands of entries) and wrote some scripts to check it for consistency.
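I no longer have those scripts, but a check of this kind might look like the sketch below. The dump format is an assumption made for illustration: a dict mapping each descriptor's address to the "next" pointer stored in it. The function name `check_ring` is likewise made up.

```python
def check_ring(dump, base):
    """Walk the ring from `base` and return a list of problems found.

    `dump` maps descriptor address -> stored "next" pointer (assumed format).
    """
    if base not in dump:
        return [f"base {base:#x} is not a descriptor"]

    problems = []
    seen = set()
    prev, addr = None, base
    while addr not in seen:
        if addr not in dump:
            # The link leads somewhere that is not a descriptor at all.
            problems.append(
                f"descriptor {prev:#x} points outside the ring, to {addr:#x}"
            )
            break
        seen.add(addr)
        prev, addr = addr, dump[addr]
    else:
        # We came back to a descriptor we already visited; a healthy ring
        # only ever comes back to its base.
        if addr != base:
            problems.append(
                f"descriptor {prev:#x} loops back to {addr:#x}, not base {base:#x}"
            )

    unreachable = set(dump) - seen
    if unreachable:
        problems.append(f"{len(unreachable)} descriptors unreachable from base")
    return problems
```

A healthy ring returns an empty list. A single bad link shows up as one "points outside the ring" entry plus a count of the descriptors stranded behind it.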
To make a very long story short, when the NIC was stormed with lots of 64-byte packets with no gaps between them, it would eventually screw up a DMA transfer and corrupt the "next" pointer in the descriptor ring. On the subsequent trip through the ring, the NIC would chase an errant pointer off into system memory and corrupt other system data structures.
Since the hardware can DMA anywhere in RAM (at least on a system with no IOMMU to fence it in), the OS is powerless to stop it. The resulting errors can be ridiculously hard to track down and fix.