"If your system does not use SERIRQ and BIOS puts SERIRQ in Quiet-Mode, then the weak external pull up resistor is not required. All other cases must implement an external pull-up resistor, 8.2k to 10k, tied to 3.3V"
Microcode is software running on your CPU, and many problems can be fixed with a microcode update, but not all: see the workarounds listed in the PDF I linked, and note especially how many say "None identified".
Hard to say if something like this would be patchable via a microcode update. But regardless, back then your CPU didn’t run software to run your software, so a new hardware revision was in order.
They do tremendous amounts of validation. I believe random generation of input data is part of that.
Here's an old heavily cited paper from Intel on the topic; I'm sure their state of the art has advanced considerably in the intervening 17 years since its publication:
"crashme" is one venerable program that does this kind of fuzzing -- I managed to find a bug in a cpu with it once, which does not speak well of their QA department.
A former coworker was a QA manager at Intel. He said it was an explicit decision to cut back on validation and QA, which is why he wasn't at Intel anymore. The general feeling was that they had "overreacted" to the Pentium FDIV bug and needed to move faster.
Broadcom Sibyte 1250. I had pre-release silicon, and there was a known bug that prefetch would occasionally hang the chip. I wanted to have a little fun goofing around, so I modified crashme to replace prefetches in the randomly generated code with a noop. A few minutes later, I hung the chip.
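For anyone curious, the modification described is easy to sketch. This is an illustration of the idea, not crashme's actual code: it only shows the "generate random instruction words, swap prefetches for NOPs" step, using the MIPS32 encodings (PREF major opcode `0b110011`, NOP `0x00000000`) as assumptions. Real crashme also maps the buffer executable and jumps into it under signal handlers, which is omitted here.

```python
# Sketch of the crashme tweak described above: generate random MIPS32
# instruction words, then replace any prefetch (PREF) instruction with
# a NOP before the buffer would be executed.
import random
import struct

MIPS_PREF_OPCODE = 0b110011  # major opcode field, bits 31..26 (MIPS32 PREF)
MIPS_NOP = 0x00000000        # encodes sll $zero, $zero, $zero

def random_code(n_words, seed=None):
    """Produce n_words random 32-bit instruction words."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(n_words)]

def strip_prefetch(words):
    """Replace any PREF instruction with a NOP, leaving others intact."""
    return [MIPS_NOP if (w >> 26) == MIPS_PREF_OPCODE else w
            for w in words]

# Build a little-endian byte buffer of filtered "code".
code = strip_prefetch(random_code(1024, seed=1))
buf = struct.pack("<%dI" % len(code), *code)
```

Even with every PREF filtered out, the remaining random words were enough to hang the pre-release silicon, which is what made the result interesting.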
If I identified the right erratum later, it was: if there is a branch in the delay slot of a branch in a delay slot which is mispredicted, the cpu hangs. It was fixed by making this sequence throw an illegal instruction exception. (It was undefined behavior already, I think.)
Interesting. And... the nesting you describe is messing with my head! :)
How did the erratum get fixed? Hardware re-re-release?
Definitely filed crashme away, sounds like a useful tool.
The Sibyte 1250 sounds cute. There's a "650000000 Hz" in https://lists.freebsd.org/pipermail/freebsd-mips/2009-Septem... although I'm not sure if MIPS from circa 2002 (?) was that fast (completely ignorant; no field knowledge - definitely want to learn more though).
I also noted the existence of the BCM91250E via http://www.prnewswire.com/news-releases/broadcom-announces-n...sibytetm-bcm1250-mipsr-processor-to-accelerate-customers-software-and-hardware-development-75841437.html, which was kind of cool. I like how the chip is a supported target for QNX :)
Now I'm wondering what you were using it for. Some kind of network device? (I think I saw networking as one of its target markets.)
--
As an aside, I'm also very curious what HN notifier you use, if you use one. (I use http://hnreplies.com/ myself, but it's sometimes slow. I saw your message after 4 minutes in this case (fast for HN Replies); typing/research + IRL stuff = delay :) )
Our company was PathScale, and we were hoping that the Sibyte 1250 would make a great supercomputing chip. Opteron wasn't out yet, Intel was pricing 64-bit Itanium very high, and this dual-core Sibyte thing was projected to have a reasonable clock and it could do 2 64-bit fp mul-adds/cycle. We had Fred Chow and the GPLed SGI Compiler on our side. And the next revision of the chip had these great 10 gig networking features that I thought I could make work well with MPI, the scientific computing message-passing standard.
You can guess how it worked out: Sibyte was late, slow, and buggy. Even simple code sequences like hand-paired multiply-adds would start running at 1/2 speed after an interrupt. Our experienced compiler team was unable to get good perf on several SPECcpu benchmarks despite the code looking good. (Fred didn't have much hair left to pull out!)
Soon after we raised our A round we pivoted to using Opteron for compute, building an ASIC for a specialized MPI network, and Fred's team did an Opteron backend for the compiler.
The descendant of the network is now called Intel Omni-Path, and is on-package for Xeon Phi cpus.
I see. Always poignant to hear these kinds of stories.
I'm curious as to whether interrupt handling was done underneath the multiprocessing layer or whether interrupts were just hammering the pipeline design. (I assume by "an interrupt" you're referring to slowdown within the fraction of a second after a given interrupt occurred, within the context of floods of interrupts interspersed between instructions?)
Very cool to hear that Intel snapped up what you eventually managed to ship, FWIW - and that you were able to pivot in the way you did. Also interesting to hear about Opteron use in the field, my experience is only with tracking the consumer sector.
Unfortunately, fuzzing ultimately has a random component.
An instruction which accepts 2 registers and returns 1 register has a 192-bit problem space to validate. This complexity is present in an instruction as simple as `add`.
An AVX2 instruction which accepts 3 registers and outputs 1 has a 1024-bit problem space to validate.
This bug occurred in FMA3, with a ~512-bit problem space.
Repeat for _every_ instruction (HUNDREDS). You can see how a few bugs slip through the cracks. The problem space is as large as that of some cryptographic functions!! I'm honestly surprised we don't see more of them.
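The back-of-the-envelope arithmetic above can be sketched like this (just summing register widths for inputs plus the output; the ~512-bit FMA3 figure assumes the 128-bit XMM form):

```python
# Rough problem-space sizes: sum the bit widths of all input and output
# registers. Back-of-the-envelope figures, not a formal state count.
def problem_space_bits(input_regs, output_regs, reg_width):
    return (input_regs + output_regs) * reg_width

add_space  = problem_space_bits(2, 1, 64)    # scalar `add`: 192 bits
avx2_space = problem_space_bits(3, 1, 256)   # AVX2, 3-in/1-out: 1024 bits
fma3_space = problem_space_bits(3, 1, 128)   # FMA3 on XMM regs: 512 bits

for name, bits in [("add", add_space), ("avx2", avx2_space), ("fma3", fma3_space)]:
    print(f"{name}: {bits}-bit problem space, 2**{bits} possible states")
```

Even the smallest of these (2^192) dwarfs a 128-bit cryptographic key space, which is the point: exhaustive testing is impossible, so validation has to sample.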
The specific result produced by the data path is probably not very relevant in the case of a lockup. The control path involved with instruction decoding, register renaming, out-of-order execution, SMT, etc. is generally the cause of issues like this. With interactions between different blocks of the CPU and the size of some of the data structures involved, the full verification space is much, much larger.
I don't know about that. As I understand it, interesting things can happen if a large number data lines toggle at the same time, and this obviously depends on both the data and the control path. Huge space of possible states.
If you read the source (TechPowerUp is terrible; why it isn't banned is beyond me), it is actually an _illegal instruction_ error, which just crashes the current application.
Also, if SMT is not disabled, the error doesn't occur. And if the chip isn't overclocked, the error doesn't occur.
Do CPU and GPU manufacturers do any kind of fuzzing?