More in the series of bizarre UEFI bugs

brfox · on Nov 15, 2012

This exact same thing happened to me earlier this week when installing Ubuntu Desktop 12.04 LTS 64-bit on a new Lenovo ThinkCentre box. The only thing I could do was go into the BIOS, switch it to Legacy instead of UEFI, and then reinstall Ubuntu. Then it worked fine.

See more evidence of this: http://askubuntu.com/questions/141879/error-1962-no-opertati...

http://askubuntu.com/questions/91484/how-to-boot-ubuntu-from...

saljam · on Nov 15, 2012

This is why more vendors should use Coreboot[1]. It's open source, so this sort of rubbish can be fixed. And it's also much more flexible.

I think the new Chromebooks use it. This is how they achieved their “instant-on” feature. I can't wait for others to follow suit.

[1] http://www.coreboot.org/

belorn · on Nov 15, 2012

So what is the problem they are trying to solve by looking at the boot entry description?

If you want to detect malware, I don't think looking up the boot entry description will do it. Malware does not say "I am malware, press me to be infected!". It is, however, an excellent way to prevent competition between operating systems.

zokier · on Nov 15, 2012

I think they might have attempted to work around some bug in Windows boot loader, and then it kinda stuck.

belorn · on Nov 15, 2012

Do you mean there are multiple Windows boot loaders, each with a unique description string, and some might need special attention from the UEFI?

Is it like, "Windows Boot Manager" and "Windows Server Boot Manager"? Does Windows (7/8) have a different description than, say, Windows Server, or is there a version number hidden somewhere? It would be fun to read the details of such a bug :).

luu · on Nov 15, 2012

This happens everywhere, unfortunately. I work for that other x86 CPU vendor that isn't Intel or AMD. Even though there are CPUID feature flags that identify which features a processor implements, many developers determine compatibility from the vendor ID. If we're lucky, we'll be treated as some generic 386 and we'll get to run some horribly unoptimized code [1]. In many cases, the driver, platform, or OS will error out and die [2].

[1] http://arstechnica.com/gadgets/2008/07/atom-nano-review/6/. Scroll down to the third graph, if you want the tl;dr.

[2] http://code.google.com/p/nativeclient/issues/detail?id=2508. I don't post that because it's a particularly egregious example. Even though I don't think the reply makes sense, it's actually more reasonable than most responses. It just happens to be public, because the exchange happened on a public bug tracker.
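To make the flags-versus-vendor-ID point concrete, here is a minimal sketch in Python. It assumes Linux-style `/proc/cpuinfo` text as input; the sample text, the `KNOWN_VENDORS` set, and all function names are made up for illustration and are not taken from any real driver or OS.

```python
# Sketch only: contrasts vendor-string gating with feature-flag gating.
# Input format is assumed to be /proc/cpuinfo-style "key : value" lines.

def parse_cpuinfo(text):
    """Return (vendor_id, set_of_flags) for the first CPU listed in the text."""
    vendor = None
    flags = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a "key : value" shape
        key = key.strip()
        value = value.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags

# Hypothetical hardcoded vendor list, the anti-pattern being complained about.
KNOWN_VENDORS = {"GenuineIntel", "AuthenticAMD"}

def has_sse2_by_vendor(vendor, flags):
    """Anti-pattern: infer the feature from who made the CPU."""
    return vendor in KNOWN_VENDORS

def has_sse2_by_flags(vendor, flags):
    """Ask the CPU what it implements and ignore who made it."""
    return "sse2" in flags

# Illustrative sample (not captured from real hardware) with a vendor string
# that is neither Intel nor AMD.
SAMPLE = """\
processor\t: 0
vendor_id\t: CentaurHauls
flags\t\t: fpu tsc msr cx8 cmov mmx fxsr sse sse2 ssse3
"""

vendor, flags = parse_cpuinfo(SAMPLE)
print(vendor, has_sse2_by_vendor(vendor, flags), has_sse2_by_flags(vendor, flags))
```

With the vendor check, this CPU is treated as lacking SSE2 even though it advertises the flag; the flag check gets it right without knowing the vendor at all.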

It's fairly easy to fix this sort of thing with a patch, à la Raymond Chen [3], but, for legal reasons, we can't just hand out patches to every program that incorrectly determines features from the vendor string. It often takes over a year to convince a vendor to issue a patch for its driver or OS, even when we have a benign patch we're using in the lab to work around the issue, so we can do compatibility testing (we test pretty much everything) [4]. That's if we're lucky enough to get a vendor that wants to fix it; we often just get the runaround indefinitely. I can recall one case when no printer driver from a certain manufacturer would install on a machine with one of our CPUs, even though that same vendor was selling multiple models that used our CPU.

[3] http://blogs.msdn.com/b/oldnewthing/archive/2012/11/13/10367...

[4] I haven't done lab debug for a while, but the last bug I can recall hearing about was a case where, if you had two webcams recording and playing back to the screen while watching a Blu-ray DVD and running an obscure benchmark from the 90s that wasn't even used in the 90s, the machine would hang approximately once every three days. I don't know where we find the madmen who come up with these tests.

The funny thing is, we had a feature in our part that we suspected was buggy, and disabling that feature caused the fail to go away (or at least occur incredibly infrequently), but you can't ship a part unless you're really absolutely sure it's not going to hang on real customers, so someone had to track it down to the root cause and capture it in simulation. Just because disabling that feature meant the bug didn't show up didn't mean that feature was the cause. It could have been that disabling the feature just changed the conditions so that the bug became less likely, and only popped up once a year, or maybe needed five webcams to expose, or who knows what? IIRC, it took someone two months to find the exact issue.

pilif · on Nov 15, 2012

The security rationale the Chromium people give for using the vendor string strikes me as utterly backwards for multiple reasons:

1) They cite incorrect x86 implementations and even quote errata documents from both vendors, yet still trust only these two vendors, despite having just cited them as an example of broken implementations.

2) By using the vendor string, they are ensuring that the intentionally broken implementations (those forging the ID string) get trusted, while the honest guys don't.

3) What they end up doing is encouraging CPU vendors to just forge their vendor string (see above).

Whatever reason they might actually have, I really doubt that trusting CPUs by the stated name of the vendor increases the security of the system at all.

raverbashing · on Nov 15, 2012

Really Google, really?!

Check the CPU capabilities for fscks sake! This is CPU features 101. Read the manufacturer's manual, they point that out.

Sorry for the aggressive tone, but this is really idiotic. Checking CPU string is worse, more buggy and less compatible EVEN among different CPUs from the same vendor!

"Because our system is potentially vulnerable to incorrect x86 implementations"

Fine, check the capabilities FIRST, then check the string. Not the other way around.

Same thing for the "firmware developers" who check for Windows and Red Hat in the string. Someone needs to buy a large batch of these and return them as being defective.

pdw · on Nov 15, 2012

Regarding point 1), they trust AMD and Intel because they receive errata documents from these vendors and can therefore learn about and work around CPU bugs that would affect them.

It still seems an extremely paranoid stance to take.

mjg59 · on Nov 15, 2012

Worth noting that this isn't limited to Windows - for a long time, Linux was specifically checking whether a CPU was an Intel before using the ACPI fixed function hardware interface for CPU scaling. To be fair, that's partially because the spec actually says that this interface is CPU vendor-defined, but it's probably reasonable to give the benefit of the doubt when they're setting the same cpuid flag. There are a lot of ways to screw up compatibility.

yuhong · on Nov 15, 2012

Plenty of examples here (for example, some 64-bit OSes, including Windows, have a hardcoded list of CPU vendors they will run on!): http://www.agner.org/optimize/blog/read.php?i=49

pserwylo · on Nov 15, 2012

Matthew gave an awesome talk on UEFI at linux.conf.au 2012 [0]. A thoroughly entertaining and informative talk on issues including secure boot, but also other issues with UEFI. It even got voted one of the best four talks at the conference, so he did the talk again at the end of the conference. This was great because I missed it the first time.

My favourite quote from the talk was:

"Files contain code, [and] code, as we all know, contains bugs. Always. So from this we can conclude that UEFI contains bugs. This shouldn’t surprise anyone, other than the Linux kernel which obviously contains no bugs at all ever." [1].

[0] - www.youtube.com/watch?v=V2aq5M3Q76U (Keep in mind this was in January, when there was still a lot of uncertainty about the UEFI Secure Boot/Linux situation).

[1] - http://www.techrepublic.com/blog/australia/untested-buggy-ue...

yew · on Nov 15, 2012

You see this sort of thing in traditional BIOS systems all the time. Especially ACPI stuff. And when someone tries to fix it they usually just make it worse. There's a reason so many Linux kernel installations are configured to advertise themselves as '!Linux'.

raverbashing · on Nov 15, 2012

There was a development philosophy failure on the part of Linux as well, which was corrected by Linus.

No breakage of existing functionality is accepted anymore when fixing ACPI bugs.

ACPI BIOS developers are still a thousand monkeys banging at a keyboard, apparently. (And yes, I have some experience in trying to work with them.)

anonymous · on Nov 15, 2012

So the fix is to write "Windows Boot Manager" as the descriptive string on affected models. Would that be a problem?

pdw · on Nov 15, 2012

If you want to do secure boot, Microsoft won't sign your boot loader if it claims to be "Windows Boot Manager".

mjg59 · on Nov 15, 2012

It's the OS installer that sets the string, not the bootloader, so it's not a problem for signing.

mjg59 · on Nov 15, 2012

Yes, because it's a user-visible string. It'll appear in the firmware boot menu.
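For anyone trying to picture the behaviour being discussed, here is a toy model in Python. It is not the actual firmware logic, which nobody in this thread has seen. The `BootEntry` type, both function names, and the exact allowlist strings are assumptions: "Windows Boot Manager" comes from the comments above, and the Red Hat string is a guess at what the check looks for.

```python
from dataclasses import dataclass

# Assumed allowlist, for illustration only.
ALLOWED_DESCRIPTIONS = ("Windows Boot Manager", "Red Hat Enterprise Linux")

@dataclass
class BootEntry:
    description: str  # user-visible label, written by the OS installer
    loader_path: str  # path to the EFI executable on the system partition

def description_gated_boot(entry):
    """Toy model of the bug: boot only if the label matches a known string."""
    return entry.description in ALLOWED_DESCRIPTIONS

def label_agnostic_boot(entry):
    """Toy model of the expected behaviour: the label is only a label."""
    return bool(entry.loader_path)

ubuntu = BootEntry("ubuntu", r"\EFI\ubuntu\grubx64.efi")
relabelled = BootEntry("Windows Boot Manager", r"\EFI\ubuntu\grubx64.efi")

for entry in (ubuntu, relabelled):
    print(entry.description, description_gated_boot(entry), label_agnostic_boot(entry))
```

In this model the same loader is refused or accepted purely on its label, which is why relabelling the entry works as a workaround and also why it shows up under the wrong name in the firmware boot menu.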

ChuckMcM · on Nov 15, 2012

Wow, sort of like the User Agent string in web compatibility. Such a sad, sad thing.