hpa's proposal is a good one: Just disable busmaster for all devices on entry and let drivers sort it out in their initialization routines again.
There's not really a reason to have _any_ device bang on memory before it's asked to by the current device manager (i.e. BIOS or OS) and its drivers.
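For reference, hpa's idea boils down to something like the loop below. This is only a rough sketch, not code from any real tree: pci_cfg_read16()/pci_cfg_write16() stand in for whatever config-space accessors the boot environment actually has (ports 0xCF8/0xCFC, ECAM, or the firmware's own PCI protocol).

    #include <stdint.h>

    #define PCI_VENDOR_ID      0x00   /* vendor ID register offset */
    #define PCI_COMMAND        0x04   /* command register offset */
    #define PCI_COMMAND_MASTER 0x0004 /* Bus Master Enable bit */

    /* Hypothetical config-space accessors supplied by the boot environment. */
    uint16_t pci_cfg_read16(unsigned bus, unsigned dev, unsigned fn, unsigned off);
    void pci_cfg_write16(unsigned bus, unsigned dev, unsigned fn, unsigned off,
                         uint16_t val);

    /* Walk every possible PCI function and clear Bus Master Enable, so no
     * device can touch memory until its driver explicitly re-enables it. */
    void disable_all_busmasters(void)
    {
        for (unsigned bus = 0; bus < 256; bus++)
            for (unsigned dev = 0; dev < 32; dev++)
                for (unsigned fn = 0; fn < 8; fn++) {
                    if (pci_cfg_read16(bus, dev, fn, PCI_VENDOR_ID) == 0xffff)
                        continue; /* nothing at this address */

                    uint16_t cmd = pci_cfg_read16(bus, dev, fn, PCI_COMMAND);
                    if (cmd & PCI_COMMAND_MASTER)
                        pci_cfg_write16(bus, dev, fn, PCI_COMMAND,
                                        cmd & ~PCI_COMMAND_MASTER);
                }
    }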
Sorry about that. It may work, but some hardware batches DMA and replays it when enabled. I want to check whether that solved the problem before really commenting on it. The other risk is that there are some drivers that (brokenly) depend on the current behaviour. It may be easier to fix it in the boot loader.
1. Hardware that batches DMA and replays it: a full device reset when the driver first initializes (before enabling bus mastering) should fix this, right? (Rough sketch below.)
2. Drivers that depend on this: which ones? Are they hard to fix?
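To make point 1 concrete, this is roughly the ordering in a Linux PCI driver's probe routine; example_probe() is a made-up driver and the error handling is trimmed, so treat it as a sketch rather than a recipe.

    #include <linux/pci.h>

    static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        int ret;

        ret = pci_enable_device(pdev);
        if (ret)
            return ret;

        /* Reset the function while bus mastering is still disabled, so any
         * DMA the device "batched" earlier is thrown away rather than
         * replayed into memory it no longer owns. */
        ret = pci_reset_function(pdev);
        if (ret)
            dev_warn(&pdev->dev, "reset not supported, continuing anyway\n");

        /* Only now let the device master the bus again. */
        pci_set_master(pdev);

        /* ... map BARs, set up DMA buffers, request IRQs, etc. ... */
        return 0;
    }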
Obligatory: I'm a kernel developer so I may be overly optimistic, but isn't this a security problem, at least in one sense? (That is, a malicious device may try to use bus mastering to attack a running kernel.)
However, what if the malicious device actually comes from a compromised but popular vendor that ships malicious firmware? That wouldn't be a stretch.
Resetting devices and fixing drivers isn't a huge win, but it's still an improvement, much like ASLR: it doesn't actively prevent buffer overflows, it just makes attacks more difficult.
IMHO UEFI is one of the dumbest things in a long line of dumb "innovations" in computer history. It adds complexity to a process which should be very simple and stripped down to the most basic need: booting an OS.
It's the implementation of UEFI/EFI/TianoCore that I find most problematic, and not the idea of having a console operating system.
I work regularly with servers whose console firmware provides most or all of what EFI does, though with considerably less confusion and hassle.
Having an embedded console, a functional operating system available in the firmware, can be very handy, whether for troubleshooting the server or the boot process, or for baseline server configuration without having to fire up an operating system or a diagnostic.
What's not so handy (with EFI) is the grab-bag of user interfaces, the confusing array of consoles that can exist (the Shell, the menus, the BMC, and increasingly often a management widget), and the limitations around the callbacks. And the byte-code engine concept that was intended to avoid having to implement console (and boot) drivers for each new widget never really got traction.
Simply having boot drivers available as callbacks for the operating system would have been very handy for folks writing or porting an OS. Debugging in the bootstrap environment stinks.
IMHO, EFI just isn't a well-designed user interface. It seems to be a scatter-shot collection of pieces that were duct-taped together into a technology demonstration. And I'm not entirely certain the folks that originally built EFI ever intended manufacturers to present it to end users as the primary console, either.
I am actually of the opinion that UEFI isn't "complex" enough: the best BIOS I've ever used is OpenBoot, with its Forth interpreter. Simple in some ways but very, very flexible, more than enough to blow your whole leg off. And yet more pleasant to use than anything else.
I've always had the impression that UEFI was a real improvement over the legacy x86 BIOS we've used for forever, but this article makes me wonder if it will just make things more complicated (especially for Linux and BSD users).
Does this article make UEFI sound more problematic than it actually is?
Ron Minnich of LinuxBIOS and CoreBoot fame has been pointing out for years that (U)EFI is a huge byzantine horror designed to be as proprietary and closed as possible.
UEFI is a codebase of comparable size to the Linux kernel, with rather less testing. Problematic bugs are somewhat inevitable, no matter how good the people implementing it are.
Not to nitpick, but UEFI is a specification. You're probably thinking of EDK/EDK2, which is the reference implementation.
I think it's funny that you compare this against testing of the Linux kernel, where automated testing is an afterthought left to projects such as LTP. Most kernel devs I know basically test everything manually. Sure, there are lots of developers who look over the code and use builds from git, so releases are stable.
On the other hand, UEFI is a specification that defines most interactions between the different components, and there is a comprehensive range of automated tests provided by the UEFI SCT (Self-Certification Test).
Full disclosure, I've worked on EDK2. Personally, I hate it when software is used to limit freedom. My contributions to EDK2 are BSD-licensed and upstreamed.
While UEFI is a specification, every implementation I'm aware of is based on EDK. Practically speaking, all UEFI implementations are at least as complicated as EDK, and so are as complicated as the Linux kernel. The SCT is impressive, but only really ensures that an implementation gives appropriate results when well-formed input is given to it. That gives very little indication of what the real-world issues are in a code base, since we can generally assume that the real world is rather less competent at providing well-formed input than a test suite is.
I've hit two real and severe bugs in EDK, one of which merely tended to crash any OS on boot and one of which allowed me to brick any hardware using the EDK BDS implementation to the point where recovery involved physical reflashing of the firmware.
Reality is that any codebase is, practically speaking, untested until it's been exposed to the real world. EDK, Tiano, and basically every real-world UEFI implementation haven't as yet. Putting them in the hands of people is going to find new bugs, and some of those are going to compromise operating systems. That's not a criticism of any part of the UEFI development process. It's a description of reality.
The actual article: a pretty interesting debugging story involving a misbehaving driver DMA'ing things into memory where it shouldn't.