
UEFI is a great example of second-system syndrome. In the BIOS world, more capable boot loaders were chained in early (think pxelinux, grub, *BSD loaders, etc.), so even though it was primitive, it wasn't the typical UX. UEFI is almost trollingly bad: the worst amalgamation of low-level firmware, boot loader UX, and OS->firmware runtime services I've seen when contrasted with OpenFirmware, uBoot, petitboot.

OpenFirmware was a much more elegant technology sitting around for the lifetime of modern x86 but intel had to be different.

I like the direction IBM is going with OpenPOWER.. petitboot/kexec by default https://sthbrx.github.io/blog/2016/05/13/tell-me-about-petit... although all the firmware sources are on github so you could do whatever the heck you want. It makes intel look positively oppressive.



> OpenFirmware was a much more elegant technology sitting around for the lifetime of modern x86 but intel had to be different.

Let's rather look at the reasoning of the UEFI developers as to why they have a different opinion of Open Firmware (page 5 of http://www.uefi.org/sites/default/files/resources/A_Tale_of_...):

"Another effort that is similar in its role to the PC/AT BIOS-based platform occurred with Open Firmware (OF) [4] and the Common Hardware Reference Platform (CHRP). CHRP was a set of common platform designs and hardware specifications meant to allow for interoperable designs among PowerPC (PowerPC) platform builders. Open Firmware, also known as IEEE-1275 [4], has issues in that it is an interpreted, stack-based byte-code. Very few programmers are facile in Forth programming, and it is renowned as being “write-once/understand-never”, and having poor performance, and non-standard tools. Also, Open Firmware has a device tree essentially duplicating ACPI static tables. As such, the lack of Forth programmers, prevalence of ACPI, and the fact that UEFI uses standard tools and works alongside ACPI — versus instead-of — helped spell Open Firmware’s lack of growth. In fact, it is used on SPARC and PowerPC, but it is unsuitable for high-volume machines and thus prevent it from making the leap from niche server & workstation markets."


That is almost entirely FUD; every Mac before the Intel switch had OpenFirmware. FCode was elegant in that it was cross-platform, and option ROMs could work across multiple uarchs. The only valid point is that Forth is obscure. I don't know how much that matters in reality, given how obscure working at that level is in itself; as an OS dev you're working on the device tree and runtime services in a language like C. But I will grant the benefit of the doubt since I'm an OS dev and not an option ROM dev... and say, well, Petitboot is the logical endgame, as the industry has made Linux uber alles and you're going to be writing drivers for it anyway.


Also, talking about the OF device tree duplicating functionality in ACPI like it's a bad thing. ACPI is a huge shitshow; if the transition from legacy BIOS had gone to something other than UEFI, maybe we could have killed it off and done something better.

With that said, UEFI isn't TERRIBLE - the complexity is hard to overlook, and for all the benefits that boot services (among other things) can bring, they're rarely used by anybody but Apple.


> With that said, UEFI isn't TERRIBLE - the complexity is hard to overlook

UEFI is not actually that complex (though not a jewel of tininess). There are lots of features that are optional, so in principle it is possible to build a quite small UEFI implementation if desired. The problem is that most mainboard vendors deliver very bloated implementations. I can actually understand their reasoning: they only want to (barely) support one implementation. If a feature is left out, there will be customers who complain. On the other hand, if they leave it in, it "suffices" to add an option in the UEFI configuration tool that enables the user to disable the feature.


well, to be fair, they generally buy implementations from third party vendors. Those vendors throw things in for competitive advantage because some other vendor footed the bill for implementation and "why maintain multiple builds or a kconfig file". It's a bit of a crap spiral. If the implementations were open source with a strong consulting component for integration you'd probably see less non-removable bloat.


> If the implementations were open source with a strong consulting component for integration you'd probably see less non-removable bloat.

I believe just about all UEFI implementations are based on the open source (though not copyleft) TianoCore implementation.


That's probably true. This likely means that almost nothing is upstreamed, so it may as well be proprietary software once it flows through to the manufacturers and us. Copyleft is an incredibly important concept and this is a good illustration of why "restricted" freedom for some can result in greater freedom for end users and developers. Freedom from proprietary software is just as important as the freedom to create it.


Also, if it was really an issue, you could just compile something else to FCode. Pretty much every non-native language is being compiled first to a stack language that looks a lot like a shitty version of Forth as the IR.


> Pretty much every non-native language is being compiled first to a stack language that looks a lot like a shitty version of Forth as the IR.

It is not clear what you mean by that. I assume you want to hint that the CLR and the Java Runtime use a stack-based instruction set and are common compilation targets, which is true.

But there are lots of other virtual machines that provide a runtime and are not stack-based. For example, the implementation of the Lua 5.0 runtime is register-based. Another example of a register-based virtual machine is Parrot.
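To make the two styles concrete, here is a toy sketch (invented opcodes, not modeled on any real VM's instruction set) that evaluates (a + b) * c once as Forth/JVM-style stack code and once as register-style three-address code:

  # Toy sketch: the expression (a + b) * c in two IR styles.
  # Opcode names are made up for illustration; no real VM is modeled.
  def run_stack(code, env):
      """Stack style: operands live on an implicit stack."""
      stack = []
      for op, *args in code:
          if op == "push":                  # push a variable's value
              stack.append(env[args[0]])
          elif op == "add":
              b, a = stack.pop(), stack.pop()
              stack.append(a + b)
          elif op == "mul":
              b, a = stack.pop(), stack.pop()
              stack.append(a * b)
      return stack.pop()

  def run_regs(code, env):
      """Register style: every instruction names destination and sources."""
      regs = dict(env)
      for op, dst, a, b in code:
          regs[dst] = regs[a] + regs[b] if op == "add" else regs[a] * regs[b]
      return regs[code[-1][1]]

  env = {"a": 2, "b": 3, "c": 4}
  stack_code = [("push", "a"), ("push", "b"), ("add",), ("push", "c"), ("mul",)]
  reg_code = [("add", "t0", "a", "b"), ("mul", "t1", "t0", "c")]
  assert run_stack(stack_code, env) == run_regs(reg_code, env) == 20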


To be fair, just about all of the most widely used bytecode VMs are stack-based. The JVM, CLR, CPython and YARV are all stack-based. You're basically describing the exceptions.


> To be fair, just about all of the most widely used bytecode VMs are stack-based. The JVM, CLR, CPython and YARV are all stack-based. You're basically describing the exceptions.

Be very cautious with this kind of statement. Dalvik (which is used on Android), LLVM (Low Level Virtual Machine) and Erlang's virtual machine are also register-based (I just forgot about them when writing the above post). So I would rather claim it is about 50-50. I could write something about the advantages and disadvantages of both approaches (stack-based vs register-based), but this would become off-topic.


> I could write something about the advantages and disadvantages of both approaches (stack-based vs register-based), but this would become off-topic.

But still very interesting. And this subthread has been discussing Forth, so I argue that at least an overview discussion isn't out of order.

As I understand it, stack-based software and hardware architectures have been shown to have poor performance when compared to traditional register-based approaches (I specifically read somewhere that even hardware has been shown to be suboptimal, but I unfortunately don't remember where, and the source didn't qualify if the stack architecture in question was pipelined or had other enhancements or if it was just a dumb basic 70s/80s design). In any case, modern (pipelined) register hardware doesn't particularly like the L1/L2/Letc thrashing that stack dancing creates.

Obviously with registers you have a bunch of switches that have theoretically O(0) access time regardless of execution machine state. At least that's how I mentally model it, superficially; I presume 0 becomes 0.nnnnnnnn for fluctuating values of nnnn depending on pipeline filled-ness and other implementation details. (And I use floating point here just as an aesthetic representation of hard-to-quantify deviations from theoretical perfect efficiency.)

I feel like there's gotta be some kind of cool middle ground between stack-like architectures and register architectures that nobody's found yet, but I do wonder if I'm just barking up an impossible tree. Or is there any research being done in these areas?

The main problem I see with stack architectures is that it's effectively impossible to optimize (ie build a pipeline) for. Because if all you're dealing with is the top $however_many_things_the_current_word_pops_and_pushes items on the stack (which, to clarify, the hardware can't even know because that information is locked away in the implementation of the currently-executing word), well... you're in an impossible situation. For want of a better way to express it, the attention span of the implementation is inefficiently small.

Anyway, this is some of how I see it. What are your thoughts?


If you'd like to learn about a CPU arch that is neither register nor stack based, watch the videos for the Mill and in particular the belt.


Ooh, alright then. Thanks.


> > I could write something about the advantages and disadvantages of both approaches (stack-based vs register-based), but this would become off-topic.

> But still very interesting. And this subthread has been discussing Forth, so I argue that at least an overview discussion isn't out of order.

I think this whole topic is more complicated than the points you gave.

I'll start with an example: the JVM takes the stack-based instructions and JIT-compiles them (in principle AOT compilation is also possible) into (register-based) native instructions of the CPU. For this, of course, lots of transformations have to be done. So executing the stack-based VM instructions naively one after another fits the CPU badly, but this does not matter, thanks to modern JIT compilers, which transform the code completely.
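To picture that transformation, here is a toy sketch (invented opcodes again, and nothing like how a real JIT is structured internally) that turns stack bytecode into register-style three-address code by tracking the operand stack symbolically:

  # Toy sketch: stack bytecode -> register-style code, by simulating the
  # operand stack with symbolic names instead of values.
  def stack_to_regs(bytecode):
      stack, out, tmp = [], [], 0
      for op, *args in bytecode:
          if op == "load":                  # push a local variable
              stack.append(args[0])
          elif op in ("add", "mul"):
              b, a = stack.pop(), stack.pop()
              dst = f"t{tmp}"; tmp += 1
              out.append((op, dst, a, b))   # dst = a <op> b
              stack.append(dst)
          elif op == "store":
              out.append(("mov", args[0], stack.pop()))
      return out

  # x = (a + b) * c as stack code ...
  bc = [("load", "a"), ("load", "b"), ("add",),
        ("load", "c"), ("mul",), ("store", "x")]
  print(stack_to_regs(bc))
  # ... becomes register-style three-address code:
  # [('add', 't0', 'a', 'b'), ('mul', 't1', 't0', 'c'), ('mov', 'x', 't1')]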

One clear advantage of stack-based VM instruction sets is that there are far fewer "parameters to decide about". If you work register-based:

- How many registers? 8? 16? 32? 256 (i.e. a large number that can nevertheless be reached by real, though artificial programs)? "Near infinite" (say 2^31 or 2^32)?

- What register sizes? ARM has only 32-bit integer registers (and, in AArch64, 64-bit ones). x86 has 8-, 16-, 32- and 64-bit registers.

- Should one allow interpreting the lower bits of some large register as a smaller one? What should happen if we load some constant into such a subregister: will the upper bits be left unchanged (if done on a real CPU this can lead to a pipeline stall), zero-extended, or sign-extended?

- What about floating point registers: should one be able to interpret them as integers (encouraged on SSE/AVX (x86), but discouraged on ARM)?

- If we consider SIMD registers to be useful: What size?

- Do we want 2-operand or 3-operand instructions (see the sketch at the end of this comment)? 2-operand instructions have the advantage that the graph coloring problem used for allocating CPU registers can be solved in polynomial time, since the graph is chordal. This is also (before AVX) the preferred instruction format on x86. 3-operand instructions have the disadvantage that the graph coloring problem is NP-hard, so that in practice heuristics are often used. 3-operand instructions are common on RISC and RISC-inspired processors (e.g. the ARM A32 and A64 instruction sets; note however that I think T32 uses 2-operand instructions).

As I pointed out, this really large design space forces you to make lots of inconvenient decisions. I think this was a problem for the Parrot VM, which introduced lots and lots of different instructions. So if you want to keep the VM portable across lots of architectures, a stack-based approach is more convenient (I don't claim "better"). This was, I believe, one reason why the Java bytecode was designed stack-based.

On the other hand, if you do it the right way, register-based code tends to be more compact and is simpler to transform into machine code. These are surely central reasons why a register-based implementation was chosen for Dalvik (Android) and the Lua VM.

Then again, to run stack-based code fast you typically have to do a lot more transformations to the code - which one would love to avoid, in particular on embedded/small systems. So in some sense one can argue that a register-based instruction set is the more low-level approach for designing VMs: many more decisions to make (and which are best tends to depend on the CPUs you primarily target), but fewer code transformations to do in the runtime.
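To make the 2-operand vs. 3-operand point from the list above concrete, here is a toy lowering (invented instruction names) of 3-operand IR into an x86-style 2-operand form, where the destination must double as the first source:

  # Toy sketch: lower 3-operand IR (dst = a OP b) to 2-operand form
  # (dst OP= src), as on classic x86. Purely illustrative.
  def lower_to_two_operand(ir):
      out = []
      for op, dst, a, b in ir:
          if dst != a:
              out.append(("mov", dst, a))   # copy the first source into dst
          out.append((op, dst, b))          # dst = dst OP b
      return out

  ir = [("add", "t0", "a", "b"),            # t0 = a + b
        ("mul", "t1", "t0", "c")]           # t1 = t0 * c
  print(lower_to_two_operand(ir))
  # [('mov', 't0', 'a'), ('add', 't0', 'b'), ('mov', 't1', 't0'), ('mul', 't1', 'c')]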


> I think this whole topic is more complicated than the points you gave.

And now I've read your reply I agree. Thanks very much for the additional considerations.

I had no idea the JVM applies analysis to the stack VM state to turn it into register-based code. I realize it's a JIT, but I never really thought through the implications.

Regarding register vs stack - don't stack based systems also have to decide about stack-item size? I'm not sure how this works in practice but surely size considerations get factored in at some point.

Regarding the 2/3-operand instruction problem, this is very interesting but I must admit I need to do some reading about graph theory. I do very vaguely understand it, but for example https://en.wikipedia.org/wiki/Graph_theory doesn't mention the word "chord" anywhere.

This indeed is a complex problem, and thanks very much for illustrating the difficulty.


> Regarding the 2/3-operand instruction problem, this is very interesting but I must admit I need to do some reading about graph theory. I do very vaguely understand it, but for example https://en.wikipedia.org/wiki/Graph_theory doesn't mention the word "chord" anywhere.

Concerning chordal graphs:

> https://en.wikipedia.org/w/index.php?title=Chordal_graph&old...

Concerning using graph coloring for register allocation:

> https://en.wikipedia.org/w/index.php?title=Register_allocati...

I remark that, in particular on architectures that have lots of different register sizes and the capability to reinterpret parts of registers as subregisters (such as x86), graph coloring is only a rough approximation of register allocation; here one has to develop more complicated models.
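As a rough illustration of the basic graph-coloring idea (a greedy toy without spilling, coalescing, register classes or any of the subregister complications mentioned above):

  # Toy sketch: greedy coloring of an interference graph. Variables that
  # are live at the same time interfere and must get different registers.
  def color(interference, registers):
      assignment = {}
      for var in sorted(interference, key=lambda v: -len(interference[v])):
          taken = {assignment[n] for n in interference[var] if n in assignment}
          free = [r for r in registers if r not in taken]
          if not free:
              raise RuntimeError(f"would have to spill {var}")
          assignment[var] = free[0]
      return assignment

  # a interferes with b and c; b and c are never live at the same time,
  # so two registers suffice.
  graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
  print(color(graph, ["r0", "r1"]))  # e.g. {'a': 'r0', 'b': 'r1', 'c': 'r1'}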


Thanks! :) I'll admit this'll take me a while to get my head around, but it's a very interesting subject.


I don’t see how a stack-based architecture avoids the need to think about register sizes. You just get equivalent questions, like - what sizes of integers do you have primitive ops for? do you have separate pop/swap/drop/etc. for every possible size, and if not, what’s the standard size of a stack item?


> I don’t see how a stack-based architecture avoids the need to think about register sizes. You just get equivalent questions, like - what sizes of integers do you have primitive ops for? do you have separate pop/swap/drop/etc. for every possible size, and if not, what’s the standard size of a stack item?

This is correct. But these are still fewer decisions, e.g.:

- no number of registers,

- no reinterpretation of parts of registers,

- SIMD fits the stack-based model rather badly,

- stack-based VM instruction sets are typically of the kind "take two values from the top of the stack, do something with them, and push the result back" (very inspired by Forth - but I don't know much Forth); see for example the Java bytecode instructions (https://en.wikipedia.org/wiki/Java_bytecode_instruction_list...) or the CIL instructions (https://en.wikipedia.org/wiki/List_of_CIL_instructions and http://download.microsoft.com/download/7/3/3/733ad403-90b2-4...), so no worrying about 2-operand vs 3-operand instructions.


I said non-native, sort of thinking of LLVM. And to be fair, Dalvik bytecode is compiled from JVM bytecode, so a stack based language was used as an IR in that language stack.


I believe that many compilers have an intermediate step that looks like a stack based language.


> I believe that many compilers have an intermediate step that looks like a stack based language.

I admit that in the '80s many compilers used something reminiscent of a stack-based language for the code generation phase. The reason is that it is rather easy to write a code generator for an accumulator machine (do it as an exercise if you want).

But this is not what typical modern compilers look like. Just to give one piece of evidence up front: modern processors have many more general-purpose registers (x86-64 has 16 and AArch64 has 32, though some have reserved purposes), so such code would be a waste of the possibilities.

A typical modern compiler looks like this (I am aware that your preferred compiler might have additional or somewhat different stages, but the following is typical):

Frontend (very language-dependent):

  - Tokenize input
  - Parse input (or report syntactical errors) to generate parse tree
  - Generate Abstract Syntax Tree (AST)
  - Do semantic analysis/type checking
  - Transform the program into static single assignment form (SSA form)
Middle end (not much dependent on language or machine):

  - Do optimizations on SSA code (e.g. liveness of variables/dead code elimination, constant propagation, peephole optimizations, perhaps replace whole algorithms)
Backend (very machine-dependent):

  - Do machine-dependent optimizations
  - Do register allocation
  - Generate machine-dependent code
As one can see, a stack-based intermediate step does not appear here (and is to my knowledge uncommon) - instead, transformations on code in SSA form are common.
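As a toy example of the "optimizations on SSA code" step (invented instruction format, not taken from any real compiler): constant folding plus dead-code elimination over straight-line SSA, where each variable is assigned exactly once:

  # Toy sketch: constant folding + dead code elimination on straight-line
  # SSA-style code (add-only, for brevity).
  def optimize(ssa, live_out):
      consts, out = {}, []
      for op, dst, a, b in ssa:                    # constant folding
          a, b = consts.get(a, a), consts.get(b, b)
          if op == "add" and isinstance(a, int) and isinstance(b, int):
              consts[dst] = a + b
          else:
              out.append((op, dst, a, b))
      needed, kept = set(live_out), []
      for op, dst, a, b in reversed(out):          # dead code elimination
          if dst in needed:
              kept.append((op, dst, a, b))
              needed |= {v for v in (a, b) if isinstance(v, str)}
      return list(reversed(kept)), consts

  ssa = [("add", "x1", 2, 3),      # x1 = 2 + 3   -> folded to 5
         ("add", "y1", "x1", 10),  # y1 = x1 + 10 -> folded to 15
         ("add", "z1", "p", "y1"), # z1 = p + y1  -> becomes z1 = p + 15
         ("add", "w1", "p", "p")]  # w1 = p + p   -> dead, removed
  print(optimize(ssa, live_out={"z1"}))
  # ([('add', 'z1', 'p', 15)], {'x1': 5, 'y1': 15})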


Compiler IRs are often based on reflecting expression DAGs rather than manipulating a finite register set or an infinite stack.


Manipulating an infinite stack sounds like a really big world of pain.

I realize you're only ever poking around near the top, but if the system has the chance to grow infinitely, well, it will.


Compiling to a stack language is exactly what a high-assurance security project did, going from Java to Forth. For Open Firmware, and commercially available, too.

https://www.slideserve.com/brasen/efficient-code-certificati...


fgen(1) FCode tokenizer

ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-release-8/src/usr.bin/fgen/Makefile
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-release-8/src/usr.bin/fgen/fgen.1
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-release-8/src/usr.bin/fgen/fgen.h
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-release-8/src/usr.bin/fgen/fgen.l


> Open Firmware, also known as IEEE-1275, has issues in that it is an interpreted, stack-based byte-code. Very few programmers are facile in Forth programming

And ACPI is based on an interpreted, non-stack-based bytecode, AML. [1]

There are one or two orders of magnitude more Forth programmers than AML/ASL programmers.

[1] http://wiki.osdev.org/AML


I hate FORTH as a general-purpose environment, and I've seen a lot of FORTH-based train wrecks written by FORTH fanatics. But FORTH really, really shines in the boot environment; it's a fantastic tool for setting up hardware and gluing things together.

As for the argument that it has poor performance, I'd like to point out that the BIOS/UEFI firmware on most of the systems I work with takes minutes to POST, tens of minutes for the bigger machines, and I think it's a clear case of not looking critically at one's own work.


My favorite is waiting 8-10 minutes for UEFI to finish whatever the hell it is doing, only to then have exactly one second to hit the appropriate F key to get into the boot options.


On servers, I presume?


That's probably due to a very thorough memory test, and it's not particular to UEFI either --- I have worked with older servers, with regular BIOS, that do the same thing. There may be an option to disable it.


Memory tests are a big part of it, but definitely not the only really slow component.

(The 10+ minute boot time on our bigger servers already is the fast version of the memory test. I've never turned on the more exhaustive one).


Yeow!

What are the other tests? I mean, there's the CPU, memory, the disks, PCI... surely you don't have SCSI, option ROMs should be quick... I'm genuinely curious.

What sort of server is it, for reference? (Well, I'm really just wondering how much RAM it has.)

And now I'm wondering how long the longer test takes!


IBM x3550 M3s were really bad about this. They aren't crazy big servers--dual CPU, 32GB RAM, just a couple of disks. UEFI boot took forever for no discernible reason.


Makes sense. I should hope there is such an option, for OS/boot debugging etc!!


That's pretty revisionist considering that OF predates ACPI. Just like all other stories of path dependence, Intel backed themselves into a corner and now they're stuck there forever.


> Just like all other stories of path dependence, Intel backed themselves into a corner and now they're stuck there forever.

Besides the "well-known" implementations of UEFI for x86-32, IA-64 (Itanium) and x86-64, there also exist official UEFI implementations for ARM AArch32 and AArch64. According to https://en.wikipedia.org/w/index.php?title=Unified_Extensibl... there also exist unofficial implementations for PowerPC64, MIPS and RISC-V.

One consumer example of UEFI on ARM processors was the ARM-based Windows RT devices (they were not very successful in the market). Much more importantly, a UEFI implementation is required by ARM's "Server Base Boot Requirements (SBBR) - System Software on ARM® Platforms" standard:

> http://infocenter.arm.com/help/topic/com.arm.doc.den0044b/DE...

(read section 3 and appendices A to D). So just about every ARM server uses UEFI firmware.


Going back to the Intel FUD paper: if you consider shipment volume as success, as their document does, then u-boot-like firmware with FDTs is the most successful paradigm by orders of magnitude.

The kinds of systems using FDTs aren't going to switch, I don't think. I have to imagine that ARM64 UEFI was done for marketing reasons and by clueless product management. The CPUs and integration of ARM servers have been so pathetic that nobody was buying, and they were fumbling around at this level while ignoring the elephant in the room. TX2 and Centriq are the first realistic implementations for general-purpose servers, and now they are unfortunately saddled with UEFI and ACPI.

This was a rare 2-3 decades kind of mistake. Not a lot of software gets that privilege.


It wasn't long ago that there was another thread: https://news.ycombinator.com/item?id=15695903


You know what else is a stack-based bytecode? The JVM.

They could have had an easier-to-read language on top of that VM (could be Python/Lua or something else).


Fair enough, but why is there no standard shell or alternative interpreter?


> Fair enough, but why is there no standard shell or alternative interpreter?

There actually is the UEFI Shell, but not every UEFI implementation has it built in. For example, the UEFI implementation that Intel provides for the Minnowboard Max (now EOL) and Minnowboard Turbot does provide a UEFI Shell.

If the producer of the mainboard/laptop has not built in the UEFI Shell, there are still ways to start a UEFI Shell binary as an ordinary UEFI application (just as a bootloader etc. is also just a UEFI application):

> https://superuser.com/a/1057594/62591

In principle it should even be possible to write an alternative shell for UEFI and start it this way, but I am not aware of anyone having written one.


I recently suffered through a firmware update done via the UEFI command line, and it was incredibly painful. Every time I need to use the UEFI command line (normally for a firmware update) I want to scream.

BTW, don't forget about DEC's "SRM" firmware; that was quite nice. It was simpler than Open Firmware, and made a hell of a lot more sense than UEFI. After almost 15 years away from DEC Alphas, I still remember most of the commands. SRM's "lfu" beats the hell out of the EFI updaters I've used. And I fondly recall that you could break into SRM and look at the current values of registers, and even (if you were running DEC Unix) force a crashdump. Kind of like ddb on FreeBSD, but built into the firmware.


> I recently suffered through a firmware update done via the UEFI command line, and it was incredibly painful. Every time I need to use the UEFI command line (normally for a firmware update) I want to scream.

What was so painful about updating the firmware via the UEFI Shell? I do/did it all the time this way for the various Minnowboard variants and cannot say that it was painful.


In general, because the EFI shell sucks and was designed by committee. On a machine with 36 disk drives, it was not easy to find the drive with the UEFI partition that held the firmware update. That's why I mentioned SRM -- the firmware updater built into SRM would try all the drives attached to the system.


Yes! I wish more people knew how cool and powerful OpenBoot was; it even had an interactive debugger that included a Forth decompiler. I spent so much time playing with it years ago on my old SPARCstation... I would love to see something like that included in some new SBC :)


I agree. It was very empowering to be able to hit stop-a and get a backtrace with the kernel, user code, etc.


Also so interesting to disconnect your terminal and see your server hang until someone tells it to continue


I've been meaning to write this up more properly for a long time, and this reminds me of my pains in this area.

This is more of an off-the-cuff draft of my ideas and opinions in this area.

To start with, it would be REALLY helpful if there were a physical mode switch for maintenance tasks. (On desktops I think it should be like an 'ignition key', only with the key being a generic shape that could be substituted by a flat-head screwdriver or any custom shape someone might buy as a keychain thing at a gas station/convenience store. For thinner portable devices, maybe an SD card where all the data pins are shorted to ground, or a similar data cable. Whatever it is, the standard should be dead simple and generic.)

Lacking such a maintenance mode, the system should assume it's in that mode by default (the current state of affairs).

When in that maintenance mode the basic firmware should:

>> If /only/ bootable media with previously authorized keys is present and the last boot attempt was more than 5 min* (configurable time per boot profile) ago: attempt to boot.

>> If the last boot was less than that time, follow secondary boot options (might be a local checking stub, might be a URI to offer a list of PXE style blobs from, etc * * ).

>> If bootable media is present, but has a 'signing key' not in the local approved cache...

>>>> Prompt the user with a description of the issuing key, the fingerprint of said key, cross-reference it against previously installed authorities to see what their opinion of the key is.

>>>> Allow the operator to install the indicated key and/or CA (and online cross-check / maintenance image locations) in to the trusted keys storage area.

In all cases the end user should be the highest authority and trusted operator.

The operator SHOULD be able to remove (or at least disable) (even all) pre-installed CA/etc.

The operator MUST be able to add any additional authority they designate.

For TPM/DRM things, operator added authorities /may/ maintain strong signing verification for loaded code (or not), and the 'strong signing' flag would only indicate if the path was trusted or not.

The way TPM/DRM would actually work is to verify the integrity from the ground up, in its side channel. The host system MUST also be able to operate in a mode where that side channel is completely disabled, or replaced with a non-DRM-trusted side channel (corporations might want to use such equipment for remote support operations).

* * This is intended to be an 'advanced' system checking and repair utility. It should do something like collect a list of 'core' things that it thinks are installed and check that against the online repository. It would operate by downloading lists of files (per installed thing and update set), compiling an in-memory database of those items and their signatures in at least two hash algorithms, and verifying that the local storage matches. If there's a difference it should offer to download the expected files instead, and ask the user if they want to upload the differing, possibly infected, files.
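A minimal sketch of that verification step might look like the following (the manifest format and the choice of two digests are just assumptions for illustration; a real tool would also need to check a signature on the manifest itself):

  # Toy sketch: verify local files against a manifest that carries two
  # independent digests per file. Paths/manifest format are hypothetical.
  import hashlib

  def digests(path):
      sha256, sha512 = hashlib.sha256(), hashlib.sha512()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(1 << 16), b""):
              sha256.update(chunk)
              sha512.update(chunk)
      return sha256.hexdigest(), sha512.hexdigest()

  def verify(manifest):
      # manifest: {path: (sha256_hex, sha512_hex)}
      bad = []
      for path, expected in manifest.items():
          try:
              if digests(path) != expected:
                  bad.append(path)
          except OSError:                   # missing/unreadable counts as bad
              bad.append(path)
      return bad                            # candidates to re-download/report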


> but intel had to be different

Or maybe they used it for the same reason they built the ME.



