There isn't much reason to do so. The additional resources needed to support the 16-bit modes probably do not exceed a few hundred transistors of logic and a handful of words of microcode. The data path is exactly the same; there are only a few control differences (loads to selector registers set offset=value<<4; limit=0xffff; size=16bit instead of consulting descriptor tables).
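To make the "few control differences" concrete, here is a minimal sketch in C of the real-mode selector load exactly as described above. The struct and function names are made up for illustration and do not correspond to any real emulator or hardware interface; the model follows the simplified description in this comment (it sets the limit on every load) rather than claiming to match silicon.

```c
#include <stdint.h>

/* Hypothetical model of the hidden ("shadow") part of a segment register. */
struct shadow_desc {
    uint32_t base;   /* linear base address (the "offset" in the text above) */
    uint32_t limit;  /* highest valid offset inside the segment */
    uint8_t  size;   /* default operand size in bits */
};

/* Real-mode style load: the values are computed directly from the
 * selector, no descriptor table is consulted. */
struct shadow_desc real_mode_load(uint16_t selector)
{
    struct shadow_desc d;
    d.base  = (uint32_t)selector << 4;
    d.limit = 0xffff;
    d.size  = 16;
    return d;
}

/* Linear address of seg:off; limit checking is left out of this sketch. */
uint32_t linear_address(const struct shadow_desc *d, uint16_t off)
{
    return d->base + off;
}
```

A protected-mode load would fill the same three fields from a GDT/LDT entry instead, which is why the data path can stay identical.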
On the other hand, an x86_64 CPU could be significantly simplified by ripping out both the 16-bit and 32-bit modes, and with them all the selector/descriptor segmentation logic, and replacing the IDT with something simpler and less general.
The x86 processor basically has 4 modes: real mode, virtual 8086, protected, and 64-bit mode. There's no real distinction between the protected 16-bit and 32-bit modes; it's just a bit (a few, actually) in the descriptor saying what the default size is for operands. The 64-bit mode is truly distinct, however: note that you can't access rax from 32-bit or 16-bit code, but you can access eax from 16-bit code.
It's worth pointing out that x86-64 still retains segments, although in a far more limited capacity (fs and gs are set up as a simple linear base and are used mostly for thread-local addresses).
I would say that there are five modes; the additional one is compatibility mode, as it is sufficiently different from both 64b mode and protected mode.
To summarize:
- (un)real mode: selector loads directly set the shadow offset, IDT is an array of CS:IP pairs (unreal mode is jargon for real mode that somehow has "impossible" values in the shadow descriptors, notably the state of the i386 after reset)
- protected mode (CR0.PE=1): selector loads consult the LDT/GDT, call gates are supported, IDT is a descriptor table
- vm86 mode (EFLAGS.VM=1): selector loads directly set the shadow register, IDT is a descriptor table
- 64b mode (EFER.LMA=1, CS.L=1): shadow offsets and limits are ignored except for FS/GS, selector loads do not check permission flags. IDT is a descriptor table. Only the 32b call gate type is allowed, and it is redefined to have an extended format with a 64b offset. EFLAGS.VM has no effect.
- compatibility mode (EFER.LMA=1, CS.L=0): segmentation is enabled, selector loads work as in protected mode. IDT is a descriptor table. Only 64b call gates are supported. EFLAGS.VM has no effect.
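The summary above can be sketched as a small decision function in C. This is only an illustration: the enum, struct and function names are invented here, and it assumes that compatibility mode means long mode is active (EFER.LMA=1) with CS.L=0, and that the protected-mode enable bit is CR0.PE.

```c
#include <stdbool.h>

/* Hypothetical names for the five modes discussed above. */
enum cpu_mode {
    MODE_REAL,       /* also covers "unreal" mode */
    MODE_PROTECTED,
    MODE_VM86,
    MODE_COMPAT,
    MODE_LONG64
};

/* Only the control bits that select the mode. */
struct mode_bits {
    bool cr0_pe;     /* CR0.PE */
    bool eflags_vm;  /* EFLAGS.VM */
    bool efer_lma;   /* EFER.LMA */
    bool cs_l;       /* L bit of the current code segment descriptor */
};

enum cpu_mode classify_mode(struct mode_bits b)
{
    if (b.efer_lma)                  /* long mode: EFLAGS.VM has no effect */
        return b.cs_l ? MODE_LONG64 : MODE_COMPAT;
    if (!b.cr0_pe)
        return MODE_REAL;
    return b.eflags_vm ? MODE_VM86 : MODE_PROTECTED;
}
```

Note that CS.L is checked only after EFER.LMA, so the switch between 64b mode and compatibility mode is a property of the code segment that is currently loaded.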
> I would say that there are five modes; the additional one is compatibility mode, as it is sufficiently different from both 64b mode and protected mode.
I would tend to add SMM (which runs in ring -2) to the list. Also (here I am not aware of the details) what about the mode that hypervisors run in (e.g. ring -1)?
The concept of rings is a popular simplification of how i386 protection works and is mostly orthogonal to these five architectural modes. Also, the negative rings are purely abstract and have nothing to do with how the hardware really works.
i286 protected mode protection works by comparing the privilege levels (~rings, 0-3) of various things on certain operations (mainly I/O and loading selector registers). For direct programmed I/O to be allowed, EFLAGS.IOPL has to be >= CS.DPL (also called CPL, for Current or Code Privilege Level). For a descriptor to be accessible, its DPL has to be >= CPL. Gate descriptors are a special case that allows CPL to change by doing a CALL FAR through them (JMP FAR is AFAIK also possible but not especially useful).
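As a rough sketch, the two comparisons just described look like this in C. The function names are hypothetical, and this deliberately leaves out everything the text does not mention (the selector's RPL, the I/O permission bitmap, conforming code segments), so it is a simplification rather than the full hardware check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Privilege levels are 0..3; numerically lower means more privileged. */

/* Direct programmed I/O is allowed when IOPL >= CPL. */
bool io_allowed(uint8_t cpl, uint8_t iopl)
{
    return iopl >= cpl;
}

/* A descriptor is accessible when its DPL >= CPL. */
bool descriptor_accessible(uint8_t cpl, uint8_t dpl)
{
    return dpl >= cpl;
}
```

For example, ring 3 code with IOPL=0 fails the first check, while ring 0 code passes both checks for any IOPL or DPL value.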
i386 extends this with another layer in the form of paging; amd64 long mode works only with paging enabled.
SMM is then another layer on top of that, which is like having an additional external MMU (and in fact the SMI logic usually resided in the chipset and even today is conceptually external to the CPU). The architectural mode after entry to the SMI handler is essentially normal real mode and can be changed by the handler into any desired mode, with one slight caveat: RSM unconditionally returns from SMM, restoring the saved state from SMRAM without regard to what is on the stack.
There are various extensions to i386 for hardware virtualization, all of which work by somehow allowing the creation of a process that perceives itself as running with CPL=0 while that is not true (in terms of rings it is more like the hypervisor runs in ring 0 and the guest kernel in ring 0.5 or something like that). This involves faking and duplicating some architectural state for the guest process at the hardware level; the faked parts are necessary to make this possible, while the duplication is needed for performance.
These various mechanisms all work at once, mostly without regard to the state of the other layers. Obviously there are some combinations that do not make much sense or are not directly achievable, but to some extent the only combination that is explicitly forbidden is long mode without paging (as EFER.LMA, which actually controls it, is forced by hardware to be EFER.LME & CR0.PG).
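The last point is small enough to write down directly. A minimal sketch in C, with a made-up struct and function name: software sets LME and PG, and LMA is derived from them, so "long mode active without paging" cannot be represented at all.

```c
#include <stdbool.h>

/* The two software-writable bits that together enable long mode. */
struct long_mode_ctl {
    bool efer_lme;  /* EFER.LME: long mode enable, set by the OS */
    bool cr0_pg;    /* CR0.PG: paging enable */
};

/* EFER.LMA is not set directly; it is computed as LME & PG. */
bool efer_lma(struct long_mode_ctl c)
{
    return c.efer_lme && c.cr0_pg;
}
```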
According to Wikipedia, 64-bit Windows runs 32-bit apps by switching back to 32-bit mode:
> which switches the processor hardware from its 64-bit mode to compatibility mode when it becomes necessary to execute a 32-bit thread, and then handles the switch back to 64-bit mode.
"Compatibility mode" is a sub-mode of long mode which has limited support for 16/32b i386-style memory segmentation (notably there is no support for vm86 code segments, and gate descriptors have a different format and can only reference 64-bit "segments").
It is clear that AMD's intention was to implement just enough segmentation in long mode to allow compatibility with segment-aware user-space code (e.g. mixed 16/32b Windows userspace), but there does not seem to be any OS that actually uses 16-bit descriptors in long mode (probably because the mechanism is significantly different from i386's 16/32 compatibility). For compatibility with "normal" flat-address-space user-space code, the whole mechanism could be significantly simpler: have the same "segmentation" behavior as 64b mode, have a 32b EIP pushed and popped by CALL/RET, and trap on anything that tries to load a new selector anywhere (i.e. JMP/CALL/RET FAR, MOV whatever, ?S). Even the REX prefix could be supported in such a "limited compatibility mode", and we would not have had to invent things like x32_abi (which is to some extent the same idea, although with different motivations, i.e. run new ABI-incompatible 32b code in 64b mode to gain access to the new registers and instructions while having 32b pointers and thus a reduced memory footprint).
Isn't it more about just loading the cs/ds/es segments with 32-bit selectors? For example, there's the "heaven's gate" trick on Windows to sneak through the WoW64 layer directly into the other modes - http://rce.co/knockin-on-heavens-gate-dynamic-processor-mode...
I don't think this is correct. IIRC moving to 64-bit mode is a one-way street. Rather, there is a virtual 32-bit compatibility mode provided in the 64-bit environment, and that is probably what is being referred to here.