The article is interesting but short. It is a pretty interesting question, but I am still missing the answer to it! (Probably: it does not use all instructions.)
> It is a pretty interesting question, but I am still missing the answer to it!
In a very literal sense, the answer is trivially yes, because the compiler can be forced to generate any instruction sequence in an inline assembly block.
But assuming that isn't what you mean, there are plenty of instructions that the compiler will never emit when compiling CPU-independent C code (i.e. no inline assembly or CPU-specific compiler intrinsics). These fall into a few general categories:
1) Privileged instructions, like IN/OUT, WAIT, or SGDT. These instructions aren't even usable outside the kernel, and there's no way to represent their effects in pure C anyway.
2) Instructions which interact with specific hardware capabilities of the CPU, like the AES/SHA instructions, CPUID, RDTSC, or PREFETCH. Much like the privileged instructions, there's no way to represent their effects in C.
3) 8086 legacy instructions like ENTER, LOOP, or XLAT, which operate on 16-bit registers in highly inflexible ways, making them awkward for a compiler to generate code around. Most of them are also microcoded, making them slower than equivalent code sequences -- so there's no reason to use them anyway.
4) Instructions which interact directly with the flags register, like LAHF or PUSHF. C code doesn't have any concept of the flags register, and most compilers only generate flag-dependent instructions immediately after setting flags (e.g. in CMP/Jcc sequences), so they never need to save/restore flags.
5) Complex SIMD instructions like PSHUFB or PUNPCKxx which modern compilers can't reason deeply enough about your code to use effectively. (I'd love to be proven wrong on this one!)
There's definitely been a "death spiral" for some of these instructions.
Nobody used "LOOP", for example, so it tended to be poorly optimized, which made compilers even less likely to use it; they just used the separate instructions that have a similar effect. Eventually it actually became a liability to run it efficiently. (The famous "Windows 95 won't work right on a fast K6-2" bug is based on this problem: LOOP was much less efficient on a Pentium than on a K6.)
>The famous "Windows 95 won't work right on a fast K6-2" bug is based on this problem
Wow, that's pretty interesting on its own. It turns out a program was trying to figure out how long the LOOP instruction would take to run by running it 2^20 times and dividing that by the time taken in milliseconds. On a Pentium 3 this took 17ms, but on a K6-2 (and faster) this could take 0 milliseconds, leaving a division by zero.
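To make the failure mode concrete: if the calibration divides 2^20 iterations by the elapsed milliseconds, a measured time of 0 ms makes the quotient undefined. Here is a minimal sketch in C. The names `ITERATIONS` and `loops_per_ms` are made up for illustration, and this is not the actual Windows 95 code, just the shape of the arithmetic with a guard added.

```c
#include <stdint.h>

#define ITERATIONS (1u << 20) /* 2^20 iterations, as described above */

/* Hypothetical helper: iterations per millisecond, or 0 when the timer
 * was too coarse to measure the loop at all. */
static uint32_t loops_per_ms(uint32_t elapsed_ms)
{
    if (elapsed_ms == 0) {
        /* An unguarded ITERATIONS / elapsed_ms would divide by zero here. */
        return 0;
    }
    return ITERATIONS / elapsed_ms;
}
```

With the 17 ms figure quoted above, `loops_per_ms(17)` gives 61680. With 0 ms the guard returns 0 instead of faulting, and the caller would have to pick a fallback (for example, rerunning with more iterations).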
Then suddenly the REP prefix became much more useful than it used to be (around Nehalem, I think?), to the point that REP MOVS could be as fast as or faster than a SIMDified copy routine.
> Privileged instructions, like IN/OUT, WAIT, or SGDT. These instructions aren't even usable outside the kernel, and there's no way to represent their effects in pure C anyway.
It's possible to allow IN/OUT from outside of the kernel (FreeBSD has set_i386_ioperm(2) and io(4), Linux has ioperm(2) and iopl(2); other OSes may vary). You still wouldn't be able to emit IN/OUT from pure C, as you said.
And this works by updating bits in the I/O permission bitmap at the end of a task state segment (TSS), which directs the CPU to allow IN/OUT for specific ports in a process.
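For anyone who wants to see the idea in code, here is a rough C model of that bitmap check. It is a simplification, not a real TSS layout: one bit per port, where a clear bit means "allowed" and a set bit means "fault", and a multi-byte access needs every covered port to be allowed. All names (`struct io_bitmap`, `io_bitmap_allow`, `io_access_allowed`) are invented for this sketch. On real hardware the bitmap is only consulted when the current privilege level is less privileged than IOPL.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IO_PORTS 65536u

/* Simplified model: one bit per I/O port, 0 = allowed, 1 = access faults. */
struct io_bitmap {
    uint8_t bits[IO_PORTS / 8];
};

/* Start from "every port denied". */
static void io_bitmap_deny_all(struct io_bitmap *bm)
{
    memset(bm->bits, 0xFF, sizeof bm->bits);
}

/* Clear the bits for `count` ports starting at `first`, allowing access. */
static void io_bitmap_allow(struct io_bitmap *bm, uint32_t first, uint32_t count)
{
    for (uint32_t p = first; p < first + count && p < IO_PORTS; p++)
        bm->bits[p / 8] &= (uint8_t)~(1u << (p % 8));
}

/* A `width`-byte access at `port` is allowed only if all covered bits are clear. */
static bool io_access_allowed(const struct io_bitmap *bm, uint32_t port, uint32_t width)
{
    for (uint32_t p = port; p < port + width; p++) {
        if (p >= IO_PORTS || (bm->bits[p / 8] & (1u << (p % 8))))
            return false;
    }
    return true;
}
```

In this model, allowing ports 0x3F8 through 0x3FF makes a 2-byte access at 0x3FE pass, while a 2-byte access at 0x3FF fails because it spills into port 0x400.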
[So glad that I studied the system programming manual back then and now can look smart on a forum :]
I thought that Linux did not use the TSS, though. I understand the TSS is mandatory, but my understanding was that Linux just creates a single TSS per CPU to satisfy the requirement that there's something there.
Whilst researching my post, I saw something that said that when switching tasks, Linux copies the relevant TSS information of the soon-to-be-current task into the TSS for the CPU. So it can still do all the TSS stuff; it just doesn't keep a TSS entry for every task.
> the compiler will never emit ... instructions like PSHUFB or PUNPCKxx
These are most certainly generated by GCC from C++ code (no intrinsics and such); I can even provide evidence in the form of a codebase with appropriate build flags.
Their use is rather straightforward I think, but still.
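As an illustration of the kind of "straightforward" pattern meant here, this is a plain C loop that does a fixed byte permutation (reversing each 16-byte block). This is my own example, not taken from the codebase mentioned above, and `reverse_blocks16` is a made-up name. Whether a given compiler actually emits PSHUFB for it depends on the version and flags (something like `-O3 -mssse3` on GCC is an assumption worth checking in the disassembly).

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order inside each complete 16-byte block of src.
 * Bytes past the last complete block are left untouched in dst.
 * No intrinsics: any shuffle instruction is the compiler's choice. */
void reverse_blocks16(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
{
    for (size_t i = 0; i + 16 <= n; i += 16) {
        for (size_t j = 0; j < 16; j++)
            dst[i + j] = src[i + 15 - j];
    }
}
```

The inner loop has a compile-time-constant permutation, which is the shape an auto-vectorizer has the best chance of mapping onto a single shuffle.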
> 5) Complex SIMD instructions like PSHUFB or PUNPCKxx which modern compilers can't reason deeply enough about your code to use effectively. (I'd love to be proven wrong on this one!)
I ran his example on my own machine and found this:
>"1) Privileged instructions, like IN/OUT, WAIT, or SGDT. These instructions aren't even usable outside the kernel, and there's no way to represent their effects in pure C anyway."
Could you elaborate on this? I understand you need to be in ring 0 for these instructions, but you still need to be able to compile the kernel. Or am I misunderstanding something obvious or missing your point completely?
There isn't any pure C code you can write which would be expressed with those instructions, in the sort of way that MUL would be an expression of "a * b" (for example). The C virtual machine doesn't have the concept of an I/O port or a global descriptor table; the only way of interacting with these features from C is to either use a CPU-specific intrinsic (not available for these instructions) or inline assembly.
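A small sketch of the contrast, assuming GCC or Clang on x86 (the function names `scale` and `port_write8` are just for illustration). The multiplication is ordinary C that the compiler can lower however it likes, while the port write only exists because the instruction is spelled out in inline assembly.

```c
#include <stdint.h>

/* Pure C: the compiler is free to pick MUL/IMUL, LEA, shifts, etc. */
uint32_t scale(uint32_t a, uint32_t b)
{
    return a * b;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
/* Not expressible in pure C: OUT appears only because it is written out
 * as inline assembly. Calling this in user mode without I/O permission
 * faults, so it is defined here but deliberately not called. */
static inline void port_write8(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}
#endif
```

Nothing in the C source of `scale` names an instruction; everything in `port_write8` does.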
Kernels can't really be built with standard-compliant C, and use a noticeable amount of assembly and intrinsics (which are essentially inlined asm functions or generators for those) to drive such details.
SGDT and friends are not privileged. You can execute them perfectly fine in user mode. If you do, you get a number that is completely useless in userspace, but because they don't trap, virtualization software has to validate or rewrite all userspace code to filter them out.
One of the things done by modern CPUs to aid VMs is making them trappable by setting the UMIP bit in CR4.
The fact that those instructions were not privileged is widely considered a serious design error of the Intel 80286.
The virtualization modes introduced at about the same time by Intel and AMD, around 2005 / 2006, were necessary mainly to fix the broken privileged mode of x86/x86-64.
There is the well-known paper from IBM, published in 1970, "A Virtual Machine Time-Sharing System", which listed the requirements for a CPU that can be virtualized. Those requirements were violated in the design of the Protected Mode of the 80286 (in 1982), whose defects were inherited by the later Intel and AMD CPUs.
In a CPU with a well-designed privileged mode there is no need for any other extra "hypervisor" mode, because an operating system cannot detect whether it is executed in the privileged mode on the real machine or in the non-privileged mode in a virtual machine, and the OS can be protected from the user processes by the same mechanisms that protect the user processes between themselves.
The UMIP feature available in all recent Intel and AMD CPUs is another fix for the original mistake.
The answer is obviously "no" because there's no such thing as "the set of all x86 instructions." Every vendor has many vendor-specific x86 extensions inside their microcode, some of which end up being adopted by all other vendors (essentially becoming "canonized" as x86 proper). In some cases, these extensions are widely accepted (like x86-64, which was initially developed by AMD, hence its amd64 moniker). In other cases, these extensions die a quiet death, like 3DNow! (also one of AMD's extensions, introduced in their K6-2 line). And in other cases, extensions are only implemented for specific lines of processors (Xeons famously include AVX-512[1]).
For a more concrete example of this, gcc has both AVX-512 and 3DNow! flags[2] (because it was around back in 1998), while Go only has AVX-512 flags (and doesn't care about deprecated extensions like 3DNow!).
> Nobody really seems to know how many x86 instructions there are, but someone counted 678, meaning there are over 200 instructions that do not occur even once in all the code in my /usr/bin.
That linked stackoverflow was a bit silly and misleading. They admitted they were too lazy to count the instructions given in Intel's manual.
Of course we know how many instructions there are. Assemblers (e.g. NASM) have to know what instructions exist if Intel or AMD want people to use them. There are undocumented and dead vendor instructions that nobody cares about (3DNow, Cyrix maybe?) But we definitely know all officially available instructions because they are all in Intel's manual.
The main problem is: what is the definition of an "instruction"?
For example, let's look at the MOV instruction. How many instructions do you count it as?
Intel's manual lists 2 variants of MOV using the same mnemonic: one for general registers, and another for control registers.
But if we also consider GNU AT&T syntax, then we have movb, movw, movl, movq, and movs, depending on operand size. Do you count that as 5 or 1? To make the matter more complex, movd and movq in GNU can also be used for MMX/SSE registers, while Intel lists MOVD and MOVQ as separate instructions for MMX/SSE (but they also list MOVD and MOVQ in the same section). How many do we count?
If you think all of these are silly and a move is a move, do you also count MOVDQA and MOVDQU as the same (MOVDQA works only with aligned 128-bit data, while MOVDQU also works with unaligned data)? How about the VEX-prefixed versions, VMOVDQA and VMOVDQU, which work exactly the same but have no penalty when used in a stream of VEX-prefixed instructions?
If we count by actual machine code, then there are 10+ variations of the MOV instruction, depending on the operands.
(And I still probably have forgotten about a few more MOVxxx instructions)
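To show what "counting by machine code" looks like, here is a small C table of MOV opcode forms for the general-purpose, segment, and control-register cases, in the Intel manual's opcode notation (`/r`, `ib`, `id`). It is a sketch, not an exhaustive list: it ignores prefixes such as operand-size and REX, and the struct and function names are made up for illustration.

```c
#include <stddef.h>

struct mov_form {
    const char *opcode;   /* opcode bytes in Intel manual notation */
    const char *operands; /* destination, source */
};

/* One mnemonic (MOV), many opcode-level forms. Not exhaustive. */
static const struct mov_form MOV_FORMS[] = {
    { "88 /r",    "r/m8, r8" },
    { "89 /r",    "r/m16/32/64, r16/32/64" },
    { "8A /r",    "r8, r/m8" },
    { "8B /r",    "r16/32/64, r/m16/32/64" },
    { "8C /r",    "r/m16, Sreg" },
    { "8E /r",    "Sreg, r/m16" },
    { "A0",       "AL, moffs8" },
    { "A1",       "eAX/RAX, moffs" },
    { "A2",       "moffs8, AL" },
    { "A3",       "moffs, eAX/RAX" },
    { "B0+rb ib", "r8, imm8" },
    { "B8+rd id", "r16/32/64, imm" },
    { "C6 /0 ib", "r/m8, imm8" },
    { "C7 /0 id", "r/m16/32/64, imm16/32" },
    { "0F 20 /r", "r32/64, CRn" },
    { "0F 22 /r", "CRn, r32/64" },
};

/* Number of opcode-level forms in the table above. */
static size_t mov_form_count(void)
{
    return sizeof MOV_FORMS / sizeof MOV_FORMS[0];
}
```

Counted by mnemonic this is 1 instruction; counted by table rows it is 16, and splitting each row by operand size would push it higher still. Which number is "right" depends entirely on the definition you pick.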
The linked stackoverflow is a little bit silly, yes, but we actually cannot count how many instructions there are in x86/amd64.
Why, though? The official x86 syntax is Intel syntax. In any case, it doesn't matter. Every possible encoding of each mnemonic is given in Intel's manual. This whole business about "what is an instruction anyway" is just sophistry. Anyone using gas would hopefully know how the AT&T syntax is mapping to Intel encodings.
> but we actually cannot count how many instructions there are in x86/amd64.
Again, you can. It's quite simple. Define an instruction however you want, and then go and look at the manual. It's a finite known set. Intel isn't hiding this info.
> but we actually cannot count how many instructions there are in x86/amd64
Sorry for being pedantic, but we can, if everyone agrees on what is actually being counted, aka what the definition of an “instruction” is.
You already made the case supporting that POV yourself, then oddly flipped back to implying there was something preventing it besides that at the end. We definitely “can” agree on a definition of an “instruction”, but yes, we probably won’t.
You are right that, given a clear definition of what distinct instructions are, it is always possible to count how many instructions an ISA contains.
The only problem is that many people are careless when counting, so they do not apply consistent rules and thus obtain different results.
The most widely used traditional rule is that 2 instructions are not distinct when they perform the same operation on operands of a specified data type, so that the only difference between them is where the operands are.
The operands may be in various kinds of registers, in the instruction stream (immediate operands) or in data memory, and their addresses may be computed in various ways, but the operation performed by the instruction is the same.
The distinct kinds of instructions obtained by this definition can be further grouped in a smaller number of generic instruction types, which perform the same kind of operation, for example a multiplication or a comparison, but on different data types, e.g. on 8-bit/16-bit/32-bit/64-bit integers, signed or unsigned or signed with saturation or unsigned with saturation, fixed-point numbers or floating-point numbers, polynomials with binary coefficients, bit strings or character strings, short vectors or matrices, and so on.