> Nobody really seems to know how many x86 instructions there are, but someone counted 678, meaning there are over 200 instructions that do not occur even once in all the code in my /usr/bin.
That linked Stack Overflow answer was a bit silly and misleading. Its author admitted they were too lazy to count the instructions listed in Intel's manual.
Of course we know how many instructions there are. Assemblers (e.g. NASM) have to know which instructions exist if Intel or AMD want people to use them. There are undocumented and dead vendor instructions that nobody cares about (3DNow!, maybe some Cyrix ones?), but we definitely know all officially available instructions because they are all in Intel's manual.
The main problem is: what is the definition of an "instruction"?
For example, let's look at the MOV instruction. How many instructions do you count it as?
The Intel manual lists several entries under the same MOV mnemonic: one for general-purpose registers, and separate ones for control registers and debug registers.
But if we also consider GNU AT&T syntax, then we have movb, movw, movl, and movq, depending on operand size. Do you count that as four or one? To make matters more complex, movd and movq in GNU syntax can also be used with MMX/SSE registers, while Intel lists MOVD and MOVQ as separate instructions for MMX/SSE (though it also lists MOVD and MOVQ together in the same section). How many do we count?
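A minimal sketch of why the answer depends on which side you count from. The mapping table and the function name intel_mnemonic are hypothetical and cover only the MOV-family spellings mentioned above; the point is that the same gas spelling can name different Intel instructions depending on the operands.

```python
# Hypothetical mapping from GNU AT&T (gas) MOV-family spellings to the
# mnemonic used in Intel's manual. Covers only the cases discussed here.
SIZE_SUFFIXED = {"movb": "MOV", "movw": "MOV", "movl": "MOV"}


def intel_mnemonic(gas: str, vector_operand: bool) -> str:
    """Map a gas mnemonic to an Intel mnemonic.

    vector_operand is True when one operand is an MMX or XMM register.
    """
    if gas == "movq":
        # movq is the 64-bit general-purpose MOV, unless an MMX/XMM
        # register is involved, in which case it is Intel's MOVQ.
        return "MOVQ" if vector_operand else "MOV"
    if gas == "movd":
        # In this sketch movd always maps to Intel's MOVD.
        return "MOVD"
    return SIZE_SUFFIXED[gas]


# (gas spelling, uses an MMX/XMM register?) for a few example lines,
# e.g. "movq %rax, %rbx" vs "movq %xmm0, %xmm1".
SAMPLES = [
    ("movb", False),
    ("movw", False),
    ("movl", False),
    ("movq", False),
    ("movq", True),
    ("movd", True),
]

gas_spellings = {m for m, _ in SAMPLES}
intel_names = {intel_mnemonic(m, v) for m, v in SAMPLES}
print(len(gas_spellings), "gas spellings ->", len(intel_names), "Intel mnemonics")
```

With these samples it reports 5 gas spellings but 3 Intel mnemonics: movb/movw/movl/movq collapse into one MOV, while the single spelling movq splits into two different Intel instructions.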
If you think all of this is silly and a move is a move, do you also count MOVDQA and MOVDQU as the same? (MOVDQA works only with aligned 128-bit data, while MOVDQU also works with unaligned data.) And what about the VEX-prefixed versions, VMOVDQA and VMOVDQU, which work exactly the same way but incur no penalty when used in a stream of VEX-prefixed instructions?
If we count by actual machine code, then there are more than ten variations of MOV, depending on the operands.
(And I have probably still forgotten a few more MOVxxx instructions.)
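To put a rough number on "more than ten", here is a sketch that tabulates MOV's opcode rows from the Intel manual's opcode columns. The table was transcribed by hand, so treat it as illustrative and check it against the current SDM; the name MOV_FORMS is my own. Operand-size and REX.W prefix variants are folded into the row they modify, so the number of distinct encodings is larger still.

```python
# MOV opcode rows, grouped by manual entry (transcribed by hand from the
# Intel SDM opcode columns; prefix variants such as 66h/REX.W are folded
# into the row they modify).
MOV_FORMS = {
    "MOV": [
        ("88 /r", "r/m8, r8"),
        ("89 /r", "r/m16/32/64, r16/32/64"),
        ("8A /r", "r8, r/m8"),
        ("8B /r", "r16/32/64, r/m16/32/64"),
        ("8C /r", "r/m16, Sreg"),
        ("8E /r", "Sreg, r/m16"),
        ("A0", "AL, moffs8"),
        ("A1", "AX/EAX/RAX, moffs"),
        ("A2", "moffs8, AL"),
        ("A3", "moffs, AX/EAX/RAX"),
        ("B0+rb", "r8, imm8"),
        ("B8+rd", "r16/32/64, imm16/32/64"),
        ("C6 /0", "r/m8, imm8"),
        ("C7 /0", "r/m16/32/64, imm16/32"),
    ],
    "MOV (control registers)": [
        ("0F 20 /r", "r32/64, CRn"),
        ("0F 22 /r", "CRn, r32/64"),
    ],
    "MOV (debug registers)": [
        ("0F 21 /r", "r32/64, DRn"),
        ("0F 23 /r", "DRn, r32/64"),
    ],
}

# Print how many opcode rows each manual entry contributes.
for entry, forms in MOV_FORMS.items():
    print(f"{entry}: {len(forms)} opcode forms")

total = sum(len(forms) for forms in MOV_FORMS.values())
print("total:", total)
```

This gives 14 rows for plain MOV and 18 once the control- and debug-register forms are included, and that is before counting MOVxxx mnemonics at all.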
The linked Stack Overflow answer is a little bit silly, yes, but we actually cannot count how many instructions there are in x86/amd64.
Why, though? The official x86 syntax is Intel syntax. In any case, it doesn't matter: every possible encoding of each mnemonic is given in Intel's manual. This whole business about "what is an instruction anyway" is just sophistry. Anyone using gas would hopefully know how AT&T syntax maps to Intel encodings.
> but we actually cannot count how many instructions there are in x86/amd64.
Again, you can. It's quite simple. Define an instruction however you want, and then go and look at the manual. It's a finite, known set. Intel isn't hiding this info.
> but we actually cannot count how many instructions there are in x86/amd64
Sorry for being pedantic, but we can, if everyone agrees on what is actually being counted, aka what the definition of an “instruction” is.
You already made the case supporting that POV yourself, then oddly flipped back at the end to implying that something besides the definition prevents it. We definitely "can" agree on a definition of an "instruction", but yes, we probably won't.
You are right that when using a clear definition of what distinct instructions are it is always possible to count how many instructions an ISA contains.
The only problem is that many people are careless when counting, so they do not apply consistent rules and thus obtain different results.
The most widely used traditional rule is that 2 instructions are not distinct when they perform the same operation on operands of a specified data type, so that the only difference between them is where the operands are.
The operands may be in various kinds of registers, in the instruction stream (immediate operands) or in data memory, and their addresses may be computed in various ways, but the operation performed by the instruction is the same.
The distinct kinds of instructions obtained by this definition can be further grouped in a smaller number of generic instruction types, which perform the same kind of operation, for example a multiplication or a comparison, but on different data types, e.g. on 8-bit/16-bit/32-bit/64-bit integers, signed or unsigned or signed with saturation or unsigned with saturation, fixed-point numbers or floating-point numbers, polynomials with binary coefficients, bit strings or character strings, short vectors or matrices, and so on.
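A minimal sketch of these counting rules applied to a small hand-picked sample (the Form class, the FORMS list, and its operation/data-type labels are illustrative assumptions, not a complete x86 table). Each rule is just a different key for deciding when two forms are "the same", and each key yields a different count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    mnemonic: str   # Intel mnemonic
    operation: str  # what is computed, e.g. "add", "multiply"
    data_type: str  # data type of the operands
    operands: str   # where the operands live (ignored by the rules below)


# A small hand-picked sample, not a complete x86 table.
FORMS = [
    Form("ADD", "add", "int32", "r/m32, r32"),
    Form("ADD", "add", "int32", "r32, r/m32"),
    Form("ADD", "add", "int32", "r/m32, imm32"),
    Form("ADD", "add", "int8", "r/m8, r8"),
    Form("ADDSS", "add", "fp32", "xmm, xmm/m32"),
    Form("ADDSD", "add", "fp64", "xmm, xmm/m64"),
    Form("IMUL", "multiply", "int32 signed", "r32, r/m32"),
    Form("MUL", "multiply", "int32 unsigned", "r/m32"),
    Form("MULSS", "multiply", "fp32", "xmm, xmm/m32"),
]


def count(forms, key):
    """Number of classes when forms with equal key(form) count as one."""
    return len({key(f) for f in forms})


by_form = len(FORMS)                                   # every operand form
by_mnemonic = count(FORMS, lambda f: f.mnemonic)       # Intel mnemonics
by_operation_and_type = count(FORMS, lambda f: (f.operation, f.data_type))
by_generic_type = count(FORMS, lambda f: f.operation)  # generic types
print(by_form, by_mnemonic, by_operation_and_type, by_generic_type)
```

With this sample it prints `9 6 7 2`: nine operand forms, six mnemonics, seven distinct operation/data-type pairs under the traditional rule, and two generic instruction types. Note that the mnemonic count and the traditional rule disagree, because ADD on 8-bit and 32-bit integers shares one mnemonic.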