Without cache coherency and ILP, programming in any language would be insane. It's not like programming in x86_64 assembly suddenly opens a world of possibilities because it's "truly low-level". You gain very little extra control over a given platform by switching from C to assembly; that's what we mean by "low-level".
C maps cleanly onto the instruction sets provided by chip manufacturers. It lets the programmer lay out structures for vectorization if they so choose, or optimize for some other objective, such as size for a binary wire protocol or a limited memory budget.
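To make the layout point concrete, here is a minimal sketch. The struct names, fields, and batch size are hypothetical and not taken from any real protocol, and exact sizes depend on the platform ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with fields in a careless order. The compiler
 * inserts padding so each member meets its alignment requirement
 * (typically 24 bytes total on a 64-bit ABI). */
struct sample_loose {
    uint8_t  flag;
    uint64_t timestamp;
    uint8_t  channel;
    uint32_t value;
};

/* Same fields ordered largest-first: less padding, smaller records
 * (typically 16 bytes on a 64-bit ABI). */
struct sample_tight {
    uint64_t timestamp;
    uint32_t value;
    uint8_t  flag;
    uint8_t  channel;
};

/* Structure-of-arrays variant: each field is contiguous in memory,
 * which is the shape auto-vectorizers tend to handle best. */
enum { SAMPLE_BATCH = 64 };

struct sample_batch {
    uint64_t timestamp[SAMPLE_BATCH];
    uint32_t value[SAMPLE_BATCH];
    uint8_t  flag[SAMPLE_BATCH];
    uint8_t  channel[SAMPLE_BATCH];
};

/* Sum the first n values of a batch (n must not exceed SAMPLE_BATCH).
 * The loop reads one dense array, so a compiler is free to vectorize it. */
uint64_t batch_sum_values(const struct sample_batch *b, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += b->value[i];
    return total;
}
```

Which layout is "right" depends on the objective: the tight struct suits size-constrained storage, the batch suits inner loops. The point is only that C leaves the choice to you.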
Having a choice about the memory layout of structures, and being able to link cleanly with the platform ABI, is what makes C low-level. Obviously the inner workings of a modern CPU don't map cleanly to the C abstract machine. However, there's no convincing evidence that greater control over cache invalidation is what's holding back the performance of those CPUs.
If finer-grained cache control is needed, then it is no big deal to invoke the appropriate compiler intrinsics (if available) or write your own with inline assembly. My preference is to put these sorts of things in a separate .S file.
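As a rough illustration of the intrinsics route, here is a sketch that uses __builtin_prefetch, a builtin provided by GCC and Clang (on other compilers the macro below degrades to a no-op). The function name, the PREFETCH_READ macro, and the look-ahead distance are assumptions made for the example, and whether a prefetch actually helps has to be measured on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* Arguments: address, 0 = prefetch for read, 3 = keep in all cache levels. */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)(p))
#endif

/* Hypothetical look-ahead distance; tune by measurement. */
enum { PREFETCH_AHEAD = 8 };

/* Sum table[idx[i]] for i in [0, n). The access pattern is data
 * dependent, so the hardware prefetcher cannot predict it; we hint the
 * element we will need PREFETCH_AHEAD iterations from now.
 * The caller must guarantee every idx[i] is a valid index into table. */
uint64_t sum_indexed(const uint32_t *table, const uint32_t *idx, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            PREFETCH_READ(&table[idx[i + PREFETCH_AHEAD]]);
        total += table[idx[i]];
    }
    return total;
}
```

The prefetch is only a hint and does not change the result, so the function behaves the same with or without it.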
> It's not like programming in x86_64 assembly suddenly opens a world of possibilities because it's "truly low-level"
It certainly opens a world of possibilities regarding the vector units. That's one place where inner loops hand-written in assembly still have an edge. On the other hand, with contemporary CPUs, high-quality assembly coding is a highly specialized job skill in itself, so for general-purpose engineers, learning it is likely to have a poor cost/benefit ratio.
Right. Exact control of data structure layout is also practically essential when implementing a database system. It is needed for correctness as much as anything else.
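For what it's worth, here is a sketch of what that looks like in practice. The page header below is hypothetical and not taken from any real database. It assumes the file is read back on a machine with the same byte order, and the compile-time checks fail the build if the compiler's layout ever differs from the on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk page header. Fixed-width types, ordered so that
 * no implicit padding is needed on mainstream ABIs. */
struct page_header {
    uint32_t page_id;
    uint32_t checksum;
    uint16_t slot_count;
    uint16_t free_offset;
    uint32_t lsn_low;
};

/* Correctness checks: if these do not hold, data written by one build
 * would be misread by another. */
_Static_assert(sizeof(struct page_header) == 16, "header must be 16 bytes");
_Static_assert(offsetof(struct page_header, checksum) == 4, "checksum offset");
_Static_assert(offsetof(struct page_header, slot_count) == 8, "slot_count offset");
_Static_assert(offsetof(struct page_header, lsn_low) == 12, "lsn_low offset");

/* Copy the header into the start of a raw page buffer. memcpy avoids
 * alignment and aliasing problems with arbitrary byte buffers. */
void page_header_store(unsigned char *page, const struct page_header *h)
{
    memcpy(page, h, sizeof *h);
}

/* Read the header back out of a raw page buffer. */
struct page_header page_header_load(const unsigned char *page)
{
    struct page_header h;
    memcpy(&h, page, sizeof h);
    return h;
}
```

A real system would also pin down endianness and versioning; this only shows the layout guarantees C lets you state and check.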
The instruction sets provided by chip makers (except vector instructions) are tuned to match what compilers for ancient languages want to emit.
The chips move heaven and earth to maintain the fiction that those instructions actually direct what they do.
The number of fundamentally different kinds of cache, and specialized state machines not directly accessible by instructions, in a modern chip would boggle your mind.
In GPU programming (CUDA, for example), you routinely specify which memory space data lives in: global, shared, or constant memory, with per-thread locals going to registers or local memory. C has no way to express this, so CUDA added language extensions such as the __shared__ and __constant__ qualifiers.
So C is too high-level for GPU programming, and had to be extended.
While I also disagree that C is “holding things back”, I don’t think their comment is a “load of crap”. Talk like that is unhelpful and quite frankly non-technical and unprofessional.