
Issue #1 is a big deal, but they've never said otherwise.

Issue #2 is less of a reason to worry. The figure of 33 instructions is an upper limit, and it will likely be a lot lower on most implementations. They also neither expect nor need it to be saturated.

Further, their instruction set is much more amenable to parallelism than normal instruction sets because:

* Their instructions have special values (None and NaR) that enable extra parallelism.

* Their instructions avoid side effects that prevent parallelism. This includes not just condition codes, but effectively the whole belt mechanism.

* Their instructions can move data across themselves horizontally (across certain phase boundaries), so certain data dependencies are allowed inside an instruction. In essence, it's like running at three to five times the clock rate, except that each clock is only allowed to run a given subset of instructions.

They give an example where the whole of a vectorized strcpy iteration is done in one instruction, with no setup or teardown needed. This is of course a best-case scenario, in that every technique they have is in use simultaneously, but all of the techniques are generally applicable and (relatively speaking) simple for compilers to use.
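
To make the strcpy case a bit more concrete, here is a toy software model of how I understand the special values to help. It is only a sketch under my own assumptions, not their actual semantics or encoding: `NAR`, `NONE`, `load_vector`, `strcpy_step` and the vector width are all names and choices I made up for illustration. The idea is that a speculative vector load past the end of readable memory yields a deferred-fault marker instead of trapping, lanes after the terminator are masked to "nothing", and the store skips those lanes, so no scalar setup or cleanup loop is needed.

```python
NAR = object()   # hypothetical stand-in for "Not a Result": a deferred fault
NONE = object()  # hypothetical stand-in for "None": a lane that stores skip


def load_vector(mem, addr, width):
    """Speculative load: lanes past the end of mem become NAR instead of faulting."""
    return [mem[addr + i] if addr + i < len(mem) else NAR for i in range(width)]


def strcpy_step(src, dst, addr, width):
    """Copy one vector of bytes; return True once the terminator has been copied."""
    lanes = load_vector(src, addr, width)
    is_nul = [lane is not NAR and lane == 0 for lane in lanes]

    # Mask every lane that comes after the first terminator.
    done = False
    masked = []
    for lane, nul in zip(lanes, is_nul):
        masked.append(NONE if done else lane)
        done = done or nul

    # The store is the only step with a side effect, so it is where a
    # surviving NAR finally turns into a real fault.
    for i, lane in enumerate(masked):
        if lane is NONE:
            continue
        if lane is NAR:
            raise MemoryError("deferred fault reached a store")
        dst[addr + i] = lane
    return done


def strcpy(src, dst, width=4):
    """Repeat the vector step until the terminator has been written."""
    addr = 0
    while not strcpy_step(src, dst, addr, width):
        addr += width
    return dst
```

With `src = [104, 105, 0]` and a width of 4, the fourth lane reads past the end and becomes `NAR`, but it sits after the terminator, so it is masked and never stored. Only an unterminated string lets a `NAR` reach the store and fault.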



