There's not much point to this algorithm now. It is faster to produce the correc...

derf_ · on Dec 24, 2019

This is a nice result for the single-scalar case, but it is not immediately obvious to me how it would scale to support SIMD, given its use of per-element indexing and a periodic conditional fallback to a slow path that does not trigger at the same time for all lanes. Extending Kahan summation (or similar techniques) to take advantage of SIMD, however, is completely straightforward. In the age of AVX-512, the speedups from SIMD are hard to ignore even for 64-bit precision arithmetic.
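To illustrate why the Kahan case vectorizes so easily, here is a rough sketch in plain Python where each "lane" is just an independent (sum, compensation) pair, followed by a compensated horizontal reduction. The name `kahan_sum_lanes` and the `lanes` parameter are made up for illustration; a real SIMD version would keep these pairs in vector registers rather than Python lists.

```python
import math

def kahan_sum_lanes(values, lanes=4):
    """Kahan summation with `lanes` independent accumulators.

    Pure-Python stand-in for SIMD lanes (an assumption for illustration):
    every lane runs the same branch-free update, so no lane ever needs
    a different code path from its neighbours.
    """
    sums = [0.0] * lanes
    comps = [0.0] * lanes
    for start in range(0, len(values), lanes):
        chunk = values[start:start + lanes]
        for lane, x in enumerate(chunk):
            y = x - comps[lane]                  # apply the stored correction
            t = sums[lane] + y                   # low-order bits of y may be lost
            comps[lane] = (t - sums[lane]) - y   # recover what was lost
            sums[lane] = t

    # Horizontal reduction: fold every lane's sum and its pending
    # correction into one scalar, again with Kahan's update.
    total = 0.0
    comp = 0.0
    for lane in range(lanes):
        for term in (sums[lane], -comps[lane]):
            y = term - comp
            t = total + y
            comp = (t - total) - y
            total = t
    return total

values = [0.1] * 1000
lane_result = kahan_sum_lanes(values)
reference = math.fsum(values)  # correctly rounded sum, for comparison
```

The inner update has no data-dependent branch, which is the property that makes the lane-wise version simple.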

sharpneli · on Dec 24, 2019

There is plenty of point, especially in the GPGPU world. The register pressure from the approach you showed would absolutely trash performance.

There is no silver bullet for this. You must balance performance and accuracy.

lidHanteyk · on Dec 24, 2019

This is quite interesting, and I'll be examining your method more closely in the future. A skim of the paper convinces me that there is merit.

The main upside to the Kahan method is that it can be incremental and online. Imagine that one is writing a Prometheus-like metrics client, and keeping a running tally. One cannot reorder the summation, and one cannot take advantage of parallelism. In these cases, a small Kahan accumulator can perform incredibly well.
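For concreteness, a minimal sketch of what such a running tally could look like. The class name `KahanAccumulator` and its `add` method are hypothetical, not taken from any metrics library; the state is just two floats.

```python
import math

class KahanAccumulator:
    """Running sum with Kahan compensation (illustrative sketch).

    Only two floats of state are kept, and values are consumed one at
    a time in arrival order, so nothing has to be buffered or reordered.
    """

    def __init__(self):
        self.total = 0.0
        self._comp = 0.0  # running estimate of the rounding error in total

    def add(self, x):
        y = x - self._comp               # apply the stored correction
        t = self.total + y               # low-order bits of y may be lost here
        self._comp = (t - self.total) - y  # recover what was lost
        self.total = t

acc = KahanAccumulator()
for _ in range(1000):
    acc.add(0.1)  # e.g. one observation per scrape or request

reference = math.fsum([0.1] * 1000)  # correctly rounded sum, for comparison
```

Reading `acc.total` at any point gives the current tally, which is what makes it a fit for the online case.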

pixelpoet · on Dec 24, 2019

> The main upside to the Kahan method is that it can be incremental and online.

I got excited when I heard the claims of a method better than Kahan/Neumaier summation, but the storage requirements are enormous (6700% for the "small" version, versus 200% for Kahan), so I definitely wouldn't call Kahan summation "pointless". It's in fact one of the most amazing and underutilised bits of CS out there IMO.

nayuki · on Dec 24, 2019

I'm surprised to see this author randomly appear! Radford Neal was my professor at the University of Toronto for a course on information theory, CSC310H.

tanderson92 · on Dec 24, 2019

To be clear, you are surprised to see results on algorithms by a professor of computer science?

ben509 · on Dec 24, 2019

I like that there's an accumulator struct. fsum is nice, but it's ugly to stash everything into a list; it's often preferable to push terms into an accumulator and get the sum at the end.
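A small sketch of the two calling styles, assuming Python's `math.fsum` is the fsum meant here. The accumulator below uses Neumaier's compensated update as a stand-in, not the paper's method, and the names `NeumaierAccumulator`, `push`, and `result` are made up for illustration.

```python
import math

class NeumaierAccumulator:
    """Push-style accumulator using Neumaier's variant of Kahan summation.

    Illustrative stand-in only: it is not exact like fsum, it just shows
    the push/result interface.
    """

    def __init__(self):
        self._sum = 0.0
        self._comp = 0.0  # accumulated low-order bits

    def push(self, x):
        t = self._sum + x
        if abs(self._sum) >= abs(x):
            self._comp += (self._sum - t) + x  # low bits of x were lost
        else:
            self._comp += (x - t) + self._sum  # low bits of the sum were lost
        self._sum = t

    def result(self):
        return self._sum + self._comp

terms = [1.0, 1e100, 1.0, -1e100]

# List style: collect every term first, then sum once at the end.
list_style = math.fsum(terms)

# Accumulator style: push terms as they are produced, read the sum at the end.
acc = NeumaierAccumulator()
for term in terms:
    acc.push(term)
push_style = acc.result()
```

Both styles return 2.0 on this input, where a naive left-to-right sum returns 0.0.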