Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There's not much point to this algorithm now. It is faster to produce the correctly-rounded result of the exact sum (at least if you are summing more than a few terms), rather than use Kahan summation to produce a still-not-entirely-accurate result. The wikipedia page mentions the python fsum function that does this, but my method at https://gitlab.com/radfordneal/xsum is faster.


This is a nice result for the single-scalar case, but it is not immediately obvious to me how it would scale to support SIMD, given its use of per-element indexing and a periodic conditional fallback to a slowpath which does not trigger at the same time for all lanes. Extending Kahan summation (or similar techniques) to take advantage of SIMD, however, is completely straightforward. In the age of AVX-512, the speedups from SIMD are hard to ignore even for 64-bit precision arithmetic.


There is plenty of point, especially in GPGPU world. Register pressure by approach you showed would absolutely trash the performance.

There is no silver bullet for this. You must balance performance and accuracy.


This is quite interesting, and I'll be examining your method more closely in the future. A skim of the paper convinces me that there is merit.

The main upside to the Kahan method is that it can be incremental and online. Imagine that one is writing a Prometheus-like metrics client, and keeping a running tally. One cannot reorder the summation, and one cannot take advantage of parallelism. In these cases, a small Kahan accumulator can perform incredibly well.


> The main upside to the Kahan method is that it can be incremental and online.

I got excited when I heard the claims of a method better than Kahan/Neumaier summation, but the storage requirements are enormous (6700% versus for the "small" version, versus 200% for Kahan), so I definitely wouldn't call Kahan summation "pointless". It's in fact one of the most amazing and underutilised bits of CS out there IMO.


I'm surprised to see this author randomly appear! Radford Neal was my professor at the University of Toronto for a course on information theory, CSC310H.


To be clear, you are surprised to see results on algorithms by a professor of computer science?


I like that there's an accumulator struct. fsum is nice, but it's ugly to stash everything into a list; it's often prefereable to push terms into an accumulator and get the sum at the end.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: