
It is correct that on x86 all normal stores have release semantics and all normal loads have acquire semantics (only a few instructions behave differently, i.e. the string instructions and the streaming "non-temporal" store instructions).

However, the ordering properties of x86 stores and loads are stronger than that. An x86 store is not only a release store; it additionally behaves as what could be called an "initial store", i.e. a store that is guaranteed to be performed before all subsequent stores.

An x86 load is not only an acquire load; it additionally behaves as what could be called a "final load", i.e. a load that is guaranteed to be performed after all prior loads.

The research paper that introduced the concepts of acquire loads and release stores (in May 1990) was mistaken: it claimed that these 2 kinds of accesses are enough for the synchronization of accesses to shared resources.

While it is true that acquire loads and release stores are sufficient for implementing critical sections, there are other algorithms for accessing shared resources that need the other 2 kinds of ordered loads and stores, i.e. final loads and initial stores.

On x86, this does not matter, because the normal loads and stores provide all 4 kinds of ordered loads and stores, so any algorithm can be implemented without using memory barriers or other special instructions.

In ISAs that provide only 2 kinds of ordered loads and stores, i.e. acquire loads and release stores, like Arm Aarch64, the other 2 kinds of ordered loads and stores must be synthesized using memory barriers, i.e. an initial store is made by a weakly-ordered store followed by a store barrier, while a final load is made by a weakly-ordered load preceded by a load barrier.

Arm Aarch64 does not have load barriers, but it has stronger acquire barriers, which can always be used instead of any load barriers. The Arm Aarch64 acquire barrier has the confusing mnemonic DMB.LD, apparently inspired by the x86 LFENCE, which is also an acquire barrier, not a load barrier, despite its confusing mnemonic.

(a load barrier guarantees that all prior loads are performed before all subsequent loads; an acquire barrier guarantees that all prior loads are performed before all subsequent loads and stores; such memory barriers are stronger than necessary; the weaker ordering properties provided by the 4 kinds of ordered loads and stores are sufficient)
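To make this concrete, here is a minimal sketch (my own illustration, not part of the comment above) of how these two extra kinds of ordered accesses can be approximated in portable C++, which exposes only acquire/release orderings plus fences; the fences are stronger than the pure store/load barriers described above, and the variable names are purely illustrative:

  #include <atomic>

  std::atomic<int> a{0}, b{0};

  // "Initial store": a store that must be performed before all subsequent stores.
  // Approximated by a relaxed store followed by a release fence (which also
  // orders prior loads, i.e. it is stronger than strictly needed).
  void initial_store_sketch() {
      a.store(1, std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_release);
      b.store(2, std::memory_order_relaxed);  // ordered after the store to a
  }

  // "Final load": a load that must be performed after all prior loads.
  // Approximated by an acquire fence followed by a relaxed load (which also
  // orders subsequent stores, i.e. it is stronger than strictly needed).
  int final_load_sketch() {
      int x = b.load(std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_acquire);
      return x + a.load(std::memory_order_relaxed);  // ordered after the load of b
  }

On x86 these two fences compile to no machine instruction, matching the claim above that plain x86 loads and stores already provide all 4 kinds of ordering.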




Interesting. I don't consider myself an expert, but I have written my share of lock-free algorithms. Which algorithms did you specifically need an "initial store" or a "final load" for?

I could see these being useful for seq locks, but usually you represent them with relaxed operations and explicit memory barriers.


An example is when you have a shared data structure that is written by one or more writers and to which one or more readers want read-only access.

In this case, the readers can use optimistic access to the shared data structure, also known as lock-free access (a very inappropriate name in my opinion).

Thus all the readers access the shared resource concurrently, but they must be able to detect when an update is in progress or when an update has overlapped their read access in time, in order to stall or retry their reading of the shared data structure.

To implement this, a version counter is used, which is advanced by a writer around each update of the shared data. The version counter must be updated twice, both before and after the rest of the shared data is updated. The first update must produce an invalid version number, to enable the detection of an update in progress.

There are two simple ways to achieve this. The version counter can be complemented before the data update and negated after the data update. Complement followed by negate is equal to increment, and negative version numbers indicate an update in progress. The alternative is to increment the version counter both before and after the data update, in which case odd numbers indicate an update in progress and even numbers are valid version numbers.
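As a quick check of the complement/negate arithmetic (my own illustration, assuming two's-complement integers):

  #include <cassert>
  #include <cstdint>

  int main() {
      int32_t v = 42;           // current, valid version number
      int32_t during = ~v;      // complemented before the update: negative => update in progress
      int32_t after = -during;  // negated after the update
      assert(during < 0);
      assert(after == v + 1);   // complement followed by negate == increment
  }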

A reader must read the version number both before and after reading the shared data. An invalid version number read before the data stalls the reading, while an invalid version number read after the data, or one that differs from the first version number, causes the reading to be retried.

The two version counter stores by a writer and the two version counter loads by a reader have different ordering requirements.

The version counter store before data update must be an "initial store" (as defined in the previously posted message).

The version counter store after data update must be a release store.

The version counter load before data reading must be an acquire load.

The version counter load after data reading must be a "final load" (as defined in the previously posted message).
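Putting these four requirements together, a minimal sketch of such a writer and reader in C++ could look like the following (my own illustration; since C++ offers only acquire/release orderings plus fences, the "initial store" and "final load" are approximated with fences as described earlier, and the names seq and data are arbitrary):

  #include <atomic>
  #include <cstdint>

  std::atomic<uint32_t> seq{0};   // even = valid version, odd = update in progress
  std::atomic<uint64_t> data[2];  // the shared data (relaxed accesses, for brevity)

  void writer(uint64_t a, uint64_t b) {
      uint32_t s = seq.load(std::memory_order_relaxed);
      seq.store(s + 1, std::memory_order_relaxed);          // "initial store" ...
      std::atomic_thread_fence(std::memory_order_release);  // ... performed before the data stores
      data[0].store(a, std::memory_order_relaxed);
      data[1].store(b, std::memory_order_relaxed);
      seq.store(s + 2, std::memory_order_release);          // release store, after the data stores
  }

  bool reader(uint64_t& a, uint64_t& b) {
      uint32_t s0 = seq.load(std::memory_order_acquire);    // acquire load, before the data loads
      if (s0 & 1) return false;                             // update in progress: caller retries
      a = data[0].load(std::memory_order_relaxed);
      b = data[1].load(std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_acquire);  // "final load" ...
      uint32_t s1 = seq.load(std::memory_order_relaxed);    // ... performed after the data loads
      return s0 == s1;                                      // otherwise the caller retries
  }

On x86 the two fences compile to no machine instruction, while on Arm Aarch64 they become DMB barriers, which is exactly the overhead that the lack of "initial stores" and "final loads" imposes.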

There are also other frequently used algorithms, e.g. for queuing locks, which need "initial stores".

Explicit memory barriers are a mistake in ISA design. They have stronger ordering properties than the various kinds of ordered loads and stores that can replace them, and these stronger ordering properties are never needed.

Moreover, explicit memory barriers increase the code size and they occupy instruction decode slots and various locations in the internal structures of a CPU core without providing any benefit.

Explicit memory barriers are absolutely necessary for Arm Aarch64 and for Intel/AMD x86-64 (for the latter only when accessing memory regions with weakly-ordered accesses), but only because these ISAs do not include all the kinds of ordered loads and stores that are needed.


You are describing a seq lock. Hans Boehm [1] has discussed in detail the impedance mismatch between seq locks and the acquire/release model, and the bottom line is that you need compiler help to recognize non-modifying RMWs that are purely used to establish ordering and to lower them to the correct optimal sequence. No compiler does it yet, so we all hack around it.

[1] "Can Seqlocks Get Along with Programming Language Memory Models?" I can no longer find an easily accessible free version. The link at HP from Boehms home is broken.


I am pretty sure that variants of this algorithm were in use many decades before Linux and before the coining of the name "seq lock", which is not a good name for it.

It is not needed only on multiprocessors; it is needed even when there is a single writer and a single reader, both running on the same CPU core, e.g. when some data is updated by an interrupt service routine or by a signal handler and the main program must also access that data.

However, before the fashion of implementing weakly-ordered memory accesses, which spread mostly after 1990, nobody paid much attention to the ordering requirements for its stores and loads, because in those ancient times all existing computers satisfied those ordering requirements implicitly for all stores and loads.

Thanks for pointing to the paper written by Hans Boehm; SciHub has it.


A name is a name; RCU is a similarly terrible name, but it stuck and it is immediately recognizable. The kernel (and related articles on LWN) simply popularized the algorithm; I have no doubt that it was in use well before Linux, but that's how I learned it.


(oh, I just realized you must be the same Adrian from RWT who was arguing for initial stores/final loads)



