Ruby's Proposed STM (chrisseaton.com)
293 points by chrisseaton on Oct 30, 2020 | 100 comments



Now what happens when you add STM to these languages? You've just created an enormous footgun by introducing another color of function (https://journal.stuffwithstuff.com/2015/02/01/what-color-is-..., see also: https://news.ycombinator.com/item?id=8984648).

What this proposal misses is that reasoning about software transactional memory all but demands being able to abort and retry code to make progress, which in turn means maintaining a high-level separation between transactional code and IO code.

Suppose you had a banking application and wanted to use STM (as in the "ractor" example in this article) to move balances from one account to another. In Haskell, you'd perform STM actions on TVars (transactional variables) and use the default retry policy of STM to automatically construct an order of operations such that each operation appears to occur atomically. (STM doesn't impose a clock-based ordering, but I don't think it would be tremendously difficult to do so.)

If you don't use Haskell's "check" or "retry", you'll find that your code aborts and fails with near certainty. You can't perform lots of transactions on TVars and not, occasionally, read from a var that another transaction is writing.

So you add "check" calls to verify preconditions and postconditions, such as that balances cannot go negative, and you add "retry" calls to automatically retry transactions that fail because of concurrency.
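For concreteness, here is a minimal sketch of that pattern in Haskell, using the stm package (the names and the Integer balances are illustrative):

  import Control.Concurrent.STM

  -- 'check' aborts and retries the transaction until the precondition
  -- (the source balance cannot go negative) holds.
  transfer :: TVar Integer -> TVar Integer -> Integer -> STM ()
  transfer from to amount = do
    balance <- readTVar from
    check (balance >= amount)         -- retries until funds are sufficient
    writeTVar from (balance - amount)
    modifyTVar' to (+ amount)

  main :: IO ()
  main = do
    a <- newTVarIO 100
    b <- newTVarIO 0
    atomically (transfer a b 40)
    print =<< readTVarIO a            -- 60
    print =<< readTVarIO b            -- 40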

Now imagine doing that in a modern Ruby codebase, or a modern JavaScript codebase. There's so much mutation in these environments; can you easily guarantee that a Ruby function hasn't mutated another variable? That it hasn't written to disk, made an API call, or changed some global state?

Haskell pioneered the use of STM. It's easier to reason about in Haskell because the world of STM can be isolated from the world of effects thanks to the IO monad. It's easy to declare "this function is blue" and "this function is red", and never the twain shall meet.

In any other language, all bets are off.


> That’s why C# has that little caveat. You can avoid the pain of async in C# by using threads.

This is the first time I've seen threads referred to as a way to avoid pain.


I honestly prefer threads and mutex locks to async/await.

Feel free to burn me at the stake.


Same here. I'm curious how you got there.

I think a large part of that for me is because early in my career I got involved in Unix kernel work, first writing device drivers and later doing Unix ports, including both porting a swapping Unix to paged hardware (and rewriting pretty much all process and memory management from scratch) and porting a paging Unix to swapping hardware.

That was followed by a few years largely doing embedded stuff on systems too small to have operating systems or memory management units. For those I'd write a small kernel that supported threads and scheduling (sometimes cooperative and sometimes preemptive) and then write the embedded applications much like I would have written them on Unix, using threads instead of processes.

So for me threads are easy, and things like async/await seem overly complicated and less clear, because I came at concurrency from the operating system side of things and the hardware side of things (e.g., dealing with interrupts), where thread-like thinking is natural and perhaps required.

Maybe it is conceptually harder for people who come at it from the application side of things?


I think it's more that threads are a poorer fit for the application side of things.

On the app side, "I want to do this thing that might take a while without blocking the UI, and then do something with the result, and I have zero need to communicate with the code that's running the task beyond giving it its arguments and getting its result back," is the most common use case. Task parallelism is tailor-made for that kind of problem.

Using a dedicated thread, on the other hand, gives you some raw material you can use to solve that problem, but leaves you to handle all the messy details on your own. And doing it the easy way - one short-lived thread for every thing you need to do in the background - is expensive, while doing it the correct way and implementing a thread pool is essentially just committing to Greenspunning your own personal task parallelism library.
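To make that shape concrete, here is a minimal sketch using Haskell's async package, which bundles up exactly those messy details (expensiveComputation is a hypothetical stand-in for real work):

  import Control.Concurrent.Async (async, wait)

  expensiveComputation :: Int -> IO Int
  expensiveComputation n = pure (n * n)      -- stand-in for real work

  main :: IO ()
  main = do
    task <- async (expensiveComputation 42)  -- hand over the arguments
    putStrLn "caller stays responsive"
    result <- wait task                      -- collect the result later
    print result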


Whether async or threads are preferable depends a lot on context and often the answer will be "both".

E.g. GUI code -> async, because you must do everything on the UI thread. Heavy lifting in a GUI app -> threads. Traditionally you'd do this with a huge mess of callbacks, but these days it's reasonably easy to just write the GUI portion using async and have background computations running in threads signal their results through futures into the async code, which makes everything so much easier.


From what I can tell, both solve different problems.

There are times when one might want async executors like Node's event loop or Rust's tokio/async-std, but in other situations one might want OS threads or "green threads" with a RwLock or Mutex.

What about a few long running threads that use async internally? ;-)


That's another thing that C# (or, rather, .NET) handles really well. I can't remember the exact syntax, but the Task API has a way of declaring a background task as "long-running". The key difference is, long-running tasks get a dedicated thread; short-running tasks execute on the thread pool.


That's the Go runtime.


Count me in your camp (with a couple of exceptions).


They aren’t even solving the same problem. Threads exist to use multiple CPUs, while async/await exists to avoid wasting time waiting on I/O.


Threads existed long before multi-CPU computers or multi-core CPUs were common, so no... they don’t exist for that reason.

Parallelism and concurrency are orthogonal concerns. Multiple threads can run on the same core, while concurrent awaits can run on separate cores (depending on the language). Or the opposite!

Threads definitely solve wasted time waiting on I/O, but OS-level threads are a very limited resource in many environments, which can cause problems quickly in programs with unbounded concurrency. They also tend to have a lot of startup cost, so they have to be created judiciously to avoid hurting performance.

Languages like Go and Elixir give the programmer access to unlimited lightweight “threads”, and that works great for concurrency, without requiring developers to re-color their functions.

One additional note is that some OSes (including Linux) have struggled to support true nonblocking file I/O for many years, so you have literally had to use multiple OS threads for I/O concurrency: one for each file operation in progress. Otherwise you would block the async executor! Languages often handle this implementation detail for the programmer, so you just don't realize you're sometimes using a thread per await.

(io_uring seems to finally solve this in a meaningful way on Linux)


Threads are fine as long as they do not share mutable data structures. For example, multithreaded application servers are a rather painless experience.


Since all of these concurrency approaches’ pain points are around shared mutable state, maybe it’s shared mutable state that’s the pain point.


Yup, this is the conclusion Joe Armstrong reached, and why Erlang/Elixir/BEAM just doesn't allow mutable state, other than the process dictionary, which is inherently non-shared.


That observation doesn't really contribute to a comparison of dedicated threads and async/await, though, because the same is true of async routines.


For a long time I experienced excessive pain with both threads and task parallelism. It took a good year or two to realize the problem was that I didn't have a good instinct for figuring out which technology is more appropriate for a given situation.


Not only are all bets off in a non-functional language, it's also the wrong solution.

STM puts the cost of locking into everything, i.e. slower default operations, whilst still being concurrency-unsafe. Deadlocks can occur.

There are proven good solutions for fast and safe concurrency with ownership tracking and a good compiler/scheduler. No, not Rust. And not STM. We fought with this for years.


> There are proven good solutions for fast and safe concurrency with ownership tracking and a good compiler/scheduler.

Do you have any pointers to these solutions for the uninitiated?


All concurrency-safe languages. E.g. Pony, Concurrent Pascal, the Parrot VM, ...


Eh, STM is often used as a solution to avoid locking.

https://en.wikipedia.org/wiki/Software_transactional_memory


If you read into the implementation papers a little more deeply, you'll find that they are typically implemented with locks under the covers.


It’d be good if all writes in a transaction were automatically transactional and we had no TVars, and also if all IO automatically made a transaction irrevocable.

But I’m not sure this is tractable in either implementation effort or runtime overhead.


Are you talking about the Haskell one?

If so, you can't do IO inside a transaction (it won't compile), and any value not in a TVar is thread-local.
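A minimal sketch of that compile-time separation; the last line below is rejected by the type checker (the function name is made up for illustration):

  import Control.Concurrent.STM

  audit :: TVar Int -> STM ()
  audit v = do
    writeTVar v 1
    putStrLn "logging"  -- rejected: this is IO (), not STM (), and
                        -- there is no safe way to lift IO into STM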


> Are you talking about the Haskell one?

No?

I'm responding to the idea of the 'colour' of functions and suggesting how I think it should be in Ruby - all side effects transactional, but IO irrevocable. Drop the idea of TVar and make all mutable locations transactional.


Honestly the main feature of Haskell in this regard isn't STM-colored functions (as you note, that's a bad thing), but rather that it almost supports color-generic functions in the form of `Monad f => A -> f B -> f C`. Unfortunately, it doesn't correctly handle the identity monad `type Id a = a`, instead requiring `newtype Id a = Id a`, so it's impossible to write a function that matches both (eg) `A -> B` and `Monad f => A -> f B`.


I think Haskell enforcing a separation between the "color" (or monad) of functions is generally good.


I don't understand your argument.

It is correct that the standard library provides you with a newtype-based identity monad. But as you have said, you can also write your own `type Id a = a`. What's wrong with that?


> What's wrong with [`type Id = a`]?

The fact that it doesn't work:

  > - The type synonym 'Id' should have 1 argument, but has been given none
  > - In the instance declaration for 'Monad Id'
  > | instance Monad Id where


Of course that won't work because that type synonym is far from a monad. I think you meant to write `type Id a = a` or `type Id f a = f a` instead.

But I get your argument now. What you are saying is that we want some way to define functions that are either pure or return a monadic value. Currently that's not doable because these two have distinct types. It may be difficult to unify them via the type system. This is even more difficult for higher-order functions that take other functions.


> I think you meant to write `type Id a = a`

Indeed I did, sorry, unfortunately too late to edit. The following reproduces the error:

  {-# LANGUAGE TypeSynonymInstances #-}
  type Id a = a
  instance Monad Id where
    x >>= f = f x
    return = id
(The last two lines can be omitted, since GHC aborts before checking the instance body. Edit: which is why I didn't notice I had the arguments to >>= backward; fixed?)

> It may be difficult to unify them via the type system. This is even more difficult for higher-order functions that take other functions.

IMO, the main difficulty in unifying `a ~ Monad m => m a` is inferring `m ~ \a1-> a1` in a way that allows `\a1-> a1` to inherit `Id`'s instances. It's an understandable deficiency, but it is a deficiency.
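For contrast, a sketch of the newtype route, which does compile because `Id` is then a real type constructor rather than a synonym that must be fully applied (this mirrors `Data.Functor.Identity` from base):

  newtype Id a = Id a

  instance Functor Id where
    fmap f (Id x) = Id (f x)

  instance Applicative Id where
    pure = Id
    Id f <*> Id x = Id (f x)

  instance Monad Id where
    Id x >>= f = f x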


There’s no guarantee that you’re not calling unsafePerformIO in Haskell, which a lot of outwardly pure libraries do. In fact, wouldn’t STM be doing this under the hood? Even the FFI in Haskell allows you to import a function without the return type being in IO, as long as you pinky promise that it is pure.

The idea in the end is to provide a language for working on memory concurrently. You either choose to speak it or not.


As with Rust's 'unsafe', the fact that you can explicitly circumvent checks doesn't make those checks useless. If something goes wrong you know exactly where to look.


That may be the case for Rust. I am not sure what the implications of unsafe are for the type system.

In any case, the unsafeness of unsafePerformIO and importing an FFI function as a pure function are not reflected in the type in Haskell. These are things that can and do happen.

My point was that the ability to go outside the bounds of a language does not necessarily make the language useless.


> I am not sure what the implications of unsafe are for the type system.

There are no implications, that's the point.

> the unsafeness of unsafePerformIO and importing an FFI function as a pure function are not reflected in the type in Haskell

Yes, that is correct behaviour; if it were reflected in the type it'd be called safePerformIO.


> In any case, the unsafeness of unsafePerformIO and importing an FFI function as a pure function are not reflected in the type in Haskell.

It's reflected in Safe Haskell annotations. Unfortunately nobody cares about those. Haskell has a culture problem regarding safety, just like most other languages, both high-level and low-level (with the exception of Java, which doesn't make data races UB).


Another reason why it is easier to integrate and use STM in Haskell than in most other languages is the prevalence of immutable types. When only a tiny fraction of all memory structures needs to be tracked by STM as TVars, it becomes orders of magnitude cheaper than in languages with predominantly mutable data structures.


> In any other language, all bets are off.

No, in any other language, one has to have an understanding of the codebase and of what is mutating and what is not, instead of relying on the language to do this work for them - which, granted, can be a lot more work.

Dismissing things because "I don't want to deal with it because it is hard" is not the same as dismissing them because "this is not possible"; this argument is the former.

See also: OCaml/Scheme-style FP, which can be 99.9% as pure-functional and type-checked as Haskell if one wants to build their system that way.


Well, if you don't like the wording, translate "all bets are off" into "the programmer(s) must never make a mistake in those 10k to 10M lines of code".

That's not a reasonable thing to expect.


Haskell should just have transactional IO, then. RDBMSs have had it since god knows how long ago.


Haskell is a general purpose programming language, so I'm not sure what you mean. You can't in the general case have "transactional IO".


Clojure had STM from the start, and it was one of its selling points, but the real-world experience with it is that nobody uses it - atoms (i.e. global serialized write access, with non-locked reading) are 99.9% good enough.


That is true if you take a narrow view/definition of "STM". It's true that atoms are enough in most cases, but the reason they work so well is because the rest of the language is designed with STM in mind. You need immutable data structures throughout the language in order to be able to use atoms for synchronization: when you access a data structure through an atom, you want to be sure you are getting your own "snapshot" that no one else will modify.

This is why I frown when people say "Clojure STM is useless" or "nobody uses Clojure STM". Everybody uses the foundations, and the reason why the foundations are so useful is precisely because they were designed for STM.

I've been writing Clojure apps for many years now, and it is true that atoms are usually enough. I used agents in the past for their serialization capability, but these days core.async is usually a better tool. I never found a use case for refs that could not be simplified. In most applications I wrote, transactionality was needed together with persistency, which shifted the responsibility to a database.


I remember in one of his interviews, Rich Hickey mentioned that STM isn’t in the language to be used all over the place, it’s there to make some very rarely used things possible, which otherwise would not be possible at all.

Which is a very reasonable way to see it, in my opinion.


It certainly seems to be rarely used. It's interesting to look at the concurrency stuff in Clojure and its uptake:

- Atoms: very popular. Usually what people reach for, perhaps overly so.

- STM / refs: Not commonly used, but I wonder if this is because we aren't used to really considering them as an option?

- Agents: I think everybody agrees that you shouldn't use these

- core.async: What many people think you should use instead

- Reducers: I've never seen these used in the wild.


> Atoms: very popular. Usually what people reach for, perhaps overly so.

Having written Clojure professionally, and then moved on to Java (not intentionally, just kinda happened), I find myself using the same pattern in Java. AtomicReference + pcollections, along with Lombok's @Value is enough to get some pretty Clojurey concurrency patterns in Java.


There's also pmap, which is amazing, and manifold with its little deferred/event-ish universe...

Both have their place!


I think this mirrors how concurrency really is in many applications:

* A lot of services simply defer their state management to some other service - a database or document store - and typically have to use its concurrency/transaction management.

* In apps that have state, a lot of the time only safety is required - this state only changes in one place, I can't afford a partial read, but otherwise I don't really care about concurrency. Atoms for this.

* Sometimes there is ordering or coordination of state, but a lot of the time that is producer-consumer, so core.async can be a nice fit.

* In a (relatively) small subset of cases, I actually have multiple writers whose behavior I need to coordinate. STM is perfect, and I prefer it to managing my own locks in those cases.


I have used refs for a multiplayer server. Five years into Clojure, there was finally a reason to break them out.

Players were in rooms, which were updated ten times a second. All of the rooms are largely independent, meaning the best strategy is to use a thread pool to update them in parallel.

However, occasionally operations would require synchronization between rooms (e.g., moving one player from one to another).

It worked pretty well.


It was supposed to be THE killer feature of Clojure. Rich Hickey's "ants" demo is still firmly in my mind.


Atoms - and immutable values :) Those two are really killer and solve all of my state problems, at least.


Also it isn’t particularly fast afaik. Hardware support was supposed to fix this, but the Intel primitives were buggy and not available on all models.


Hardware transactional memory wouldn't accelerate Clojure. The actual transactional bit is very cheap, built around compare and swap.

You first grab the current value of two refs, build two new values out of them through some process.

Then on commit, you lock both refs and check they're unchanged since the start of the transaction. If they're unchanged, you swap in the new values; if not, you grab the new values and rerun the computation (on and on until timeout).
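In code, the commit step is something like the following illustrative Haskell sketch (not Clojure's actual internals; a version counter stands in for the real transaction machinery, and a real implementation would order lock acquisition to avoid deadlock):

  import Control.Concurrent.MVar
  import Data.IORef

  -- Each ref pairs a version counter with its current value.
  data Ref a = Ref { refLock :: MVar (), refCell :: IORef (Int, a) }

  -- Commit new values against the versions seen at transaction start;
  -- False means a conflict, and the caller reruns the computation.
  tryCommit :: Ref a -> (Int, a) -> Ref b -> (Int, b) -> IO Bool
  tryCommit r1 (v1, new1) r2 (v2, new2) =
    withMVar (refLock r1) $ \_ ->
      withMVar (refLock r2) $ \_ -> do
        (cur1, _) <- readIORef (refCell r1)
        (cur2, _) <- readIORef (refCell r2)
        if cur1 == v1 && cur2 == v2            -- unchanged since snapshot?
          then do writeIORef (refCell r1) (v1 + 1, new1)
                  writeIORef (refCell r2) (v2 + 1, new2)
                  pure True
          else pure False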

The overhead comes from the lack of granularity - if two transactions change different keys in a map they'll conflict even though in theory they could run simultaneously (and indeed can with a hash table and HTM like TSX).

And also because of the speed of persistent data structures. They're not exactly slow given the flexibility they give you, but each write involves allocation.


> The overhead comes from the lack of granularity - if two transactions change different keys in a map they'll conflict even though in theory they could run simultaneously (and indeed can with a hash table and HTM like TSX).

Can you expand on that? Couldn't Clojure introduce an STM-aware HashMap? Why would TSX help?


It could, but it just wouldn't be Clojure any more. The whole thing is built on persistent data structures, where if you modify an object you get the new version, and keep the old version unless it's garbage collected away. Further, there are guarantees that both versions perform well.

This isn't just used for Clojure's STM (although it makes the STM model trivial). It's essentially the fundamental premise of the language. Functional programming with structural sharing to allow immutable data structures to be fast enough to be useful in the real world.

Introducing a new data structure that was (I assume you mean) HTM aware would not offer the same guarantees. You'd be able to have two threads mutating it, but you couldn't (for instance) keep a history log of values by hanging on to a pointer to each one, that is still reasonably space efficient.

TSX or other HTM would allow you to build a hash map that two threads can modify simultaneously by literally just reading and writing to it as normal (with some nasty edge cases, but fundamentally it's similar to a single-threaded version). The hardware keeps track of which cache lines are involved in the transaction, ensures they're not written to memory, and if there's a conflict it throws them away to retry the transactions.

However in Clojure, you're not modifying the map, you're creating a new version that shares most of the structure with the old version except for your differences. If two threads modify the original hash map A, to yield B and C, all you've got now are three different trees. You might be able to invent a data structure that allows you to merge the diffs of A->B and A->C to yield D, containing both versions. But it's probably not going to be awfully fast and it's going to have corner cases that violate the transactional guarantees.


I should point out that I'm fairly familiar with the literature roughly up to the level that Clojure takes advantage of (I've used it to implement custom versions of persistent maps and RRB trees that store to disk). There may be more advances since then or techniques I'm not familiar with.

I think the problem I'm describing is fundamental, but I'm not a researcher and am not claiming any advanced knowledge beyond being an experienced Clojure developer that is familiar with the internals.


> Also it isn’t particularly fast afaik

What exactly "isn't particularly fast"? :)

In any real-world application the cost of the STM machinery itself is, to a first approximation, 0.

There is a cost to immutable data structures, but I don't think that's what you were talking about.


STM under concurrent load can be expensive due to how modern CPUs work. Since you are updating values on different CPUs, you are going to have cache invalidation. Then there's the issue that doing an expensive calculation in the transaction can cause the same transaction to fail repeatedly, leading to starvation.


Thanks for sharing this here, it was a great read and really well written. I really like the style of communication of taking a problem and existing solution, and adding requirements and workarounds one at a time until that stops working to motivate a new approach. And this was a really nice introduction for me to STM too, which seems like an interesting topic to look into more. Appreciate the references too!


>Let’s say we’re a bank managing many bank accounts. Each account has a total. We get a never-ending stream of requests to move a sum of money m from an account a to account b.

We all know that you should not do accounting using floating point numbers. But also, please understand that bank accounts are [append only] ledgers and the balance is a cached computation. I truly wish that we had a different example use case because this is really bad.


This is the opposite of my point of view. I think it’s a great example because the whole point is that, while most people’s everyday experience of a bank account is as a single mutable value, the reality is far more complicated and subtle. Teaching is all about using simplified examples to convey intuition, and “you might think it’s easy to keep a single number up to date, but moving money between two accounts is fraught with danger” is a perfect demonstration.


Maybe for bank accounts that is true, but online casinos work pretty close to that example (except the floating-point part), since we do not ever want to allow an account balance to be negative. Allowing balances to go negative would be a regulatory violation, so what most online casinos do is first lock and update the balance (or balances, since there is also bonus money) while making sure it does not go negative, and then log the transaction amount plus the resulting balances.

So for online casinos I would not say that the balance is cached. And I imagine traditional banking also has accounts which are never allowed to go negative, where a similar solution is required.


Regulators would like a word with you. Also, in theory it sounds nice, but reality is always more nuanced.

I have implemented a bank-like SaaS which was regulated by our national financial regulators.


Of course it's more nuanced than a short HN comment. It's also more complicated than using a mutex to make changes to two values.

> I have implemented a bank-like SaaS

Which one? What made it bank like? How was it implemented?

> which was regulated by our national financial regulators.

Which country?

> Regulators would like a word with you.

To say what?


> To say what?

Among other things, they require you to store the before & after balance with each transaction. As a truth value, not a cache.

And now your whole premise is out of the window (there are a bunch of practical reasons it won't work in reality, too).


A string of before and after balances is still a ledger. This is distinct from just 'account, current_balance' tuples like a more typical schema would have.


Isn't one of the characterising features of a ledger that it is a string of relative values instead of absolute values?


No, it is not. A ledger is an append-only data structure consisting of a list of (applied) transactions. What kind of transactions depends on what you're ledger-ing, but it certainly doesn't imply relative values (if anything the opposite, though probably not anything).


So if they are not related (or relative) to one another, why are they on the same list?


> I truly wish that we had a different example use case because this is really bad.

Well... that’s why the article proposes a different, better example.


The second example was interesting and insightful.


A few years ago, the PyPy developers were working on STM for Python, as a way to mitigate the Global Interpreter Lock.

Was that work ever completed? How does it compare to this work for Ruby?


Here is the thesis paper that came out of it: https://dl.acm.org/doi/epdf/10.1145/3276479

All the following is my opinion, and may contain errors. Take it with a grain of salt and kindly correct me where I misstated something.

If you view PyPy as a sophisticated test bed for virtual machine research, it was a success. There are lots of conclusions about what works and what doesn't, and benchmarks that show faster code when using many cores. But the project ended with the conclusion that the STM model and the tracing-JIT model don't play well together, and the PyPy team preferred to stay with the tracing JIT.

Python and Ruby are very similar. The same JIT that powers PyPy powers Topaz, which still performs quite well in Ruby benchmarks even though it has been dormant since 2017.


My recollection (which could be completely wrong) was that they had a branch with it, but it really needed hardware transactional memory to have decent speed for most applications.


Makes sense - they seem to have stopped work on STM so presumably it couldn’t be made to work well.

How will STM in Ruby solve (or avoid) the problems that doomed STM in Python?


It exists, it's just quite slow. I doubt there are users of it.


So I have very little knowledge here, but won't the transaction log need a global lock? Is the trick that you hold it for a very short time so that it doesn't become a problem? Also, with high contention, won't the transactional threads do a lot of replaying of each other's work to figure out whether they have conflicts or not?


Yes, to both.

The transaction log is append-only, so there are more optimal methods (including lock-free approaches) to make it efficient. Also, the duration of the lock is limited to the append operation, and is not determined by the application (by when the atomically block ends).

And yes, STM and high contention don't mix. It works well for occasional writes and predominant reads.

Tim Sweeney's presentation on this topic is hugely informative.

https://www.st.cs.uni-saarland.de//edu/seminare/2005/advance...


STM is implemented using immutable data structures; using atomics, you can do a compare-and-swap to update the value. If the swap fails, you try your transaction again. I'm not sure how you would easily implement this in a language with mutable data structures. I know Azul had hardware transactional memory that supported it.


You use a lock-free structure for the log.


> "lock-free"

I never liked that term.

Lock-free algorithms will usually boil down to some combination of polling, waiting (sleep(), futex-style), or use what are essentially more fine-grained, hardware-backed locks (CAS, memory barriers, LOCK instructions, hardware-specific transactional instructions, etc). The locks are very much still there.


It’s used flippantly at times, but lock-free is a technical term referring to a guarantee of global progress https://en.m.wikipedia.org/wiki/Non-blocking_algorithm (I think it’s also sometimes used as a synonym for non-blocking, so including “obstruction-free” too).

For instance, if all threads except one are stopped, the single running thread should “finish” (for whatever finish means) in a finite time (lock-free is stronger than this condition, too). An algorithm using mutex-style locks fails this guarantee, if one of the stopped threads is inside the critical section.


‘Lock-free’ means the application cannot ‘lock up’, rather than that there are no locks. CAS does use a kind of hardware-level lock in the cache, but it can never cause an application to lock up, because there is no per-process state like in a software lock.


Yes, that aligns more closely with my mental model of a "lock-free algorithm", but what I see in practice is people avoiding (or re-inventing) synchronization primitives, thinking that they don't belong in a "lock-free algorithm".

You can still use standard kernel-provided synchronization objects (mutex/events/semaphores) in your "lock-free applications", as long as you provide timeouts to blocking wait() calls, and handle abandoned objects gracefully.


True, but the key difference in the lock-free approach is that the "lock" is basically a pointer-swizzling or timestamp-swizzling operation, and so the lock duration is strictly bounded; it is determined by the application or the thread-scheduler.


Naming is hard, but in this case lock-free is just misleading. Why oh why didn't they go with bounded-lock data structures? :)


Because there is no lock.

With a lock, you acquire the lock, you perform your operation, and then you release the lock.

Lock-free is typically done differently. You don’t acquire a lock, you start performing the operation “optimistically”, and you commit the result if no other thread has stomped on your data. If another thread has stomped on your data, you start over.

One of the important differences here is that it’s a race. Whoever commits first, wins. With a lock, you have to wait for whatever thread acquired the lock first. That thread may not even be running.

So, lock-free is fundamentally different because with a lock-free system, a thread that is running will always complete work (you just don’t know which thread). In a system with locks, threads that are running may be prevented from making progress by threads that are not running.

Systems with locks can also deadlock, lock-free systems cannot deadlock. Usually, there is some sort of guarantee that lock-free systems always make progress.
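Concretely, the pattern looks something like this sketch in Haskell, using the atomic-primops package (the helper name is made up):

  import Data.IORef
  import Data.Atomics (casIORef, peekTicket, readForCAS)

  optimisticUpdate :: IORef a -> (a -> a) -> IO ()
  optimisticUpdate ref f = loop
    where
      loop = do
        ticket <- readForCAS ref            -- snapshot plus version witness
        let new = f (peekTicket ticket)     -- work done off the snapshot
        (ok, _) <- casIORef ref ticket new  -- commit iff unchanged
        if ok then pure () else loop        -- lost the race: start over

Note that casIORef can only fail here because some other thread's CAS succeeded, which is exactly the "a thread that is running will always complete work, you just don't know which" guarantee.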


Your explanation doesn't make sense to me.

> "you commit the result if no other thread has stomped on your data"

What does "committing" mean here? If it means performing an atomic write (à la CAS), then you're using a lock (see my next point).

> "If another thread has stomped on your data, you start over."

So you let your thread sit there in a spin-loop and CAS a condition-variable. That's a lock for all intents and purposes, and your system can still "dead-lock" (read: your thread will never get to "win the race"), if your "data gets stomped on" over and over again in-between reads (which are obviously non-atomic, otherwise you'd be locking there too).

> "Usually, there is some sort of guarantee that lock-free systems always make progress."

Getting a time-slice to continue running in a tight-loop while trying to CAS is not the same as "making progress".

"Making progress" would be - you'd get a chance to commit your changes and proceed to your next bit of business logic. But with contention, that's not guaranteed to happen at-all?


I think you’ve just mixed up wait-free with lock-free.

- Lock-free: at least one thread will make progress

- Wait-free: all threads will make progress

The difference between locks and lock-free is very noticeable if a thread holding a lock is suspended (which can happen on a modern system just by accessing memory that is paged out). On real-time systems it’s also possible to guarantee that high-priority tasks will make progress, regardless of how a low-priority task behaves. With locks, a low-priority task can easily prevent a high-priority task from running at all.


A system cannot deadlock with CAS.

It can live-lock.

I think you’re possibly confusing lock-free and wait-free, which is a stronger guarantee.


> "the lock duration is strictly bounded; it is determined by the application or the thread-scheduler"

Practically speaking - wouldn't the exact duration be heavily influenced by the countless layers of abstraction beneath the application itself (kernel, scheduler, hardware, speculative execution, etc)? If so, can we ever truly make the claim that the "duration is strictly bounded"?


In terms of the level at which we reason about thread interleavings, which is what we're talking about, it is bounded.


I wonder why every single comment I leave in this thread is getting down-voted. This community has truly deteriorated over the years.


Because a lot of your comments in this thread are "missing the point" or (unintentionally) misleading at best, and it's easier for people to use a downvote to make those comments less prominent in the discussion than to go through the trouble of trying to explain why the comments are missing the point or unintentionally misleading.

But, a lot of people have gone through the effort to leave explanations in this case. Isn't that good enough?

For one unaddressed example:

> You can still use standard kernel-provided synchronization objects (mutex/events/semaphores) in your "lock-free applications", as long as you provide timeouts to blocking wait() calls, and handle abandoned objects gracefully.

This might be technically true, but atomics provide significant performance benefits compared to using a mutex with a timeout, and I don't think of "lock free" as relying on busy loops as you seem to think in your comments. I'm sure a busy loop makes sense in very specific algorithms, and mutexes themselves often use a limited amount of busy waiting to avoid context switches. You can simulate atomics with mutexes, but that's not the point. Lock free is hard to do correctly. Mutexes create the possibility of a deadlock, which is incompatible with being lock-free, so you have to avoid all the footguns of lock-free and the footgun of using mutexes.

So, yes, you can use standard sync objects... but that doesn't make anything better in this case. It misses the point. "Technically correct" isn't always good enough to get an upvote.

If you don't agree, that's fine... but this is completely uncalled for and against the community guidelines:

> This community has truly deteriorated over the years.

No one is obligated to upvote things they consider incorrect, and downvotes are perfectly suitable for this purpose under HN guidelines.


> "While technically true... I don't see how it contributes anything useful to the conversation, so I downvoted it."

If you don't see how it contributes to the conversation - why not just ignore it then? Just because you think this doesn't contribute to the conversation, doesn't make it true.

> "If you don't agree, that's fine..."

Clearly it's not "fine", otherwise you wouldn't try to silence comments you disagree with by down-voting them.

This is my last comment on HN.


> why not just ignore it then?

Because it's misleading and/or missing the point. As stated. Why would I want more people to read something that I believe is misleading?

> Clearly it's not "fine"

Once again missing the point. The prevailing opinion on HN is what will get upvoted, and a lot of other stuff will get downvoted. I have seen that prevailing opinion be hilariously wrong before.

If you want to be heard with an unpopular opinion, you have to make a compelling argument... you can't expect to be upvoted to the top just because you're sharing an opinion. It's rare for a comment to be lukewarm enough for no one to feel strongly enough to downvote it. It's going up, going down, or it's on a thread no one is reading. Downvotes don't mean people hate the person leaving the downvoted comments; it's just a tool for disagreeing with comments.

> This is my last comment on HN.

Because people disagreed with you? People even left explanations, but that wasn't enough.


Seeing routing in action like this is very satisfying


Just use Clojure.


[flagged]


Not sure what you mean? STM wasn't originally developed for Clojure. STM as we know it today pre-dates Clojure by over a decade.


My troll was intended to say that when a Ruby developer desires to utilize STM, they would instead become a Clojurist and use a language where STM was a foundational design consideration. I am not sure why it is relevant to discuss the originality of STM. But again, my post was a troll, so say -- whatever -- as I did.




