This is UNIVAC 1108 assembly code. Along with P and V is the code for bounded buffers, with the operations "PUT" and "GET". Bounded buffers are what Go calls "channels". Note how simple they are if you have P and V. That code even works on multiprocessors. There's one semaphore for "queue full" and one for "queue empty". PUT does a P on "queue full", puts on an item, and does a V on "queue empty". GET does a P on "queue empty", takes off an item, and does a V on "queue full". It's very simple. That's the real use case for P and V. Linus' note indicates that in 1999 he didn't know this.
(I didn't write those primitives, but I've used that code, and once ported it to a Pascal compiler I adapted to handle concurrency.)
This stuff was all well understood four decades ago. Much of it was forgotten outside the mainframe world, because threads and multiprocessors didn't make it to microprocessors for several more decades. UNIX, for a long time, had very primitive synchronization primitives. Early UNIX didn't have threads, and even after it got threads, it took years before the locking primitives settled down. The DOS/Windows world didn't get them until Windows NT, circa 1993.
It's been amusing to me to see bounded buffers resurface in Go. They're quite useful, and I've been using them in concurrent programs for many years.
For those wondering, P comes from 'Passering', roughly translated 'pass' (as a noun), and V from 'Vrijgave' ('release'). Apparently somehow this terminology comes from train systems but there's not a lot of context on that etymology.
As an aside, this paper (it's actually the transcription of a lecture) has some great metaphors that explain problems with concurrency and issues with synchronisation. At the risk of losing much of the nuance, he essentially illustrates synchronisation using the example of a teacher who needs to find a pupil in a class. When pupils are free to choose a seat, she needs to scan all seats when she is looking for a particular pupil; but when at the same time pupils are free to change seats as she's scanning, there is no guarantee that she will ever find the pupil she's looking for, as he might run from the back of the class to the front as soon as she's done scanning the front.
That's Dijkstra's later terminology. In EWD35 he introduced the operations as "passering" and "vrijgave". Then in EWD51 he uses "verhogen" and the neologism "prolagen" ("probeer te verlagen"). In EWD74 he switches to "proberen".
I think that much is clear (the railway analogy), but the idea that tunnels are the main use case for semaphore signals is daft. All railways use variations on the principle of block signalling, where only one train is allowed on any section of track, even if successive trains are travelling in the same direction. This ensures safe separation of trains, which take some distance to stop.
An amusing anecdote I heard about this a few weeks ago: he came up with "P" and "V" when explaining the concept to his students at Eindhoven University. He looked out the window of the classroom he was teaching in and saw the name of the local soccer club, PSV, on a nearby stadium.
That anecdote is amusing, but it seems unlikely things worked exactly like that. They are near each other in the center of town, but the stadium and the university are a kilometer apart and separated by an elevated railway. At the time EWD35 was written (prior to 1963) the stadium had just installed 40 meter lights, so while it's possible the initials were written on the back of these, it seems more likely that this is a fanciful urban legend. Do you have any more information?
> This stuff was all well understood four decades ago. Much of it was forgotten outside the mainframe world, because threads and multiprocessors didn't make it to microprocessors for several more decades.
Here's an important distinction to make: this stuff was well understood in theory, but the practice is a bit different.
Semaphores are a neat theoretical concept but not a very good practical parallel programming paradigm. Semaphores are easy to reason about when writing proofs by induction that a parallel programming algorithm is working correctly, which is why they are still at the core of parallel programming education.
But when writing practical multithreaded programs, mutexes and condition variables are a lot more practical. Typically each mutex is coupled with one or more condition variables to wait/signal for conditions such as "queue not full" and "queue not empty". Incrementing and decrementing numeric counters is a very clumsy way to maintain a state of any kind. Implementing a semaphore requires grabbing a spinlock and doing this several times is unnecessary when you could just grab one mutex, check for the appropriate condition(s) and then wait/signal on the correct condition.
It is rather unfortunate that parallel programming is still primarily taught in the theoretical manner (at least it was for me), using primarily semaphores and leaving too little emphasis on practical implementation. It is important to understand the theory and know how to do the proofs, but it is equally important to put this into practice.
Every time I see someone "implement" a semaphore using a pthread mutex and condition variable, I cry a little. I've seen this several times in production code.
> It's very simple. That's the real use case for P and V. Linus' note indicates that in 1999 he didn't know this.
I'm pretty sure Linus was joking and he did understand what semaphores are used for. Every computer science curriculum has a course on parallel programming with bounded-buffer producer/consumer problems, readers/writers problems and other "toy problems" solved using semaphores, followed by a proof by induction that the solution is correct. And Linus did, eventually, finish a degree in computer science.
Semaphores are useful for rate limiting - for example, say you have a connection pool with an upper bound, so naturally the number of threads that can acquire a connection and do things with it is limited by the size of that connection pool.
And you don't necessarily need a mutex or to actually put the thread to sleep. Semaphores are also relevant when speaking of asynchronous stuff (e.g. Futures), in which case you can easily do CAS operations on an atomic reference holding an immutable queue of promises. Slightly inefficient under high contention, but non-blocking and gets the job done.
While I do not disagree with you, I still do think that even in these cases mutexes and conditions are more practical.
Take the "rate limiting" example you mention (also one of Linus' examples in the OP). You initialize a semaphore to `max_concurrent_connections` and call `semaphore_down()` when you enter the connection handling sequence and `semaphore_up()` when you're done. Now this works fine and is an idiomatic example of using semaphores.
However, in the real world, this kind of situation rarely happens in isolation. What happens if you need to terminate the application for whatever reason, and do so cleanly? If you're under contention, you might have a dozen threads waiting for the semaphore to go up, and your only option is to kill them. Or implement some logic for this case (using another semaphore) to make sure the application hasn't been terminated while we were waiting for the rate-limiting semaphore.
You can implement this cleanly using mutexes and conditions, by creating a "killable semaphore" synchronization primitive. While this is similar to a semaphore as an idea, it's very difficult to implement it if semaphores are the only primitive you have. Additionally, you probably want some kind of timeout if the queue is full.
So in practice you need something like:
    while (true) {
        socket = accept();
        int status = rate_limiting_enter(my_rate_limiter);
        // NOTE: someone else may call rate_limiting_terminate()
        if (status == OK) {
            service(socket);
            rate_limiting_leave(my_rate_limiter);
        } else if (status == TIMEOUT) {
            send_busy(socket);
        } else if (status == KILLED) {
            send_terminate(socket);
        }
        close(socket); // this must be called on every path or sockets leak
        if (status == KILLED)
            break;
    }
Now, while this is theoretically very similar to a semaphore, it has other real world priorities (like timeout and termination) which are very difficult to implement using Dijkstra-style semaphores with only P() and V() operations (and you definitely need more than one semaphore).
This has been the case in almost every practical multithreaded programming scenario I've had. The solution could be thought of using semaphores (and I frequently do) but in practice, there's always some real world conditions (timing, contention, errors) that must be met.
It is very trivial to implement semaphore-like synchronization primitives using mutexes and conditions but not vice versa.
Semaphores are very good for textbook examples and a mental model but not so much in practical software.
Termination in queued systems is moderately hard. You have to drain out the queues. In Go, you can close a channel at the write end and wait for the reader to reach EOF. But if the reader is stuck waiting for something, there's a problem. Especially if it's waiting to write another channel. If you close a channel written by another task, that task will panic when it writes to the closed channel. You can't close channels to force shutdown in Go unless a panic is acceptable.
This is a classic problem with bounded buffers, re-invented four decades later.
Using two semaphores to implement a bounded buffer has the advantage that writers and readers don't interfere with each other until the queue is either empty or full. Of course, that might be the only good use of semaphores, and a direct implementation can be better than one using standard semaphores.
Semaphores can be implemented in a way that doesn't require a spinlock in the fast path. Still not as fast as you can make a mutex, but a lot better than the pseudocode given in many classes (which is almost always wrong, too.)
In practice, the semaphores available for many systems are considerably slower than using architecture provided atomic increment operations, which means implementing a bounded buffer with two counters and two mutexes ends up being faster.
Those are the assembler macros ("procs" in UNIVAC terminology) for calling the functions previously linked.
(UNIVAC assemblers were very powerful. Arbitrary computation could be done at assembly time. If you needed some precomputed table, that was the way to do it.)
The previous link just takes me to the FANG homepage frameset, which doesn't have any functions visible on it to me. This is a common problem with linking to framesets. Or is it some kind of problem in my browser, and other people see functions in UNIVAC assembly when they follow that link?
(FWIW, I think it's fairly normal for macro assemblers to be Turing-complete, although some of them carry it off more gracefully than others.)
Well, P and V are considered harmful (pun intended).
Systems using these operations are in general not "composable". In other words, it is usually not possible to compose two software systems using semaphores and/or mutexes, without rewriting these systems somehow.
Alternatives exist. For example: message passing, and STM (software transactional memory). Anybody know of other alternatives?
Message passing doesn't help either. What many programmers don't realize is that synchronization problems (deadlocks) are _not_ the consequence of using mutexes or other primitives per se. They're the result of synchronization itself.
Message passing can be asynchronous, but on some level the system might have to synchronize certain operations. If you implement a financial system, you'll certainly have to synchronize things even if you're using asynchronous message passing to implement it -- you will simply implement synchronicity on top of an asynchronous infrastructure.
And when operations start to depend on each other, then you _have_ to think about potential deadlocks, race conditions, etc.
Basically there isn't anything that solves this for you. STM is somewhat different as the danger is not deadlocks but starvation, etc.
The way I like to put it is that using a decent concurrency mechanism like message passing or STM or even just what Go does (enforced by community standards rather than actual limitations) takes concurrent programming from an exponential-complexity problem to a polynomial-complexity problem. It's still hard, it'll never not be hard, but it doesn't have to be the insanely, mind-bendingly broken hard that it was in the 90s. The more you can avoid sharing and the more you can operate in your own little world that communicates with other worlds via immutable messages, the happier you will be, and if you occasionally have to dip a bit into true sharing, it's still easier to manage in a saner world than when you try to share everything, all the time.
There's nothing that "solves" the problem, but there are "things that will summon forth Cthulhu" and "things that are merely difficult".
> Alternatives exist. For example: message passing, and STM (software transactional memory). Anybody know of other alternatives?
It should be noted that spinlocks, mutexes and conditions (and semaphores) are building blocks that are necessary to implement message passing, software transactional memory and other non-trivial parallel programming constructs. (at least until we have practical hardware transactional memory).
Spinlocks, mutexes and conditions are a "necessary evil", not "considered harmful".
You can implement message passing, STM, BSP, data parallelism, etc., on top of spinlocks, or on top of other implementations of mutexes such as condition variables, or on top of semaphores; but you can also implement them on top of primitives like compare-and-swap, which is arguably even harder to use correctly.
goto is perfectly acceptable in C code. The most common use case is for cleaning up after errors before returning, although they're also used to break out of nested loops cleanly.
goto is harmful when used improperly but fine everywhere else.
STM and message passing do require the low-level primitives: CAS (compare-and-set/swap) or LL/SC (load-linked/store-conditional).
Moreover, STM doesn't solve the transactional problem that you face when using locks (mutexes). Overall STM is just a fancy thing to have but hardly solves the big problem of transaction boundaries.
Message passing is easier to reason about due to global ordering, but then you need FIFO queues that are built upon some kind of mutex or spinlock... or a Lamport queue (usable for the single-producer/single-consumer case)... or some variant of the Michael & Scott queue, etc.
In the end, underneath, you have to go to the metal, so these primitives cannot be 'considered harmful'.
No, you can build STM on top of mutexes, too; you don't need CAS and LL/SC as low-level primitives. In fact, IIRC, Intel CPUs use a mutex in their cache coherency protocol to implement CAS and LL/SC.
"Dijkstra was probably a bit heavy on drugs or something (I think the official explanation is that P and V are the first letters in some Dutch words, but I personally find the drug overdose story much more believable)."
Quotes like this are why I always read what Linus has to say, regardless of whether the subject is relevant to my life in the slightest.
Edit: Yes people, those Dutch words exist! I get it! I'm sure Linus was well aware of that when he made this remark. However, the point Linus was making (in a humorous way) was that like most computer scientists / mathematicians, Dijkstra was overly fond of obscure, single-letter names, which nobody ever intuitively understands. Whereas the names "up" and "down" (see the rest of Linus' quote) are far more intuitive. Personally, I would argue that "hold" and "release" convey the intent in a clearer, more abstract way.
To say that a mathematician doesn't intuitively understand single-letter names would be like saying a programmer doesn't intuitively understand a keyboard.
Dijkstra was a computer scientist in times where being a computer scientist meant you were basically a mathematician with a specialty. Semaphores were first introduced by him in one of his early EWDs (EWD35, http://www.cs.utexas.edu/users/EWD/ewd00xx/EWD35.PDF). At the time of EWD35, his EWDs were non-official musings, written in Dutch, intended to be shared amongst interested colleagues and such. It's not like he was writing a scientific paper.
I don't think Torvalds really meant anything much by it. He's just trying to be funny.
> However, the point Linus was making (in a humorous way) was that like most computer scientists / mathematicians, Dijkstra was overly fond of obscure, single-letter names, which nobody ever intuitively understands.
If he was making this point, I find it ironic, considering that the purpose of the post is to explain the distinction between two obscure, poorly-named technical words, which nobody ever intuitively understands.
(ˈsɛməˌfɔː)
n
1. (Telecommunications) an apparatus for conveying information by means of visual signals, as with movable arms or railway signals, flags, etc
2. (Telecommunications) a system of signalling by holding a flag in each hand and moving the arms to designated positions to denote each letter of the alphabet
vb
3. (Telecommunications) to signal (information) by means of semaphore
which definition fits your understanding that you 'wait'? It's not a good name.
Edit: Ok, actually that could be within your first meaning. But the practical result for railway semaphores is that one of their uses is to stop trains from entering a blocked segment.
It's actually quite reasonable to complain about the use of the term semaphore here. As you recognize, stopping the train is only one of their uses. Railway signals can convey many different types of information about the state of the track, not just free/blocked.
And moreover, a semaphore is just a specific type of railway signal, one with a pivoting arm. It's very hard to see how a pivoting arm is at all related to the CS concept.
Not sure if it's reasonable to complain really... I mean, it's not covering everything semaphores do, but the basics are there - it gives information and stops two trains/threads colliding. References to trains when talking about CS semaphores are all over the place. Universities:
If we're going to compare it to any real life object, it may as well be a semaphore. Anything else will have the same issues. Traffic lights: what about yellow / blinking. Stop signs: why do they sometimes stop, sometimes not. etc.
Is there any single word/phrase concept which you'd rather use instead? Is it much more convenient and clear than a semaphore?
> Is it much more convenient and clear than a semaphore?
My point was simply that the only reason "semaphore" is clear is that you've already heard or read those lectures talking about it. There's nothing about the railroad meaning of the term that would imply the computer science meaning. A counter that can never go below zero is entirely unrelated to a pivoting arm. Once you know the connection it can serve as something of a reminder, but that's different from having an intuitive understanding in the first place.
EDIT: Getting from RR semaphore to CS semaphore requires several leaps of intuition. On the railroad side you need to recognize that it is a signal that's important, and it's irrelevant whether that signal is implemented as a semaphore, an electric signal, or even a flagman. On the computer science side you need to know the difference between a semaphore and the many other types of mutual exclusion mechanisms.
So in that sense "semaphore" is poorly named: it provides insufficient context to distinguish the concept from other similar, yet distinct, ideas.
In many of those the name doesn't provide much context at all to distinguish the concept from another one.
Names of things, at least in CS, are rarely ever named in a way to provide sufficient context of their meaning, without having some level of background context.
That is what the description of the concept is for, not the name. A name is too short to provide this. Usually the name is at the discretion of the original author (aka Dijkstra in this case). If you are trying to determine the complete concept of something, or you are trying to distinguish the concept from other similar ideas, only from the name, you are doing it wrong. Go read the documentation about the concept for that.
I'm not saying a name has to provide sufficient context. I'm saying a name that provides that context is superior to one that does not.
A red-black tree is named so because those were the colors their laser printer supplied. This is (much like semaphore) a historical accident. Yet, just by the name red-black, you do indeed have enough context to distinguish it from a plain old binary search tree. The defining feature is that you color every node either red or black, thus red-black in the name.
Likewise with splay tree: the defining function is the splay operation, which the name is a vivid reminder of.
Splay tree and red-black tree are Good Names (TM). B tree, B+ tree, B* tree, Bx tree are Bad Names (TM).
> most computer scientists / mathematicians, Dijkstra was overly fond of obscure, single-letter names, which nobody ever intuitively understands
If you use a word too common, it's too easy to confuse the definition of that word with the definition of its use in a different context. You must not understand the pain of reading two mathematical books and trying, with the utmost sincerity, to figure out whether the authors are actually using the words the same (such as set, relation, abstraction, object, even things like function). There's the distinct definition, and there is the contextual use, which can theoretically differ for everyone depending on the origin path of native language. You can never know if someone is altering the definition or discovering / describing something new, and they are using the wrong word. Discovering the perfect word for the concept you construct in code or math is an art.
> "Dijkstra was probably a bit heavy on drugs or something (I think the official explanation is that P and V are the first letters in some Dutch words, but I personally find the drug overdose story much more believable)."
I personally have made remarks that were WAY more potentially offensive than that, verbally, with friends. But, accusing Dijkstra of being a drug user, on the Linux kernel mailing list, from a @transmeta address?
That's grounds for termination in many companies, even if you don't get lawyers involved. I think he's only able to get away with it because he's Linus.
Technically, you're right. Today, any junior employee that made a joke like that would be crucified and practically blacklisted from the industry. What you should be thinking, however, is not "why does Linus get away with saying things like that?" but rather "why do I consider it normal for someone's livelihood to be permanently destroyed over a stupid joke?"
The appropriate response is to immediately start thinking why you need to figure out a response to any low-quality joke people make. Maybe, just maybe, there is absolutely no need to make any response whatever. Think about it.
I think if people would stop pretending to be offended by ridiculous things, the problem would solve itself and we wouldn't need to have a "response" in the first place.
That's the idea. Any developer or researcher who uses obscure or meaningless names like "P" and "V" needs to be publicly shamed for it, no matter how well-respected they are. Clarity is vital in this business.
Dijkstra was Dutch and lived in the Netherlands, where drug policies are either nonexistent or loosely enforced. So perhaps it's less offensive than it seems on the surface.
Actually, Linus is from Helsinki, which is not exactly close to the Netherlands.
There is no intrinsic connection between Finland and The Netherlands. Although the drug laws are still famous all over.
Semaphores are basically unused in the kernel these days, abandoned in favor of mutexes. I really recommend reading kernel/locking/mutex.c. You can learn a lot about the details of low-level synchronization, and it's also just a really impressive piece of ruthlessly-optimised code. Tricks like reading the lock's 'owner' field without any sort of synchronization, to decide whether to spin on the lock or to sleep.
It's surprising that Linus answers peacefully and pedagogically! And thus it's a nice read to refresh the definitions of semaphores, spinlocks and mutexes.
Maybe you can edit the title and add a little [1999] :-)
As with most media coverage, someone being polite rarely makes headlines.
Linus spends much of his time being polite and helpful on many mailing lists, but it's his outbursts that make headlines, and convince the easily convincable that he's "rude".
I came on here to say exactly this. Fairly or unfairly, Linus gets a lot of criticism for being pretty harsh in some situations. In discussions about this people often stick up for Linus saying that either he's joking or that the people he's addressing can take it and that he's not that way with everyone. I guess this is the first time I've ever seen this side of Linus get any sort of prominence on HN (or elsewhere).
Erm, firstly, he didn't say Peter Samuelson's CS education was bad. He said "your CS courses were bad" and then follows up with "...(or just didn't cover much about concurrency, which is fairly common)...". I think you might have been trying very hard to find some of the Linus you imagined in there.
Can you blame Linus for saying that? I can't speak with precision about 1999, but someone assumed to have a CS education should know the difference between semaphore and spinlock.
Whether they did or didn't know about the definitions, they should have at least looked it up to confirm before posting to the Linux Kernel Dev list or taking on Linus on the matter.
Link is to a collection of Linus's comments, from between 1999 and 2008, about how to efficiently implement mutexes and semaphores in the Linux kernel.
Linus is the BDFL of the Linux kernel, and he obviously needs to think about the big picture. And yet in these comments he gets into nitty-gritty assembly language.
I love it when a big picture guy also sweats the details.
Nice read, but still some misconceptions and misunderstandings.
In a multiprocessor design where each processor runs completely independently of the others and there is no central organizational unit like a central OS, as in many embedded designs, there is no such thing as a spinlock. A semaphore is also not used for sending processes to sleep.
Linus' explanation makes sense if we're speaking only about a single OS, which may comprise multiple processors. But it makes no sense at all for real multi-processor designs. I don't blame him for his narrow view so much as his professors, for never exposing him to real embedded design.
I have worked in the past with multi-processor designs where one processor was running an OS, like OS9, and the other processor had no OS at all. Both processors were exchanging data via a dual-ported RAM, so basically both processors were competing for the same resource. But even then, using a semaphore does not prevent race conditions. Even then it is very tricky to use them, because dual-ported RAM allowed truly concurrent access to the same resource.
In summary, the semaphore definition is the broader one, where 2 (or more) processes are competing for a single resource. The naming convention within Linux is more specific to its requirements. The "invented" spinlock is just a renamed version of a fast semaphore, AFAIK. Even a mutex is just a special case of a semaphore.
You're blaming Linus for not building something that can synchronize with another processor that isn't running Linux? That seems like it is well outside the scope of Linux's intent. (Linux will get you synchronization if both processors are running Linux as one image, but a processor running no OS? Why do you expect the Linux processor to be able to synchronize with that? I'd expect you to have to implement your own synchronization there, since the other processor is completely outside of Linux's control.)
The "benchmarks game" has a thread ring benchmark which can compare conditions/mutexes/semaphores, because for some applications they're interchangeable:
That whole site is fascinating, I just lost like 2 hours of my life. Late 90s threads about TLB strategies from @sgi.com e-mail addresses? F&$% me, I could read that sort of stuff all day. (I started http://www.snakebite.org, for reference.)
The possible future improvement that Linus mentions here:
> For example, the per-VM memory management semaphore could very usefully be a blocking read-write lock, but without heavy thread contention a mutex semaphore is basically equivalent.
has actually come to pass: the mm_struct is now protected by struct rw_semaphore mmap_sem.
This is why I LOVE reading what Linus has to say, it's all very practical, and pretty much ignores theory, because he has actual evidence to support his claims.
> They originally had operations called "P()" and "V()", but nobody ever remembers whether P() was down() or up(), so nobody uses those names any more. Dijkstra was probably a bit heavy on drugs or something (I think the official explanation is that P and V are the first letters in some Dutch words, but I personally find the drug overdose story much more believable).
Ah, semaphores. I recall in my operating systems course we had to implement some simple concurrency patterns using semaphores as an exercise -- for instance, synchronize a group of processes so that idle processes gather into groups of N, and when a group is ready, N-1 threads of the group do TaskX(), and 1 thread does TaskY(), and when they're done, they go back to the pool. Then we implemented the same using usual mutexes and condition variables. I encourage everyone to do that, to see how one of the approaches is much more complicated than the other. Linus says that almost all practical uses of semaphores is when they are just used as mutexes, and rightly so -- implementing concurrency patterns based only on semaphores is a total pain.
> Linus says that almost all practical uses of semaphores is when they are just used as mutexes, and rightly so -- implementing concurrency patterns based only on semaphores is a total pain.
Semaphores are not very good in practical programming, but they're still valuable as a mental model and for reasoning about algorithms.
It is a lot easier to prove by induction that an algorithm implemented with semaphores works than it is to deal with mutexes and conditions in a formal proof.
So semaphores are very useful in theoretical work, mutexes and conditions are more useful in practice.
Semaphores make a lot more sense if you know the Spanish word "semáforo", which means "signal". It's also used in Spanish for a traffic light.
Semaphores and spinlocks sure are different. People rarely use spinlocks in user-mode programs, since there are better synchronizing mechanisms in user mode than busy-waiting with a spinlock. However, in kernel-mode programs the spinlock is invaluable, since it's the only synchronizing mechanism that works at any interrupt level.
Wow I remember some of this class from my OS course 10 years after this was written.
We were just told about mutexes, but I never knew that it stood for Mutual Exclusion. (I did correctly intuit that a mutex was simply a specific use case of semaphore though so I'm happy and managed to pass the course in the end :D)
There's nothing vintage about them. Understanding these is crucial to understanding how and why modern concurrency tools work. As I noted above in another comment, the issues that need mutexes or semaphores have never gone away; you just might not realize that they're there.
And please, don't come up with async/await, callback/continuation, node.js' async stuff. They are not a replacement for mutual exclusion, etc.
http://www.cs.utexas.edu/users/EWD/transcriptions/EWD00xx/EW...
Here is an implementation of P and V, the original counted semaphore primitives, from 1972.
http://www.fourmilab.ch/documents/univac/fang/