Why isn’t it possible (or is it?) to make libc just use io_uring instead of syscalls?
Yes, I know io_uring is an async interface, but it’s trivial to implement sync behavior on top of a single chain of async send-wait pairs, like a simple single-threaded “conversational” implementation of a network protocol.
It wouldn’t make a difference in most individual cases, but overall I wonder how big a global speed boost you’d get by removing a ton of syscalls?
Or am I failing to understand something about the performance nuances here?
You don't need to spawn new threads to use io_uring as a backend for synchronous IO APIs. You just need to set up the rings once. Then, when the program does an fwrite or whatever, that gets implemented as writing a submission queue entry followed by a single io_uring_enter syscall that informs the kernel there's something in the submission queue, and using the arguments indicating that the calling process wants to block until there's something in the completion queue (IORING_ENTER_GETEVENTS with min_complete set to 1).
> using the arguments indicating the calling process wants to block
Nice to know io_uring has facilities for backwards compatibility with blocking code here. But yeah, that's still a syscall, and given that the whole benefit of io_uring is in avoiding (or at least, coalescing) syscalls, I doubt having libc "just" use io_uring is going to give any tangible benefit.
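To make the syscall accounting in the two comments above concrete, here is a toy, in-process model of a submission/completion ring. It is a sketch under loud assumptions: every name (toy_ring, toy_queue, toy_enter, toy_reap, toy_sync_read) is hypothetical, no kernel is involved, and toy_enter() merely stands in for io_uring_enter(2) with IORING_ENTER_GETEVENTS by "performing" each request with memcpy. A synchronous read built on the ring costs one enter per call, exactly like read(2); the count only drops when several requests are queued before a single enter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy, in-process model of an io_uring-style SQ/CQ pair (hypothetical).
 * No kernel involved: toy_enter() stands in for io_uring_enter(2) called
 * with IORING_ENTER_GETEVENTS, and "performs" each request with memcpy. */

#define RING_SIZE 8

struct toy_sqe { const char *src; char *dst; size_t len; };
struct toy_cqe { long res; };

struct toy_ring {
    struct toy_sqe sq[RING_SIZE];
    struct toy_cqe cq[RING_SIZE];
    unsigned sq_head, sq_tail, cq_head, cq_tail;
    unsigned long enter_calls; /* number of simulated syscalls */
};

/* Queue one request; nothing happens until toy_enter(). */
static void toy_queue(struct toy_ring *r, const char *src, char *dst, size_t len) {
    assert(r->sq_tail - r->sq_head < RING_SIZE); /* SQ must not overflow */
    struct toy_sqe *sqe = &r->sq[r->sq_tail++ % RING_SIZE];
    sqe->src = src;
    sqe->dst = dst;
    sqe->len = len;
}

/* One simulated syscall: consume every queued SQE, post one CQE each. */
static void toy_enter(struct toy_ring *r) {
    r->enter_calls++;
    while (r->sq_head != r->sq_tail) {
        struct toy_sqe *sqe = &r->sq[r->sq_head++ % RING_SIZE];
        memcpy(sqe->dst, sqe->src, sqe->len);
        assert(r->cq_tail - r->cq_head < RING_SIZE); /* CQ must not overflow */
        r->cq[r->cq_tail++ % RING_SIZE].res = (long)sqe->len;
    }
}

/* Pop one completion result. */
static long toy_reap(struct toy_ring *r) {
    assert(r->cq_head != r->cq_tail);
    return r->cq[r->cq_head++ % RING_SIZE].res;
}

/* Synchronous read on top of the ring: queue, enter-and-wait, reap.
 * Costs exactly one enter per call, the same as a plain read(2). */
static long toy_sync_read(struct toy_ring *r, const char *src, char *dst, size_t len) {
    toy_queue(r, src, dst, len);
    toy_enter(r);
    return toy_reap(r);
}
```

Two toy_sync_read calls cost two enters, while queueing four requests and then calling toy_enter once costs one. That batched path is the coalescing a blocking, one-call-at-a-time API like read(2) has no way to express.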
I'm not talking about ls, which is more about metadata operations, but about general file read/write workloads:
io_uring requires API changes because you don't call it like the old read(please_fill_this_buffer). Instead, you maintain a pool of buffers that belong to the ring, and reads take buffers from the pool. You consume the data from a buffer and return it to the pool.
With the older style, you're required to maintain O(pending_reads) buffers. With the io_uring style, you have a pool of O(num_reads_completing_at_once) buffers (I assume with backpressure, but I haven't actually checked).
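A toy model of that pool, assuming a fixed set of buffers handed out by id only when a read completes and handed back once the data is consumed. The kernel-side analogue is io_uring's provided buffers (selected per request with IOSQE_BUFFER_SELECT); everything below, including buf_pool, pool_take and pool_return, is a hypothetical user-space simulation, not that API. Because a buffer is taken at completion time rather than submission time, 100 outstanding reads that complete two at a time never hold more than two buffers.

```c
#include <assert.h>

#define POOL_BUFS 4
#define BUF_LEN 64

/* Hypothetical model of a provided-buffer pool: a buffer is picked only
 * when a read completes, and the application returns it after consuming
 * the data. */
struct buf_pool {
    char data[POOL_BUFS][BUF_LEN];
    int free_ids[POOL_BUFS]; /* stack of free buffer ids */
    int nfree;
    int peak_in_use;         /* high-water mark of buffers held at once */
};

static void pool_init(struct buf_pool *p) {
    p->nfree = POOL_BUFS;
    for (int i = 0; i < POOL_BUFS; i++) p->free_ids[i] = i;
    p->peak_in_use = 0;
}

/* Called when a read completes. Returns a buffer id, or -1 if the pool
 * is exhausted (the model reports failure instead of blocking). */
static int pool_take(struct buf_pool *p) {
    if (p->nfree == 0) return -1;
    int id = p->free_ids[--p->nfree];
    int in_use = POOL_BUFS - p->nfree;
    if (in_use > p->peak_in_use) p->peak_in_use = in_use;
    return id;
}

static char *pool_buf(struct buf_pool *p, int id) {
    return p->data[id];
}

/* Called after the application has consumed the data in buffer `id`. */
static void pool_return(struct buf_pool *p, int id) {
    assert(p->nfree < POOL_BUFS);
    p->free_ids[p->nfree++] = id;
}
```

Memory use is bounded by how many completions are unconsumed at once, not by how many reads are pending. When the pool runs dry, the model surfaces that at completion time rather than stalling the submitter.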
In a single-threaded flow, your buffer pool is just the buffer you were given, and you don't return until the call completes. There are no actual concurrent calls in the ring; all you're doing is using io_uring to avoid syscalls.
Other replies lead me to believe it's not worth doing, though: it would not actually save syscalls and might make things worse.
Can you use io_uring in a way that doesn't gain the benefits of using it? Yes. Does the traditional C/POSIX API force you into that pattern? Almost certainly.
In addition to the sibling comment's concern about syscall amplification, the asynchrony just isn't useful to the application (from a latency perspective) if you merely serialize a bunch of sync requests through it.
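A back-of-the-envelope model of that latency point, under a deliberately idealized and hypothetical assumption: every request takes the same lat_us, and the device can overlap up to depth requests with no queueing or CPU overhead. A synchronous API funneled through the ring has depth 1, so n requests still take n * lat_us; the ring only shortens wall-clock time when the application keeps several requests in flight.

```c
#include <assert.h>

/* Idealized wall-clock time for n equal-latency requests when at most
 * `depth` are in flight at once (hypothetical model, not a benchmark). */
static unsigned long total_latency_us(unsigned long n, unsigned long lat_us,
                                      unsigned long depth) {
    assert(depth > 0);
    unsigned long rounds = (n + depth - 1) / depth; /* ceil(n / depth) */
    return rounds * lat_us;
}
```

For example, 64 requests of 100 us each take 6400 us at depth 1 but 200 us at depth 32 in this model; a drop-in libc backend is stuck at depth 1.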