One is the way Go does it. In Go, whenever you execute a blocking I/O function, the runtime under the hood executes the non-blocking equivalent and reuses the thread for some other task by literally switching stacks. This is a conceptually simple approach (often called "stackful", in contrast to the "stackless" approach described below), but it has some downsides.
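For illustration, here is a minimal Go sketch of what this looks like from the programmer's side: each connection gets its own goroutine written as ordinary blocking code, and the runtime multiplexes those goroutines onto a small number of OS threads.

```go
// Minimal sketch: a TCP echo server where every connection is handled by a
// goroutine written as ordinary blocking code. Under the hood, conn.Read
// parks the goroutine on the network poller and the OS thread is reused to
// run other goroutines.
package main

import (
	"log"
	"net"
)

func handle(conn net.Conn) {
	defer conn.Close()
	buf := make([]byte, 4096)
	for {
		n, err := conn.Read(buf) // looks blocking; the runtime switches stacks here
		if err != nil {
			return
		}
		if _, err := conn.Write(buf[:n]); err != nil {
			return
		}
	}
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go handle(conn) // many thousands of these map onto a handful of OS threads
	}
}
```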
One big downside is a greater cost when doing FFI calls. Because you have many stacks, they are small by default, and you need to grow them significantly before entering FFI code, since C code assumes a large stack is available. You may also need to dynamically grow the thread pool if too many of its threads are currently blocked in FFI code. (A logically simpler but more costly option is to have a separate FFI thread pool with large stacks. Every FFI call is then treated like a blocking I/O call on the main pool: the FFI function is scheduled to run on the FFI pool, and when it completes, the task that made the call is queued up to continue, just as if an asynchronous I/O operation had completed.)
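To make that parenthetical option concrete, here is a minimal Go sketch with hypothetical names (this illustrates the design described above, not how Go's runtime actually handles cgo): foreign calls are funneled to a small dedicated pool, and the caller waits for completion the same way it would wait for asynchronous I/O.

```go
// Sketch of a dedicated FFI pool (hypothetical names). Foreign calls are sent
// to a fixed set of workers; the caller suspends on a per-call channel until
// the result comes back, just as if it were waiting for async I/O.
package ffipool

type call struct {
	fn   func() any // wraps the actual foreign-function invocation
	done chan any   // receives the result when the worker finishes
}

type Pool struct{ work chan call }

// NewPool starts n workers. In a real runtime these would be OS threads with
// large stacks reserved for running foreign code.
func NewPool(n int) *Pool {
	p := &Pool{work: make(chan call)}
	for i := 0; i < n; i++ {
		go func() {
			for c := range p.work {
				c.done <- c.fn()
			}
		}()
	}
	return p
}

// Call schedules fn on the FFI pool and suspends the caller until it has
// completed, mirroring how a blocked task resumes after asynchronous I/O.
func (p *Pool) Call(fn func() any) any {
	c := call{fn: fn, done: make(chan any, 1)}
	p.work <- c
	return <-c.done
}
```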
The other approach, the one most async/await languages use, is called "stackless". In this approach, every function that can block has a different return type, some form of promise-like object (these functions are known as "async"). To get the result (or to ensure all side effects have completed), you "await" the promise; await is a keyword that causes the awaiting function to logically pause execution until the result is available.
However, rather than switching stacks, await makes the compiler transform the function. The parameters and "stack variables" of the function are instead stored in an object on the heap, along with information about where in the function we paused. The function is transformed so that, under the hood, you pass in this heap object. (Alternatively, the function becomes a method on the heap object.) When the function is called, it looks in the heap object to find which pause location we are at and more or less does a goto to that point in the code.
When an await is encountered and the value of the awaited promise is not yet available, the function writes its current position into the heap object, informs the scheduler that it needs to be called again when the promise completes, and returns a promise associated with this logical invocation to its caller. The first time it returns, the caller is user code; on later returns, its direct caller is the scheduled task runner. When the function finally reaches the end (e.g. a return statement), it updates its promise and returns it (again, or for the first time if it never had to pause).
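Go has no async/await, but the transformation can be sketched by hand. The following is a rough, hypothetical picture of what a stackless compiler might generate for an async function with a single await point: the parameters, locals, resume position, and result promise all live in a heap-allocated struct, and the resume method jumps to the saved position with a switch. (Real compilers and schedulers differ in many details; here the completion callback resumes the state machine directly instead of going through a task runner.)

```go
// Hand-written sketch (hypothetical names) of the state machine a stackless
// compiler might generate for something like:
//
//	async func fetchLen(url string) int {
//	    body := await httpGet(url)
//	    return len(body)
//	}
package main

import "fmt"

// Promise is a minimal promise-like object: a result plus callbacks to run
// once it completes.
type Promise[T any] struct {
	done      bool
	value     T
	callbacks []func(T)
}

func (p *Promise[T]) Complete(v T) {
	p.done, p.value = true, v
	for _, cb := range p.callbacks {
		cb(v)
	}
}

func (p *Promise[T]) OnComplete(cb func(T)) {
	if p.done {
		cb(p.value)
		return
	}
	p.callbacks = append(p.callbacks, cb)
}

// httpGet stands in for any asynchronous operation; it is stubbed to complete
// immediately so the sketch runs on its own.
func httpGet(url string) *Promise[string] {
	p := &Promise[string]{}
	p.Complete("pretend response body for " + url)
	return p
}

// fetchLenState is the heap object: parameters, "stack variables", the resume
// position, and the promise handed back to the caller.
type fetchLenState struct {
	url    string
	body   string
	state  int
	result *Promise[int]
}

// resume is the transformed function body: it jumps to the saved pause
// position, and at an await it records where to continue and returns.
func (s *fetchLenState) resume() {
	switch s.state {
	case 0:
		p := httpGet(s.url)
		s.state = 1 // remember that we paused at the first (and only) await
		p.OnComplete(func(body string) {
			s.body = body
			s.resume() // a real scheduler/task runner would make this call
		})
	case 1:
		s.result.Complete(len(s.body)) // the original "return len(body)"
	}
}

// fetchLen is what the caller sees: allocate the state object, start it, and
// hand back the promise immediately.
func fetchLen(url string) *Promise[int] {
	s := &fetchLenState{url: url, result: &Promise[int]{}}
	s.resume()
	return s.result
}

func main() {
	fetchLen("https://example.com").OnComplete(func(n int) {
		fmt.Println("body length:", n)
	})
}
```

Each logical call allocates one such state object in place of a stack frame, which is where the heap churn mentioned below comes from.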
The net effect of all this is that each thread has a single stack. The transformation results in a call stack that is pretty much the same as if a callback-based approach had been used, with the callbacks scheduled onto the thread pool (instead of being executed directly). But the code flow, as the programmer writes it, looks much more like normal blocking I/O code, making it far more readable than a whole bunch of callback functions.
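For comparison, the same logic written directly in callback style (reusing the Promise and httpGet from the sketch above) ends up with essentially the same call stack, but every pause point becomes another nested function:

```go
// Callback-style equivalent of fetchLen from the previous sketch: correct, but
// every await point turns into another layer of nesting.
func fetchLenCallbacks(url string, done func(int)) {
	httpGet(url).OnComplete(func(body string) {
		done(len(body))
	})
}
```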
The downsides are needing to add the async and await keywords all over the codebase, and more heap churn (each paused call keeps its state in a heap object rather than on the stack).
Of course, there are more downsides and upsides to each approach, and there can be variations. It is not impossible to use the async/await keywords with a stack-switching approach, for example.