On Go, Portability, and System Interfaces

geofft · on Sept 22, 2015

This is conflating two separate issues: using the system call interface instead of the platform C library, and static vs. dynamic linking. It's an easy thing to conflate, because (as the author states) on many platforms, including not only Solaris but also OS X, the only interface to the libc is dynamic linking. But you can certainly dynamically link to libc.so without dynamically linking to anything else, and without using dynamic linking within your own language community. (Rust takes roughly this approach.)

There's one more subtler problem with the "defined to exist on POSIX" thing: strictly speaking, what's defined to exist is a C-language interface. Some interfaces may be defined to use macros, and some platforms may make things work when compiled through the C compiler but not through the dynamic-linker interface. One example that I've run into recently, when binding things in Rust, is the cmsg API, which is primarily defined in terms of macros to walk a heterogeneous array, and has to be reimplemented in Rust (with platform-specific code): https://github.com/carllerche/nix-rust/pull/179. Another example is Android, which before 5.0 (Lollipop) exposed a handful of signal-handling functions as inline functions in a C-language header file, so they were not actually dynamically linkable: https://github.com/rust-lang/rust/commit/a8dbb92b. In both of these cases I would have loved to just #include the C header, but cross-language, that wasn't an option.
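To make the cmsg point concrete, here is a minimal Go sketch of the arithmetic that the C macros CMSG_ALIGN, CMSG_LEN, and CMSG_SPACE perform. The two constants are assumptions for 64-bit Linux (a 16-byte struct cmsghdr aligned to 8 bytes); other platforms use other values, which is why a reimplementation ends up platform-specific. The function names are hypothetical.

```go
package main

import "fmt"

// Assumed layout for 64-bit Linux only; these values are not portable.
const (
	cmsgAlignTo = 8  // sizeof(size_t)
	cmsgHdrSize = 16 // sizeof(struct cmsghdr): size_t len + int level + int type
)

// cmsgAlign rounds n up to the next multiple of cmsgAlignTo (CMSG_ALIGN).
func cmsgAlign(n int) int {
	return (n + cmsgAlignTo - 1) &^ (cmsgAlignTo - 1)
}

// cmsgLen is the value to store in cmsg_len for a payload of
// dataLen bytes (CMSG_LEN).
func cmsgLen(dataLen int) int {
	return cmsgAlign(cmsgHdrSize) + dataLen
}

// cmsgSpace is the number of buffer bytes one message occupies,
// trailing padding included (CMSG_SPACE).
func cmsgSpace(dataLen int) int {
	return cmsgAlign(cmsgHdrSize) + cmsgAlign(dataLen)
}

func main() {
	// One 4-byte file descriptor, as sent with SCM_RIGHTS.
	fmt.Println(cmsgLen(4), cmsgSpace(4)) // prints "20 24"
}
```

Go's own syscall package carries a comparable reimplementation (CmsgLen and CmsgSpace) rather than including the C header.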

It does seem that cgo has some ability to just #include the C header, as demonstrated here. I'd be curious to know if it's powerful enough to handle these two cases.

(Also, the author is totally right in saying that this doesn't belong inline in random application code. If you find yourself writing code like this, it should end up in an abstraction library so that porters have a single place to look, and any application calling `tcsetattr` gains the portability benefits.)

pcwalton · on Sept 22, 2015

> Some interfaces may be defined to use macros, and some platforms may make things work when compiled through the C compiler but not through the dynamic-linker interface. One example that I've run into recently, when binding things in Rust, is the cmsg API, which is primarily defined in terms of macros to walk a heterogeneous array, and has to be reimplemented in Rust (with platform-specific code):

The worst offender I've seen here is Xlib, which has to recreate the internal layout of large structs like Display that have haphazardly grown fields over the decades in order to deal with macros that reach into them: https://github.com/servo/rust-xlib/blob/master/src/xlib.rs#L...

tedunangst · on Sept 22, 2015

There are C binding equivalents for many of the macros in Xlib (having played this game before and recently). This may be slower than a macro, but you can usually cache the result. But really, shouldn't you be using (XML/)XCB? :)

pcwalton · on Sept 22, 2015

Last I looked into it GLX didn't work well with XCB. You can use XCB for some stuff, but you can't drop Xlib.

It doesn't look like this has changed, unless this page is out of date: http://xcb.freedesktop.org/opengl/#index5h1

geofft · on Sept 23, 2015

Fascinating! I was wondering why nobody had done an XCB-based set of X bindings for Rust, given that XCB was basically designed for this purpose (well-typed, high-level metadata to generate good bindings from).

If I'm reading that page correctly, the problem is that GLX is specified as an API + implied ABI, not a wire protocol, and part of that API/ABI contract is "I was compiled with <X11/Xlib.h> and -lX11, and you'd better give me structures compatible with those"? And half of the GLX implementations are closed-source? Sigh.

smegel · on Sept 22, 2015

> Basically, the Go folks want to minimize external dependencies and the web of failure that can lead to.

What is he ranting on about? The reason Go doesn't link against system libraries is that Go is binary-incompatible with C, as in its calling convention is different. There is a way to call C through a shim, but it's extremely slow, which is why they implemented their own syscall interface for Go. Static linking is just a side-effect of this.

Edit: the Go team want to support dynamic linking and it is on the roadmap but not fully implemented as of 1.5 https://docs.google.com/document/d/1nr-TQHw_er6GOQRsF6T43GGh...

tedunangst · on Sept 22, 2015

I doubt go to C calling conventions are going to be the slowest part of calling tcsetattr(). Using ioctl() is being difficult for the sake of being difficult.

pcwalton · on Sept 22, 2015

> I doubt go to C calling conventions are going to be the slowest part of calling tcsetattr().

Well, by "C calling conventions" it really means "performing a stack switch to get out of M:N, allocating a big stack if necessary or taking a lock to check one out". It's quite a lot of overhead and I wouldn't be surprised if it dwarfs the cost of the SYSENTER/SYSEXIT pair in many cases.

4ad · on Sept 23, 2015

It's not necessary to allocate a large stack, there is always one available because threads have their own stack, and you can switch to it at all times because goroutine threading is cooperative. In other words, if you are scheduled to run on a thread, you and only you have access to that thread stack.

Stack switching doesn't take more than a few machine instructions.

The problem with tcsetattr is that it's a library function, so you (generally[1]) need the target library in order to create a program that calls it. This is bad because in Go we value cross-compiling a lot, so we can't depend on having other people's shared libraries available in the build environment.

Syscalls, however, we know how to do them and we know what they are, so we don't need any special target library in order to generate programs for the target.

[1] On platforms like Solaris and Windows, where we are forced to use target shared libraries, we cheat a little: we encode the target symbol names that we need and rely on the standardized ELF/PE symbol resolution mechanisms, which gives us a scheme essentially equivalent to having import libraries available. But this is very complicated and takes a lot of code in the linker, so it's not at all unexpected that we avoid it whenever possible.

pcwalton · on Sept 23, 2015

> Stack switching doesn't take more than a few machine instructions.

In Rust we found that stack switching was really slow, even when fully optimized to that scheme. CPUs really don't like it. We thought "it was only a few instructions; how bad can it be"; turns out that any function call overhead can quickly add up to 2x or more overall performance penalties in some programs.

See the performance numbers in this thread for cgo (using the same technique) [1]. According to the first benchmark, calling the function via cgo is 16x slower. You're at 161ns for a no-op function call.

Consider that the cost of a syscall can be below 100ns [2]. The overhead can add up quite a bit!

[1]: https://groups.google.com/forum/#!topic/golang-nuts/RTtMsgZi...

[2]: http://forum.osdev.org/viewtopic.php?p=209429&sid=9bcec5b684...
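For a rough feel of the syscall side of that comparison on your own machine, a sketch like the following times a cheap system call in a loop. It assumes syscall.Getpid enters the kernel on every call (which I believe holds for Go on Linux), it includes loop overhead, and it is not a rigorous benchmark. The helper name perSyscall is made up for illustration.

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// perSyscall returns the average wall-clock time of one getpid call,
// measured over n iterations. It returns 0 for non-positive n.
func perSyscall(n int) time.Duration {
	if n <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		syscall.Getpid()
	}
	return time.Since(start) / time.Duration(n)
}

func main() {
	fmt.Println("average per call:", perSyscall(1000000))
}
```

Measuring the cgo side the same way needs a C toolchain, so it is left out here; the numbers will vary a lot with kernel version and CPU.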

tedunangst · on Sept 23, 2015

There's theory and there's practice. Saying that using ioctl is the right choice because it helps cross compilation rings a little hollow when the cross compiled binary obviously won't run due to the target system not having said syscall.

4ad · on Sept 23, 2015

I'm sorry, but what?

If the system call doesn't exist, then the program won't compile, because the system call stub won't be defined for that platform.

If the system call is ioctl, and the user tries to call it with a request that doesn't exist, the program won't compile because the request will not be defined.

But we don't even export ioctl! First of all, it's not type safe. We only expose ioctl wrappers that are type-safe and use requests defined by the target platform, not arbitrary integers passed by the users.

Of course if the user does an indirect system call, and mucks with unsafe, and passes it arbitrary data, then that's a user problem, not a Go problem.

Using ioctl internally by Go is the right thing.

Users indirectly calling ioctl and breaking type-safety with unsafe is just bad code, but that doesn't have anything to do with Go. Go won't prevent you if you want to shoot yourself like that.
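As an illustration of the type-safe wrapper idea, here is a minimal Linux-only sketch built on the standard syscall package. The name getTermios is hypothetical and this is not the wrapper Go itself ships; the point is only that the request number and the unsafe cast live in one place, and callers deal with a typed struct.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// getTermios is a hypothetical type-safe wrapper: callers pass a file
// descriptor and get back a typed struct. The ioctl request number and
// the unsafe pointer conversion never leave this function.
func getTermios(fd uintptr) (syscall.Termios, error) {
	var t syscall.Termios
	_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd,
		uintptr(syscall.TCGETS), uintptr(unsafe.Pointer(&t)))
	if errno != 0 {
		return t, errno
	}
	return t, nil
}

func main() {
	t, err := getTermios(os.Stdin.Fd())
	if err != nil {
		fmt.Println("stdin is not a terminal:", err)
		return
	}
	fmt.Printf("local flags: %#x\n", t.Lflag)
}
```

On a platform where SYS_IOCTL or TCGETS is not defined, this file simply fails to compile, which is the behavior described above.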

tedunangst · on Sept 23, 2015

What is the linked post about if not code that could have been portable but wasn't? I'm having a hard time determining if you're saying the code in question was right or not.

4ad · on Sept 24, 2015

Well the code in the linked post doesn't compile on Solaris because SYS_IOCTL is not defined on Solaris. So that was my point. You don't get code that compiles but doesn't work, unless you try really hard.

But the code is bad. If you have to call system calls through the indirect system calls and use unsafe to pass parameters, you either do something wrong, or we did something wrong. In this case the author of that code is excusable. It was our fault. We didn't expose the necessary type-safe wrappers, so the author was forced to write code like that.

This is changing, however: just yesterday I committed support for termio-related stuff for Solaris. I assume Linux, BSDs and the other systems will follow very soon.

tedunangst · on Sept 22, 2015

All of which is likely dwarfed by all the code that runs between SYSENTER and SYSEXIT. And how many freaking times do you need to call tcsetattr()?

Even worst case, if calling overhead is a problem, the Go team can likely improve it at some point, but they are less likely to magically map special cases of syscall.SYS_IOCTL to tcsetattr for the sake of portability.

4ad · on Sept 23, 2015

Not sure I understand your point, tcsetattr calls a system call on its own...

So the alternatives are between doing a trivial system call on the Go stack, or doing a complex stack switch operation that does the (too complicated) System V calling convention ABI translation, and then calls a function that does a system call...

nulltype · on Sept 23, 2015

If you avoid calling C code, you don't have to use cgo, which may be a better reason than "being difficult".

4ad · on Sept 23, 2015

> The reason Go doesn't link against system libraries is that Go is binary-incompatible with C, as in its calling convention is different.

I'm sorry, but the system call "calling convention" is different too. This argument doesn't hold water. Calling convention translation is a trivial issue. Go doesn't usually use system libraries in order to minimise dependencies and to allow easy cross-compiling.

> There is a way to call C through a shim but it's extremely slow

Not really, it's actually pretty fast. Of course it will always be slower than a regular function call, but the overhead is low enough that in most cases you can ignore it. There are still cases where you can't.

On systems like Windows and Solaris, the cgo mechanism is actually used to do back-end "system calls" all the time.

smegel · on Sept 23, 2015

> I'm sorry, but the system call "calling convention" is different too.

Sufficiently different to not be a problem? As I understand it system calls don't usually rely on the stack whereas C calls do, and it is the stack mismatch between C and Go that is the issue.

> but it's sufficiently low-overhead that in most cases you can ignore the overhead

Maybe for user called function...but if every single system call had to swap out the stack surely that would have a pretty heavy impact...

geofft · on Sept 23, 2015

The Go calling convention passes everything, in both directions, on the stack. Most platform-native ("C") calling conventions put as many things in registers as possible, and reuse one of the registers for the return value when possible. However, this is just a convention from one function to the next, and they can share the same stack; the caller of a C standard library function merely needs to know what to put on the stack and what to put in registers (and which registers to save). And in fact, the C calling convention is more efficient than Go's from the caller's point of view (fewer things have to go on the stack, some registers are promised not to be clobbered), so calling a C library function with the native calling convention is, on all reasonable platforms, a fine idea performance-wise.

There is the problem of the stack itself. The C model envisions a large, contiguous stack, possibly with some kernel magic to grow the stack as you approach the end (by handling page faults). Go previously used "segmented stacks", essentially a linked list of heap-allocated buffers used as stack, where functions knew how to allocate and deallocate segments when necessary. C doesn't know how to do that, so a C call previously would have to allocate plenty of memory just in case the function wanted to use a bunch of stack space.

Go no longer uses segmented stacks, so that's essentially not a concern. Go still does reallocate stacks, but does so by copying the entire stack, and the stack is basically never freed. So there might be a one-time performance penalty on the first C call (unless the compiler is smart enough to tell when a Goroutine might possibly call into C, and pre-allocate that space), or the first C call after most of the existing stack was used in Go code, but not after that.

tl;dr: The C calling convention is strictly faster than the Go one (but more complicated), and in Go 1.3+ the existing stack is fine to use, so there shouldn't be performance concerns.

(Unless, of course, you think the C library itself is slow. But that's a very different problem.)

4ad · on Sept 23, 2015

> and the stack is basically never freed

Not true, stacks are still freed if utilisation drops below a certain percentage.

> So there might be a one-time performance penalty on the first C call [...] in Go 1.3+ the existing stack is fine to use

Not true, contiguous stacks start even smaller than segmented stacks, also they can't be used for C code at all. Only type-safe code with stack maps can run on the goroutine stacks, because the precise garbage collector has to scan the stack and has to understand where on the stack there are pointers and where there aren't. C code breaks this requirement.

For calls into C, the thread stack is used, which in Go usually has 32kB, but varies on some platforms, for example on Windows, it's 256kB IIRC. The runtime switches the stack (which only takes a few instructions), does the ABI translation and calls the function.

The stack switching mechanism and ABI translation mechanism have trivial overhead, most of the overhead comes from runtime bookkeeping, e.g. the runtime must ensure there are always enough threads to run user-level Go code.

geofft · on Sept 25, 2015

> also they can't be used for C code at all. Only type-safe code with stack maps can run on the goroutine stacks, because the precise garbage collector has to scan the stack and has to understand where on the stack there are pointers and where there aren't. C code breaks this requirement.

Is this still true of C code that doesn't call back into Go code, like libc itself? The Go garbage collector will never get triggered, and any Go code should treat anything below %esp after the call completes as uninitialized data.

(Also, on Windows, don't actual system calls reuse the same stack, too?)

It seems like if you wanted to design Go for dynamically linking to libc instead of making raw system calls, it wouldn't be difficult.

4ad · on Sept 25, 2015

> Is this still true of C code that doesn't call back into Go code, like libc itself? The Go garbage collector will never get triggered, and any Go code should treat anything below %esp after the call completes as uninitialized data.

I should give a more thorough explanation.

If you call C code, the only way that would be useful is if we pass Go-allocated buffers to the C code. This only works if we pass to the C code untyped uintptr's, which must live on the stack because of garbage collector reasons.

In your hypothetical world, you want to call C code on the g stacks. This can only work if the g stacks are copied into a larger stack. The stack copying mechanism inspects the stack map to see which pointers need updating. This stack map is also used by the garbage collector, but I shouldn't have mentioned it in my original post.

The stack copying mechanism can't update the untyped uintptr's living on the stack, therefore you can't call C code on the g stack.

There are other reasons why calling C code on the g stack would be a bad idea but I won't get into them. Final point is, C code is called on the system stack.

> (Also, on Windows, don't actual system calls reuse the same stack, too?)

Which stack and which system calls?

On Windows, target calls are made on the g0 stack, which is the thread stack. These calls are just calls into regular C code, most of it in kernel32.dll, some of it in ntdll.dll. This C code will eventually do system calls. Those actual system calls don't need any stack.

4ad · on Sept 23, 2015

> Sufficiently different to not be a problem? As I understand it system calls don't usually rely on the stack whereas C calls do, and it is the stack mismatch between C and Go that is the issue.

Switching the stack only takes a few machine instructions. Most of the overhead comes from runtime bookkeeping; in contrast, ABI translation and switching the stack take a trivial amount of code and time.

> Maybe for user called function...but if every single system call had to swap out the stack surely that would have a pretty heavy impact...

As mentioned, on some platforms, for example Solaris and Windows, this always happens, and even on other platforms, in some conditions, it happens too. Not to mention that the runtime switches stacks for its own reasons all the time. The overhead is totally dwarfed by the cost of doing system calls in the first place.

readams · on Sept 22, 2015

Static linking is independent of this concern. Go could implement dynamic linking but doesn't for ideological reasons.

I'd really hate to be a distribution maintainer in a world where Go components are common. Security bug in the TLS library? Time to download 14GB of updates!

xyproto · on Sept 22, 2015

Not ideological reasons. The latest version of Go supports dynamic linking for some platforms.

pjmlp · on Sept 22, 2015

On Android there is no way around it, as native code can only be accessed via dynamic linking.

This is what some Go designers think of dynamic linking:

http://harmful.cat-v.org/software/dynamic-linking/

GauntletWizard · on Sept 23, 2015

And those reasons are clear and clear-cut. Dynamic linking is a net negative in most use cases, and certainly in all the use-cases that the Go developers are developing for, where reproducible builds are mandatory, code size is a tiny fraction of main memory, and code quality is very strictly maintained. When you've built a good development and deploy strategy, rebuilding and redistributing your binary is an everyday activity. There's no additional cost.

shadowmint · on Sept 23, 2015

Yeeesss... but ultimately, Go is also a pragmatic language.

Dynamic linking is coming to Go, like it or not; because it's a feature that is needed in certain circumstances, and that has been acknowledged, even by the Go team.

bahamat · on Sept 23, 2015

That same web page lists Linux as harmful.

http://harmful.cat-v.org/software/operating-systems/linux/

marssaxman · on Sept 22, 2015

I'm happy to be an end user in a world where go components are becoming more common, because statically-linked executables are far less likely to break when I upgrade unrelated programs.

nulltype · on Sept 22, 2015

Just download the deltas instead?

laxk · on Sept 23, 2015

The method proposed by the author does not work for packed structs. The btrfs library has a lot of them.

main.go

  package main
  /*
  struct packed {
    unsigned char a;
    unsigned long long b;
    unsigned char c;
  } __attribute__((packed));
  */
  import "C"
  type Packed C.struct_packed

go tool cgo -godefs main.go

  package main
  type Packed struct {
    A         uint8
    Pad_cgo_0 [8]byte
    C         uint8
  }

We lost the packed.b field; it shows up only as the opaque Pad_cgo_0 padding. Go doesn't support packed structs. More info here: https://groups.google.com/forum/#!topic/golang-nuts/UX5srUMt...
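One possible workaround, sketched here and not taken from the linked thread, is to treat the packed struct as raw bytes and decode each field at its packed offset with encoding/binary. This assumes the 1 + 8 + 1 byte layout of the C struct above and a little-endian producer; decodePacked and packedSize are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Packed mirrors the C struct's fields, not its memory layout.
type Packed struct {
	A uint8
	B uint64
	C uint8
}

// packedSize is the size of the C struct: 1 + 8 + 1 bytes, no padding.
const packedSize = 10

// decodePacked reads each field at its packed offset.
// It assumes the bytes were written on a little-endian machine.
func decodePacked(buf []byte) (Packed, error) {
	if len(buf) < packedSize {
		return Packed{}, errors.New("short buffer")
	}
	return Packed{
		A: buf[0],
		B: binary.LittleEndian.Uint64(buf[1:9]),
		C: buf[9],
	}, nil
}

func main() {
	raw := []byte{0x01, 0x02, 0, 0, 0, 0, 0, 0, 0, 0x03}
	p, err := decodePacked(raw)
	fmt.Println(p, err) // prints "{1 2 3} <nil>"
}
```

The cost is that every packed struct needs a hand-written decoder (and encoder), which is tedious for a library with as many of them as btrfs.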

pcwalton · on Sept 22, 2015

Using the syscall interface is a lot faster in Go than calling to libc, because the system libc is going to expect large stacks and so you incur the overhead of a stack switch when you switch out of M:N threading into the C world. I assume that's why Go calls syscalls to begin with.

zaphar · on Sept 23, 2015

The correct Go way to do this is to use conditional compilation and hide your platform-specific optimizations in a lib_{linux,darwin,...}.go file, with the slower default in a file that only gets compiled if the other files don't.

And indeed this is how the stdlib does it. If the rest of the Go community is not adopting this practice, then they aren't following good idioms in their Go code.

awalton · on Sept 23, 2015

And I totally would have agreed with the author of this article if the tl;dr of the article was just the first sentence you wrote.

Unfortunately, sanity did not prevail today.

teacup50 · on Sept 22, 2015

However, calling directly into the syscall interface is illegal on most platforms other than Linux.

I'm not sure Go can fix this without getting rid of their M:N threading.

pjc50 · on Sept 23, 2015

I'm assuming you mean "unsupported" rather than "prohibited by law", although with the DMCA one can never be sure.

4ad · on Sept 23, 2015

On Windows and Solaris, Go doesn't do its own syscalls.

teacup50 · on Sept 23, 2015

How about OS X?

(and, how does that work when calling into userspace APIs that assume fixed 1:1 thread/stack correlation)

4ad · on Sept 23, 2015

On OS X Go does its own system calls, ignoring the OS X guidelines.

Go uses M:N scheduling, but you still have N threads with (relatively) large stacks. When calling into C code, Go switches stacks, so it runs on the (relatively) large stack where C code can run. Of course there's some overhead associated with this, but system call overhead is large on its own compared to this anyway.

During the system call the thread (obviously) can't run any other Go code, but that's ok, the runtime makes sure there are always threads available to run user Go code (that's one reason why the runtime stands between user code and system calls).
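A small sketch of that last point, assuming a Unix-like system: with GOMAXPROCS(1), one goroutine parks its OS thread in a blocking read(2) on a pipe, and another goroutine still gets to run and wake it up. The function name unblockFromGo is made up, and there is a small window in which the write could land before the reader has actually entered read(2), so treat this as an illustration, not a proof.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// unblockFromGo parks one goroutine in a blocking read(2) on a pipe and
// wakes it from another goroutine while only one P is available.
func unblockFromGo() error {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	started := make(chan struct{})
	done := make(chan error, 1)
	go func() {
		close(started)
		buf := make([]byte, 1)
		_, err := syscall.Read(fds[0], buf) // blocks this OS thread
		done <- err
	}()

	<-started
	// If the runtime could not run Go code while the reader's thread is
	// stuck in read(2), this write would never be reached.
	if _, err := syscall.Write(fds[1], []byte{1}); err != nil {
		return err
	}
	return <-done
}

func main() {
	fmt.Println("reader woke up, err =", unblockFromGo())
}
```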

4ad · on Sept 23, 2015

No, the stack switch overhead is trivial. As explained in my other comment, Go does system calls in order not to have any dependencies on target libraries.