Typically I wind up using a different source image for the builder, one that ideally has (most of) the toolchain bits needed but the same runtime base as the final image. (For Go, golang:alpine and alpine work well. I'm aware alpine/musl is not technically supported by Go, but I have yet to hit issues in prod with it, so I guess I'll keep taking that gamble.)
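A minimal sketch of that builder/runtime split for Go. The ./cmd/app path and the binary name are made up for illustration, and it assumes the project has both a go.mod and a go.sum:

```dockerfile
# Build stage: carries the Go toolchain, same Alpine base as the runtime stage.
FROM golang:alpine AS builder
WORKDIR /src
# Copy the module files first so dependency downloads are cached
# separately from source changes (assumes go.sum exists).
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a static binary; ./cmd/app is a hypothetical path.
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Runtime stage: plain Alpine, no toolchain.
FROM alpine
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Only the last stage ends up in the shipped image; the builder stage and its layers stay in the local build cache.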
I take advantage of multi-stage builds; however, I still think the layer system could use some nice improvements.
For example, say I have my own Ubuntu image that is based on one of the official ones but adds a bit of common configuration, tools and so on. On top of that I then build my own Java image using the package manager (not unlike what Bitnami do with their minideb, on which they then base their PostgreSQL and most other container images).
So I might have something like the following in the Ubuntu image Dockerfile:
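A minimal sketch of that, where the package choices and the my-ubuntu tag are made-up placeholders:

```dockerfile
# Shared base image, built and tagged as my-ubuntu (hypothetical tag).
FROM ubuntu:22.04
# Common tools; deliberately no "rm -rf /var/lib/apt/lists/*" here,
# so the package lists stay in this layer for derived images to reuse.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl
```

And then something along these lines in the Java image Dockerfile, where the cleanup only masks the files in the layer below rather than freeing any space:

```dockerfile
# Java image built on top of the shared base.
FROM my-ubuntu
# Relies on the package lists left in the base image still being recent enough;
# the rm creates whiteouts in this layer, the base layer still carries the files.
RUN apt-get install -y --no-install-recommends openjdk-17-jre-headless \
    && rm -rf /var/lib/apt/lists/*
```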
What I'd want is a build option that would work backwards from the last layer and create copies of all of the layers where files have been removed/masked (in the later layers), to use instead of the originals. Thus if I had 10 different images that need to use apt to install stuff while building them, I could leave the cache in my own Ubuntu image and then just remove it for whatever I want to consider the "final" images that I'll ship, which would then alter the contents of the included layers to purge deleted files.
There's little reason why these optimized layers couldn't be shared across all 10 of those "final" images either: "Hey, there's these optimized Ubuntu image layers without the package caches, so we'll use them for our .NET, Java, Node and other images". That's as opposed to --squash, which would put everything in a single large layer and thus lose the benefits of sharing the base Ubuntu image's layers.
Who knows, maybe someone will write a tool like that some day.