Nice syntax, but I like the caching that comes with creating each layer. If you ...

DanHulton · on Jan 27, 2023

I mean, you can do both. Or, technically with that link you mentioned, all three.

You can use heredocs to combine commands that belong together in one layer, order your layers so that the more frequently changing ones come later in your Dockerfile when possible (this also speeds up your builds, because more of the cached layers stay valid), and use multi-stage builds on top of that to really pare the image down to the bare necessities.

klabb3 · on Jan 27, 2023

> Nice syntax

Is it though? From the post:

  RUN <<EOF
  apt-get update
  apt-get upgrade -y
  apt-get install -y ...
  EOF

It may be due to my ninja-level ability to dodge learning more advanced shell mastery for decades, but to me it looks haphazard and error prone. Are the line breaks semantic, or is it all a multiline string? Is EOF a special end-of-file token, or a variable, and if so, what’s its type? Where is it documented? Is the first EOF sent to stdin, and if so, why is that needed? What is the second EOF doing? I can usually pick up a new imperative language quickly, but I still feel like an idiot when looking at shell.

awwaiid · on Jan 27, 2023

The

  <<XYZ
  ...
  XYZ

syntax for multi-line strings is worth learning since it is used in shell, ruby, php, and others. See https://en.m.wikipedia.org/wiki/Here_document . You get to pick the "EOF" delimiter.
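A minimal sketch in POSIX sh (the function name `show_heredoc` is made up for illustration) showing a delimiter other than EOF, and that line breaks and leading spaces inside the heredoc come through unchanged:

```shell
# POSIX sh. Any word can act as the delimiter; here it is XYZ rather than EOF.
show_heredoc() {
  cat <<XYZ
first line
  second line, indented
XYZ
}

# Prints both lines exactly as written, including the two leading spaces.
show_heredoc
```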

hddqsb · on Jan 27, 2023

I know those questions are rhetorical, but to answer them anyway:

> > Nice syntax

> Is it though?

Before the heredoc syntax was added, the usual approach was to use a backslash at the end of each line, creating a line continuation. This has several issues: the backslash swallows the newline, so one must also insert a semicolon* to mark the end of each command. Forgetting the semicolon leads to weird errors, because the next line typically ends up as extra arguments to the previous command. Also, while Docker supports line continuations interspersed with comments, sh doesn't, so if such a command contains comments it can't be copied into sh.
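To make the missing-semicolon problem concrete, here is a small sketch in plain POSIX sh rather than a Dockerfile (the function names are hypothetical). The backslash joins the two lines into one, so without a separator the second `echo` just becomes arguments to the first:

```shell
# The backslash removes the newline, so the shell sees "echo one echo two":
# a single command whose arguments happen to include the word "echo".
missing_separator() {
  echo one \
    echo two
}

# With "&&" (or ";") the two lines are separate commands again.
with_separator() {
  echo one && \
    echo two
}

missing_separator   # prints: one echo two
with_separator      # prints "one", then "two" on the next line
```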

The new heredoc syntax doesn't have any of these issues. I think it is infinitely better :)

(There is also JSON-style syntax, but it requires all backslashes to be doubled, and is less popular.)

*In practice "&&" is normally used rather than ";" in order to stop the build if any command fails (otherwise sh only propagates the exit status of the last command). This actually leads to a small footgun with the heredoc syntax: it allows the programmer to use just a newline, which is equivalent to a semicolon and means the exit status will be ignored for all but the last command. The programmer must remember to insert "&&" after each command, or use `set -e` at the start of the RUN command, or use `SHELL ["/bin/sh", "-e", "-c"]` at the top of the Dockerfile. But this footgun is due to sh's error handling quirks, not the heredoc syntax itself.
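A small sketch of that footgun in plain sh, assuming `false` stands in for a failing build step. It runs the same newline-separated script once with `sh -c` and once with `sh -e -c` (roughly what `SHELL ["/bin/sh", "-e", "-c"]` would select, per the comment above) and records the exit statuses:

```shell
# Newline-separated commands, as in a heredoc RUN without "&&".
# "false" is a stand-in for a failing step such as a broken install.
script='false
echo "still running"'

# Plain "sh -c": the failure is ignored; the status is that of the last echo.
status_plain=0
sh -c "$script" || status_plain=$?

# "sh -e -c": the shell exits at the first failing command.
status_e=0
sh -e -c "$script" || status_e=$?

echo "without -e: $status_plain, with -e: $status_e"   # without -e: 0, with -e: 1
```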

> Are the line breaks semantic, or is it all a multiline string?

The line breaks are preserved ("what you see is what you get").

> Is EOF a special end-of-file token

You can choose which token to use (EOF is a common convention, but any token can be used). The text right after the "<<" indicates which token you've chosen, and the heredoc is terminated by the first line that contains just that token.

This allows you to easily create a heredoc containing other heredocs. Can you think of any other quoting syntax that allows that? (Lisp's quote form comes to mind.)
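A sketch of one heredoc containing another, in POSIX sh: the outer heredoc (delimiter `OUTER`, chosen arbitrarily) holds a whole script that is fed to `sh`, and that script uses its own `EOF` heredoc. The inner `EOF` line doesn't end the outer heredoc, because only a line consisting of exactly `OUTER` does:

```shell
# Quoting the outer delimiter ('OUTER') stops the outer shell from expanding
# anything inside, so the inner script reaches sh verbatim.
nested() {
  sh <<'OUTER'
cat <<EOF
hello from the inner heredoc
EOF
echo "inner script finished"
OUTER
}

nested
```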

> Where is it documented?

The introduction blog post has already been linked. The reference documentation (https://docs.docker.com/engine/reference/builder/, https://github.com/moby/buildkit/blob/master/frontend/docker...) explains the syntax using examples. It doesn't have a formal specification; unfortunately this is a wider problem with the Dockerfile syntax (see https://supercontainers.github.io/containers-wg/ideas/docker...). Instead, the reference links to the sh syntax specification (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...), on which the Dockerfile heredoc syntax is based.

klabb3 · on Jan 27, 2023

Thanks, this is the helpful reply I didn't deserve!

> This actually leads to a small footgun with the heredoc syntax: it allows the programmer to use just a newline, which is equivalent to a semicolon and means the exit status will be ignored for all but the last command.

This sounds like a medium-large caliber footgun to me, and while I don’t expect Docker to fix sh, it could perhaps either set sane defaults or decouple commands from creating layers? Or why not simply support decent lists of commands if this is such a common use case?

> This allows you to easily create a heredoc containing other heredocs.

Hmm, what’s the use-case for that? The only effect for the programmer would be to change the escape sequence, no?

hddqsb · on Jan 27, 2023

> This sounds like a medium-large caliber footgun to me, and while I don’t expect Docker to fix sh, it could perhaps either set sane defaults or decouple commands from creating layers? Or why not simply support decent lists of commands if this is such a common use case?

Ha ha, I guess footgun sizes are all relative. The quirky error handling of sh is "well-known" (usually one of the first pieces of advice given to improve safety is to insert `set -e` at the top of every shell script, which mostly fixes this issue). So I don't think of Dockerfile heredocs themselves as a large footgun, but rather as a small footgun that arises out of the small interaction between heredocs and the large-but-well-known error handling footgun.

I don't know why Docker doesn't use `set -e` by default. I suppose one reason is for consistency -- if you have shell commands spread across both a Dockerfile and standalone scripts, it could be very confusing if they behaved differently because the Dockerfile uses different defaults.

I also don't know why the commands are coupled to the layers. Maybe because in the simple cases, that is the best mapping; and in the very complex cases, the commands would be moved to a standalone script; so there are fewer cases where a complex command needs to be inlined into the Dockerfile in a way that produces a single layer.

It would be really nice if the Dockerfile gave more control over layers. For example, currently if you use `COPY` to import files into the image and then use `RUN` to modify them (e.g. to change the ownership / permissions / timestamps), it needlessly increases the image size, because the modified files are stored again in a new layer while the originals remain in the `COPY` layer. The only way to avoid this is to perform those changes during the `COPY`, for example using `COPY --chown`; but `COPY` has very limited options (namely `--chown`, and also `--chmod`, but that is relatively recent).

Regarding native support for lists of commands, I don't really see much value since sh already supports lists (you "just" need to correctly choose between "&&" and ";"/newline).
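For illustration, the difference between the two list separators in plain sh (the function names are hypothetical, and `false` stands in for a failing command). Each list runs in a child `sh -c`, much as Docker runs a shell-form `RUN`, so the failure doesn't affect the calling shell:

```shell
# "a && b": b runs only if a succeeded.
and_list()  { sh -c 'false && echo "ran after &&"'; }
# "a; b" (a newline behaves the same): b runs no matter how a ended.
semi_list() { sh -c 'false; echo "ran after ;"'; }

semi_list                                         # prints: ran after ;
and_list || echo "and_list exited with status $?" # the echo inside never runs
```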

> > This allows you to easily create a heredoc containing other heredocs.

> Hmm, what’s the use-case for that? The only effect for the programmer would be to change the escape sequence, no?

It can be useful to embed entire files within a script (e.g. when writing a script that pre-populates a directory with some small files). With most quoting schemes, you'd have to escape special characters that appear in those files. But with heredocs, you just have to pick a unique token and then you can include the files verbatim.
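A minimal sketch of that use case, assuming POSIX sh and `mktemp -d`; the function `populate` and the file names are made up for illustration. Quoting the delimiter (`<<'CONFIG_END'`) turns off parameter expansion and backslash processing, so the file contents are written exactly as they appear:

```shell
# Hypothetical helper: pre-populate a directory with two small files whose
# contents are embedded verbatim. "$HOME" and "\n" below stay literal.
populate() {
  dir=$1
  mkdir -p "$dir"
  cat >"$dir/settings.conf" <<'CONFIG_END'
path = "$HOME/data"
pattern = "a\nb"
CONFIG_END
  cat >"$dir/README" <<'README_END'
Files in this directory were generated by populate().
README_END
}

target=$(mktemp -d)
populate "$target"
cat "$target/settings.conf"
```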

(Picking a token that doesn't appear as a line within the files can be a little tricky, but in many cases it's not a problem; for example if the files to be included are trustworthy, it should be enough to use a token that includes the script's name. On the other hand if the data is untrusted, you'd have to generate an unguessable nonce using a CSPRNG. But at that point it's easier to base64-encode the data first, in which case the token can be any string which never appears in the output of base64, for example ".".)
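A sketch of the base64 approach, assuming GNU coreutils `base64` (for `base64 -d`). The untrusted input deliberately contains a line reading `EOF`; after encoding, every line uses only `A-Z`, `a-z`, `0-9`, `+`, `/` and `=`, so none can equal `.`, and `.` works as the delimiter of the generated script's heredoc:

```shell
# Untrusted data; note that one of its lines is exactly "EOF".
untrusted='first line
EOF
last line'

# Encode it. Command substitution drops the trailing newline of the output.
encoded=$(printf '%s\n' "$untrusted" | base64)

# Generate a script that embeds the encoded data in a heredoc terminated by a
# line containing just ".", and decodes it when run.
generated="base64 -d <<'.'
$encoded
.
"

# Running the generated script reproduces the original data verbatim.
sh -c "$generated"
```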