It seems to me that all build systems are just DAGs with syntactic sugar and fun...

AceJohnny2 · on July 16, 2023

There are a few more components. In particular, how the build system decides what needs to be done, and how tasks are ordered, are significant.

For an excellent synthesis of what Makes a build system, I can't recommend the 2018 paper "Build Systems A La Carte" enough:

https://www.microsoft.com/en-us/research/uploads/prod/2018/0...

By Andrey Mokhov, Neil Mitchell (now working at Meta on the Buck2 build system) and Simon Peyton Jones (one of the founders of Haskell)

AceJohnny2 · on July 17, 2023

oh, and Andrey Mokhov since works at Jane Street on the build system Dune:

https://blogs.ncl.ac.uk/andreymokhov/moving-to-jane-street/

mananaysiempre · on July 16, 2023

> It seems to me that all build systems are just DAGs with syntactic sugar and functions.

True in the broadest sense, but there are choices to be done regarding the possibility of discovering what the graph is or has become on the fly and the propagation directions. See “Build systems à la carte”[1,2] for a systematic exploration.

(See also a neighbouring comment[3] re how the discourse structure[4] of the build script might be important in a way orthogonal to these execution-engine issues. The boundary between the build system and the build tool proper can be drawn in very different places here.)

[1] https://dx.doi.org/10.1145/3236774

[2] https://youtu.be/BQVT6wiwCxM

[3] https://news.ycombinator.com/item?id=36749885

[4] https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-...

chriswarbo · on July 16, 2023

Nix represents its builds in a way that's similar: each "derivation" is a text file like /nix/store/XXXXXXX-foo.drv, where XXXXXX is a hash of that file's content. This way each file can reference any others by their path, and there can never be cycles (unless we brute-forced SHA256 to find a pair of files which contain each others hashes!). This uses of hashing requires the files to be immutable, but that's good for caching/validation/reuse/etc. anyway.

Note that Nix doesn't use a shell to execute things, it uses raw `exec` calls (each .drv file specifies the absolute path to an executable, a list of argument strings, and a set of environment variable strings). Though in practice, most .drv files specify a bash executable ;)

klysm · on July 16, 2023

I didn’t realize that content addressing girls like that prevented circular deps by construction (with very high probability). That’s a super cool property to get as a bonus

ur-whale · on July 16, 2023

> It seems to me that all build systems are just DAGs

Yeah, well, it's a little bit more than that.

Two things come to mind that don't neatly fit the DAG mental framework:

    - dynamically generated dependencies (e.g. when you compile a C++ file only to discover that it #includes something and therefore has a dependency on that thing, and therefore the DAG has to be updated on the fly). Creating them by hand is horribly tedious, and/or borderline impossible (#includes that #include other #include ad infinitum)

    - reproducible builds, where a build system is capable of rebuilding a binary from scratch down to having not a single different bit in the final output assuming the leaves of the DAG haven't changed. A desirable feature that is darn near impossible to do unless you pair the DAG with something else.

mike_hock · on July 16, 2023

- detection/management of external dependencies

- defining different build types (debug/release)

- optionally building and running tests

- incremental builds (detecting what has changed)

That doesn't necessarily run counter to the concept of a DAG, but the organizational structures to manage this is what makes the build system. Topologically sorting the dependencies isn't the hard part. That's why make isn't a build system. It is the generic DAG runner, but that's not sufficient.

wahern · on July 16, 2023

> if there were some Unix commands that manage an arbitrary DAG

There is: tsort. It's a POSIX utility, even, not just a GNU or BSD utility.

nsajko · on July 17, 2023

It's from Seventh Edition Unix, actually. So a decade before Mk, for example.

throwawaaarrgh · on July 17, 2023

Wow, that's awesome!

smorrow · on July 19, 2023

Mash from Inferno https://inbox.vuxu.org/9fans/AANLkTikvTbqLwrxifri86cAEBS3wmk...

duped · on July 16, 2023

Programs are just DAGS with syntactic sugar

kdmccormick · on July 16, 2023

Not sure I follow.

  0001 GOTO 0002
  0002 GOTO 0001

How would that be a DAG?

duped · on July 16, 2023

More of just a graph like:

V: (0001, 0002) E: ([0001, 0002], [0002, 0001])

But really I was just being sarcastic. Builds are "just" DAGs with syntactic sugar just like programs are "just" DAGs with syntactic sugar. That's one possible abstraction and one that is inherently lossy, because what "sugar" is available is the actual meat that makes one build system useful and another terrible. And one actually awful limitation is acylicity, since any build system will eventually deal with it (and outright forbidding of it makes your build system incapable of representing certain builds, just like acyclicity fundamentally limits a program's execution)

You can represent pretty much any data model as a graph, that doesn't really address the problem.

kdmccormick · on July 18, 2023

Right, I understand the graph metaphor, my objection is that programs in a turing complete language do not seem like they can exhaustively modeled as DAGs. Graphs, sure; but AFAICT DAGs fail due to the halting problem (in some abstract sense).

A build system modelling itself as a DAG, on the other hand, is very consciously taking on acyclicity as a feature.

I am curious what builds would need to be cyclic. Artifact A is built from B and C, where C is built from D and A? Does that happen in any well-designed build?

duped · on July 19, 2023

A syntax tree is also a DAG.

> I am curious what builds would need to be cyclic.

Compilers, interpreters, kernels, and the things that depend on them. So lots.

> Does that happen in any well-designed build?

Whether or not it's 'well designed' is irrelevant, the question is 'does it need to be possible' because even if it's rare it's still present in a lot of foundational software, notably GCC.

Zambyte · on July 16, 2023

I think they meant more the syntactic representation, rather than the control flow. If you rewrite the program using s-expressions, the DAG becomes a little more clear

    (begin
        (0001 GOTO 0002)
        (0002 GOTO 0001))

The DAG has two branches, each with three leaves. There are no cycles in the syntax, even if there is in the execution.

kdmccormick · on July 18, 2023

Sure, but I would imagine that GGP would have said been more specific and said "tree" instead of "DAG" if they were talking about the AST.