Hacker News new | past | comments | ask | show | jobs | submit login

It seems to me that all build systems are just DAGs with syntactic sugar and functions. You could do away with build systems entirely if there were some Unix commands that manage an arbitrary DAG, where the state is kept out of band (in a file, in a database, etc) so you can operate on the DAG from any process. That way the shell (or any program, really) becomes your build system and you can compose any kind of logic that requires walking or manipulating a tree of dependencies. This could apply to anything where you need to execute arbitrary jobs with a DAG, not just builds.



There are a few more components. In particular, how the build system decides what needs to be done, and how tasks are ordered, are significant.

For an excellent synthesis of what Makes a build system, I can't recommend the 2018 paper "Build Systems A La Carte" enough:

https://www.microsoft.com/en-us/research/uploads/prod/2018/0...

By Andrey Mokhov, Neil Mitchell (now working at Meta on the Buck2 build system) and Simon Peyton Jones (one of the founders of Haskell)


oh, and Andrey Mokhov since works at Jane Street on the build system Dune:

https://blogs.ncl.ac.uk/andreymokhov/moving-to-jane-street/


> It seems to me that all build systems are just DAGs with syntactic sugar and functions.

True in the broadest sense, but there are choices to be done regarding the possibility of discovering what the graph is or has become on the fly and the propagation directions. See “Build systems à la carte”[1,2] for a systematic exploration.

(See also a neighbouring comment[3] re how the discourse structure[4] of the build script might be important in a way orthogonal to these execution-engine issues. The boundary between the build system and the build tool proper can be drawn in very different places here.)

[1] https://dx.doi.org/10.1145/3236774

[2] https://youtu.be/BQVT6wiwCxM

[3] https://news.ycombinator.com/item?id=36749885

[4] https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-...


Nix represents its builds in a way that's similar: each "derivation" is a text file like /nix/store/XXXXXXX-foo.drv, where XXXXXX is a hash of that file's content. This way each file can reference any others by their path, and there can never be cycles (unless we brute-forced SHA256 to find a pair of files which contain each others hashes!). This uses of hashing requires the files to be immutable, but that's good for caching/validation/reuse/etc. anyway.

Note that Nix doesn't use a shell to execute things, it uses raw `exec` calls (each .drv file specifies the absolute path to an executable, a list of argument strings, and a set of environment variable strings). Though in practice, most .drv files specify a bash executable ;)


I didn’t realize that content addressing girls like that prevented circular deps by construction (with very high probability). That’s a super cool property to get as a bonus


> It seems to me that all build systems are just DAGs

Yeah, well, it's a little bit more than that.

Two things come to mind that don't neatly fit the DAG mental framework:

    - dynamically generated dependencies (e.g. when you compile a C++ file only to discover that it #includes something and therefore has a dependency on that thing, and therefore the DAG has to be updated on the fly). Creating them by hand is horribly tedious, and/or borderline impossible (#includes that #include other #include ad infinitum)

    - reproducible builds, where a build system is capable of rebuilding a binary from scratch down to having not a single different bit in the final output assuming the leaves of the DAG haven't changed. A desirable feature that is darn near impossible to do unless you pair the DAG with something else.


- detection/management of external dependencies

- defining different build types (debug/release)

- optionally building and running tests

- incremental builds (detecting what has changed)

That doesn't necessarily run counter to the concept of a DAG, but the organizational structures to manage this is what makes the build system. Topologically sorting the dependencies isn't the hard part. That's why make isn't a build system. It is the generic DAG runner, but that's not sufficient.


> if there were some Unix commands that manage an arbitrary DAG

There is: tsort. It's a POSIX utility, even, not just a GNU or BSD utility.


It's from Seventh Edition Unix, actually. So a decade before Mk, for example.


Wow, that's awesome!



Programs are just DAGS with syntactic sugar


Not sure I follow.

  0001 GOTO 0002
  0002 GOTO 0001
How would that be a DAG?


More of just a graph like:

V: (0001, 0002) E: ([0001, 0002], [0002, 0001])

But really I was just being sarcastic. Builds are "just" DAGs with syntactic sugar just like programs are "just" DAGs with syntactic sugar. That's one possible abstraction and one that is inherently lossy, because what "sugar" is available is the actual meat that makes one build system useful and another terrible. And one actually awful limitation is acylicity, since any build system will eventually deal with it (and outright forbidding of it makes your build system incapable of representing certain builds, just like acyclicity fundamentally limits a program's execution)

You can represent pretty much any data model as a graph, that doesn't really address the problem.


Right, I understand the graph metaphor, my objection is that programs in a turing complete language do not seem like they can exhaustively modeled as DAGs. Graphs, sure; but AFAICT DAGs fail due to the halting problem (in some abstract sense).

A build system modelling itself as a DAG, on the other hand, is very consciously taking on acyclicity as a feature.

I am curious what builds would need to be cyclic. Artifact A is built from B and C, where C is built from D and A? Does that happen in any well-designed build?


A syntax tree is also a DAG.

> I am curious what builds would need to be cyclic.

Compilers, interpreters, kernels, and the things that depend on them. So lots.

> Does that happen in any well-designed build?

Whether or not it's 'well designed' is irrelevant, the question is 'does it need to be possible' because even if it's rare it's still present in a lot of foundational software, notably GCC.


I think they meant more the syntactic representation, rather than the control flow. If you rewrite the program using s-expressions, the DAG becomes a little more clear

    (begin
        (0001 GOTO 0002)
        (0002 GOTO 0001))
The DAG has two branches, each with three leaves. There are no cycles in the syntax, even if there is in the execution.


Sure, but I would imagine that GGP would have said been more specific and said "tree" instead of "DAG" if they were talking about the AST.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: