The author contends in the article's abstract that machine models are "self-contained" whereas "language models" (not clearly defined, but apparently meant to encompass at least the untyped lambda calculus) are not. But both can be viewed as a state plus a well-defined set of rewriting rules, so I suspect that the article is struggling to make what is really a distinction without a difference.
> But both can be viewed as a state plus a well-defined set of rewriting rules, so I suspect that the article is struggling to make what is really a distinction without a difference.
1. I state in the article that the untyped lambda calculus is, indeed, a borderline case: it is a simple rewriting system which, in turn, is a special case of a nondeterministic abstract state machine.
2. That many things can be viewed as abstract state machines does not mean that the computational complexity of validating whether a given description is well-formed is similar across them.
3. That complexity difference is absolutely huge -- zero for the machine models vs. exponential to undecidable for the typed language models. It is hardly a struggle to draw the conclusion that the two are objectively completely different classes.
Also, I gave a list of machine and language models. The latter group consists of the process calculi and the lambda calculi.
Why is the complexity of parsing well-formed states from strings somehow interesting? If you generalize to trees and pick the right alphabet, suddenly every element of your datatype is a well-formed lambda term.
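Concretely (a minimal Haskell sketch of what I mean; the representation is mine):

```haskell
-- Once the syntax is an algebraic datatype, every value of the type is a
-- well-formed untyped lambda term by construction -- there is no separate
-- parsing or validation problem to speak of.
data Term
  = Var String        -- variable
  | Lam String Term   -- abstraction: \x. body
  | App Term Term     -- application
```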
It's also unclear if you're talking about the complexity of type inference or type checking, what annotations you're allowing, etc. Your claims about complexity re: states of typed languages are too hand-wavy to engage with.
And yet there's still a glaring error: for a very popular machine model, namely the C abstract machine, it's undecidable whether a state is meaningful (regardless of how you set up your parsing problem).
So, you have an interesting idea about the difference between machine models and languages but it clearly doesn't hold together. People have been studying the relationship between machine models and languages for quite some time, and it's relatively well understood. Machine models are one of the many ways to give semantics to languages. They just don't have many useful properties compared to more "abstract" approaches, and are mostly used to reason about implementation.
> If you generalize to trees and pick the right alphabet, suddenly every element of your datatype is a well-formed lambda term.
How would you feed trees to the computation? Also, once types are involved, nothing can help you. I admit that untyped LC is somewhat of a borderline case.
> It's also unclear if you're talking about the complexity of type inference or type checking
Doesn't matter in this case. Pick a string representation -- any representation -- and see if it's well-formed. You can include the type annotations, in which case only type checking is necessary, or omit them, in which case it's type inference. Either way, the complexity is significant.
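To make the validation step concrete, here is a toy sketch of my own -- deliberately the simplest possible case, fully annotated simple types:

```haskell
-- A toy checker for fully annotated, simply typed terms (my own sketch).
-- Even this simplest case walks the entire term to validate it.
data Ty = TBase | Ty :-> Ty deriving Eq

data Tm
  = V String           -- variable
  | L String Ty Tm     -- annotated abstraction: \x : t. body
  | A Tm Tm            -- application

check :: [(String, Ty)] -> Tm -> Maybe Ty
check env (V x)     = lookup x env
check env (L x t b) = (t :->) <$> check ((x, t) : env) b
check env (A f a)   = do
  tf <- check env f
  ta <- check env a
  case tf of
    t1 :-> t2 | t1 == ta -> Just t2    -- argument must match the domain
    _                    -> Nothing
```

And this is the cheap end: drop the annotations and move to ML-style inference and the worst case is already DEXPTIME-complete; for full System F, inference is undecidable.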
> namely the C abstract machine
That's not a machine model.
> it's undecidable whether a state is meaningful
The same goes for any language, given a suitable definition of "meaningful". You can always say that something meaningful is a little finer-grained than the language can express, or that validating the language itself is also undecidable. Got simple types? Only primes are meaningful.
> They just don't have many useful properties compared to more "abstract" approaches, and are mostly used to reason about implementation.
My point is that they're not comparable. A glass of water doesn't have the same useful properties as a blender, yet they're both made of molecules. However, there's an objective difference: a blender consumes far more energy, so of course you can use that energy to do more things.
:) You're just pushing the complexity into the validation of the algebraic type.
Look, a typed formalism does the extra work of proving the type safety of the input, and checking a proof has non-negligible complexity. If that cost could be ignored, then I'd have a computation model that solves NP problems in zero time: give it a type system that requires the types to form a proof certificate for the problem, and all (obviously valid) inputs are then trivially reduced to "yes". In fact, you know very well that some type systems are Turing-complete; if the complexity of validation is ignored, those models can compute anything computable in zero time.
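If it helps, here's that argument as a hypothetical sketch (names and setup are mine): the "annotation" on a SAT instance is a candidate satisfying assignment, and the validation step is exactly where all the work hides.

```haskell
-- Hypothetical sketch: a SAT instance whose "type annotation" is a candidate
-- satisfying assignment. All the real work is in `valid`; if its cost were
-- waved away, every accepted input would reduce to "yes" in zero time.
type Literal = Int      -- k > 0 means variable k; k < 0 means its negation
type Clause  = [Literal]
type CNF     = [Clause]
type Cert    = [Bool]   -- assignment: variable k lives at index k - 1

valid :: Cert -> CNF -> Bool   -- the non-negligible "proof checking"
valid cert = all (any holds)
  where
    holds lit = let v = cert !! (abs lit - 1)
                in if lit > 0 then v else not v
```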
Types can prove things. The cost of the proof is well known, and is related to the difficulty of the problem. The universe doesn't let computational complexity slide. What, you think that God says, "oh, you clever boy, you chose a typed model so I'll let you have this proof for free!" ?
> You're just pushing the complexity into the validation of the algebraic type.
Who says it needs to be validated, any more than your string needs to be validated?
> The universe doesn't let computational complexity slide. What, you think that God says, "oh, you clever boy, you chose a typed model so I'll let you have this proof for free!" ?
You're responding to something completely unrelated to my statement.
To be clear: I wasn't suggesting a typed lambda calculus.
A tree is still more complex than a string, because it has rules: one parent per node and no cycles. Those need to be validated. But I have absolutely no problem accepting that it is possible to find an encoding of untyped LC which would bring it very close to a TM encoding. This wouldn't be surprising, as LC doesn't do more computational work than a TM at the validation step (unlike the typed formalisms).
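For instance, here's roughly what that validation looks like for one common encoding (a sketch of mine, using parent pointers where a root points to itself):

```haskell
-- What "those rules need to be validated" amounts to for a parent-pointer
-- encoding (my sketch): exactly one root, and no cycles. Assumes every
-- entry is already in range [0, n).
isTree :: [Int] -> Bool
isTree ps = length roots == 1 && all reachesRoot [0 .. n - 1]
  where
    n     = length ps
    roots = [i | (i, p) <- zip [0 ..] ps, p == i]
    reachesRoot = go n
      where
        go 0 _ = False                    -- budget exhausted: there is a cycle
        go k j | ps !! j == j = True      -- climbed to a root
               | otherwise    = go (k - 1) (ps !! j)
```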
I specifically wrote that untyped LC is a borderline case, and it's unclear where precisely we want to draw the line. What is absolutely clear is that System F and TM are essentially, qualitatively and objectively very different, and when people try to compare them, they often, without noticing, describe two different computational problems, with two different complexities.
You're positing some new notion of complexity, without a definition, where lists are somehow fundamentally different from trees. Sorry, but that kind of claim requires more than just an appeal to intuitions about how they might be implemented on a typical CPU. They're both just initial algebras.
The fact that your grand ideas about machine and language models don't apply to the lambda calculus should be a hint. We know how the lambda calculus relates to typed languages: it's a degenerate case with a single type. Any differences in the complexity of type inference or checking are differences between language models. You can't (without wildly waving your hands) formulate a meaningfully comparable problem for TM states.
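To spell out the single-type remark (a standard construction; the code is my sketch):

```haskell
-- Untyped terms are exactly typed terms at one recursive type D, with D
-- isomorphic to D -> D.
newtype D = D { unD :: D -> D }

apply :: D -> D -> D
apply = unD

identity :: D
identity = D id

omega :: D   -- (\x. x x) (\x. x x): perfectly well-typed at D; forcing it loops
omega = apply w w
  where w = D (\x -> apply x x)
```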
> A tree is still more complex than strings, because it has rules: one parent per node and no cycles. Those need to be validated.
Hmm, I'm not convinced that this is sufficient to show that strings are simpler.
After all, a string has rules too: a cell in the string is either at the beginning, the end, or the middle, and it has neighbours to the left, to the right, or not at all, depending on those conditions, and at most one such neighbour in each direction. Even an appeal to the physical world doesn't help. If I have a physical piece of DNA, how do I know it hasn't mutated so that two different strands branch off it and it becomes, essentially, a degenerate tree?
I do think that the thrust of your argument is correct but it's not as clear cut (at least not yet) as you seem to be making out.
> What is absolutely clear is that System F and TM are essentially, qualitatively and objectively very different
Absolutely agreed! If they weren't, I'd be happy writing my day-to-day code in Turing Machines.
Since you seem to have thought about the fine details of this much more than you are letting on here and in your blog post, I suggest you post an addendum (or even a new blog post) to be more precise about what exactly those differences are. There's a lot missing as it stands, and interested readers like myself are left to fill in the blanks.
Inside the lambda calculus, the only thing you can do with a lambda abstraction is apply it.
And one thing you can't do using your numbering scheme (which is basically a Gödel numbering) is establish whether two syntactically different functions are extensionally equal (under some fixed reduction strategy).
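For a concrete pair (my example, not the parent's): the two definitions below are syntactically different yet extensionally equal, and no inspection of their encodings decides that in general -- it's a non-trivial semantic property, so Rice's theorem applies.

```haskell
-- Two syntactically different, extensionally equal functions (my example).
twice, twice' :: (a -> a) -> a -> a
twice  f x = f (f x)
twice' f   = f . f
```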
No, it's actually right. See my comment on the post answering the same question. It is the circuit models that actually pose a harder challenge, but even if you won't accept my reasonable zero-cost encoding, where a misconnected input is interpreted as a constant 0, the cost of validation is still less than even that of untyped LC.
You state there that "at worst I would need O(n) time", so are you in fact conceding that my claim "It takes non-zero effort to determine whether the code for a Turing machine actually implements a valid Turing machine" was correct?
I gave a natural, reasonable encoding that requires zero validation: every string is a valid TM, and every TM can be easily represented by that encoding and directly executed. However, some may object to the encoding on some grounds, so I said that even the strictest critic would agree to a more complex validation, which is still cheaper than that of untyped LC. Similarly, I made lenient assumptions about untyped LC (allowing variable shadowing). Without it, you'd need to keep track of every variable in scope so as not to accidentally bind it again. With it, evaluation becomes more complex by a bigger factor than a TM does under my zero-cost representation.
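For concreteness, here is one possible zero-validation decoding in the spirit I described (my sketch here, not necessarily the exact scheme from the post):

```haskell
import Data.Char (ord)

-- Read *any* string as a TM transition table by taking bytes modulo the
-- number of states/symbols. The decoder is total, so every string denotes
-- some valid machine -- no validation step at all.
data Move = MoveL | MoveR deriving Show

decode :: Int -> Int -> String -> (Int -> Int -> (Int, Int, Move))
decode nStates nSyms bytes q s =
  let i    = (q * nSyms + s) * 3                   -- 3 bytes per table entry
      at k = ord (cycle ('0' : bytes) !! (i + k))  -- total: wraps, never fails
  in ( at 0 `mod` nStates                          -- next state
     , at 1 `mod` nSyms                            -- symbol to write
     , if even (at 2) then MoveL else MoveR )      -- head direction
```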
It's important to understand that encoding is always a tricky subject, because every different encoding requires a different interpreter, and one could ask whether that interpreter is really the "original" TM or the "original" LC (of course, there's no argument over huge complexity differences, but slight ones are up for some debate). I think that under very natural, reasonable encodings, the machine models require zero validation; I also think that allowing variable shadowing in LC is reasonable and doesn't completely change the model. It's perfectly OK to disagree with me on this point, but it really doesn't make a difference.
As to providing more detail in the post, well, I don't want to get lost in details that are ultimately inconsequential, and no matter what level of detail you choose, there will always be those who don't find it detailed enough. I prefer the interactive approach of an online discussion.