The halting problem is decidable on a set of asymptotic probability one (2006) (projecteuclid.org)
152 points by Schiphol on May 28, 2023 | 120 comments



A few years ago I had an idea to try to solve the halting problem for as many Brainfuck programs as possible, using a few simple heuristics. (To make things easier I picked a dialect with 1-byte cells that weren't allowed to over- or underflow.)

I managed to solve all BF programs up to length 12. The longest-running halting program I found was this: >+[>++>+++[-<]>>], which stopped after 9212 steps.

https://github.com/eterevsky/beaver
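For anyone curious to reproduce this kind of experiment, here is a minimal step-capped Brainfuck interpreter (a sketch: it ignores I/O, uses unbounded cells, which matches a no-wrap dialect as long as no cell would ever overflow, and my step counting may not match the counting used in the beaver repo):

```python
def run_bf(prog, max_steps=100_000):
    """Run an I/O-free Brainfuck program; return (halted, steps_executed)."""
    # Pre-match brackets so [ and ] jumps are O(1).
    stack, match = [], {}
    for i, c in enumerate(prog):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            match[i], match[j] = j, i
    tape, ptr, pc, steps = {}, 0, 0, 0     # sparse tape of unbounded ints
    while pc < len(prog) and steps < max_steps:
        c = prog[pc]
        if c == '+':
            tape[ptr] = tape.get(ptr, 0) + 1
        elif c == '-':
            tape[ptr] = tape.get(ptr, 0) - 1
        elif c == '>':
            ptr += 1
        elif c == '<':
            ptr -= 1
        elif c == '[' and tape.get(ptr, 0) == 0:
            pc = match[pc]                 # skip the loop body
        elif c == ']' and tape.get(ptr, 0) != 0:
            pc = match[pc]                 # jump back into the loop
        pc += 1
        steps += 1
    return pc >= len(prog), steps

halted, steps = run_bf('>+[>++>+++[-<]>>]')
```

Under these semantics the quoted program does halt within the cap, while `+[]` runs until the step budget is exhausted.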


> I managed to solve all BF programs up to length 12.

With 8 possible instructions, that is equivalent to all programs up to 12 * log2(8) = 36 bits. Curiously, that exactly matches how far we were able to analyze the behaviour of a functional busy beaver [1].

> it stopped after 9212 steps.

The largest output of a 36 bit lambda term is a whopping 578960446186580977117854925043439539266349923328202820197287920039565648199686 bits long though...

[1] https://oeis.org/A333479


From what I recall, the "undecidable" part of the halting problem isn't determining that something halts, but determining that it _doesn't_ halt (i.e. the halting problem is verifiable, as opposed to the complementary "non-halting" problem, which isn't; the complement of a verifiable-but-undecidable problem never is).

I've never properly learned brainfuck, so the answer to this might be trivial without needing to write a solver, but what's the shortest non-halting program you found?


The shortest non-halting brainfuck program has exactly three instructions. Here is one of them: `+[]`, and here is another: `-[]`.

Brainfuck's state consists of an array of integers, an index into that array, a program counter, and some bookkeeping required to track which "[" corresponds to which "]".

Let's assume the array a[:] is zeroed and the index i is set to 0 before the program begins executing.

Then `+` increments the cell in the array a at index i, i.e. `a[i] += 1`; `[` begins a while loop, equivalent to `while a[i] != 0 {`; and `]` ends the while loop: `}`.

So the program `+[]` is equivalent to something like

  a := (allocate a length N array of bytes, set to zero)
  i := 0
  a[i] += 1
  while a[i] != 0 {
  }
which will not terminate.

Why is this the shortest?

`]` is the only instruction that might decrease the program counter; otherwise the program counter always increases by one or more. Therefore, a non-halting brainfuck program must include at least one `]`. Brainfuck programs are syntactically incorrect unless every `[` has a matching `]` later in the program. So the shortest non-halting brainfuck program must have at least two instructions.

There is only one syntactically valid brainfuck program with two instructions that might not halt: `[]`. But this will halt, assuming that the array `a` is initialised to zero before execution begins. So the shortest non-halting brainfuck program contains three instructions.


To add to this, there is also the `,[]` program, which reads a single input byte, halts iff it is zero, and loops infinitely otherwise.


Fair point! Arguably the program `,` might not halt if a user never provides a byte of input to read! I've never seen a brainfuck machine with a read timeout.


Actually, it's possible to determine whether any machine with a finite (bounded) number of states (i.e., finite memory) halts given a finite (bounded) input. You just enumerate all the states. If for some input there is a sequence of states that loops without consuming input, it runs forever; otherwise it halts.

In practice there may not be enough resources to build something that enumerates all the states on all inputs, but we are talking theoretically here. Theoretically, there is no halting problem in a finite world. And we humans live in what looks to be a locally finite world, so it doesn't apply to us.

The halting problem is actually a statement on whether it's possible to always determine if a finite set of rules that have infinite memory will halt on all inputs. The answer is famously no. Thus the halting problem illustrates a property of infinities. It's a subtlety that's rarely mentioned.

It is an interesting property. One thing it says to me is there are things in an infinite world a finite computation machine (like our human brains) will never be able to understand. While locally (say within the Milky Way) we live in a finite world the Universe itself may well be infinite. Ergo, there may be some aspects of the Universe we can never understand, and some things may happen that we can never predict.
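The enumerate-the-states argument above can be sketched in a few lines. The toy "machine" below is just a function from one complete configuration to the next, with None as the halt signal (the machines are illustrative, not from the thread):

```python
def decides_halting(step, start):
    """True iff iterating `step` from `start` reaches None (halt).
    Works because the set of complete configurations is finite:
    a repeated configuration means the machine loops forever."""
    seen, state = set(), start
    while state is not None:
        if state in seen:
            return False          # configuration repeated: infinite run
        seen.add(state)
        state = step(state)
    return True                   # reached the halting configuration

# a 4-state countdown that halts: 3 -> 2 -> 1 -> 0 -> halt
countdown = lambda s: None if s == 0 else s - 1
# a 2-state machine that flips forever: 0 -> 1 -> 0 -> ...
flipper = lambda s: 1 - s
```

Since there are only finitely many configurations, the while loop itself always terminates, which is exactly why there is no halting problem in a finite world.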


It’s not so simple, even finite halting code has to handle the self recursion reversal trick somehow. Aka the finite bounds must exclude the “if H(X,I) loop forever; else halt” code because H would still be broken.

You could try defining H to only work on input machines smaller than itself. However there’s still problems.

If you don’t limit input size (aka the I in H(M,I) does machine M halt on input I), your code would still get tricked if M is just a tiny emulator and the input is the whole self defeating H program. So M and I both must be limited.

Anyway that’s not to say a finite H would work even if the input size is limited. You’d have to prove it’s impossible to self recurse - so you’d need good compression.

That’s all assuming self recursion is the only problem with the halt programs. Maybe it’s not, and it seems difficult to rule out other problems and see how they’d affect finitely bound TMs.

I’m just spitballing here. For a while I was infatuated by the idea of snuffing out self recursion or otherwise somehow making a variant of halting programs possible. But self referential poison seems really, really tough to prove the non existence of.

Anyways, I’m of the attitude that the proof for halting problem undecidability is a dumb trick. I’m simply fine with a halting program that doesn’t work correctly in self referential cases. As above though, that just leaves the whole problem vague aka don’t know about other issues and there’s the problem of actually solving it anyways (aka solving all of mathematics at once).


Let's take the _complete_ program state (input, output, RAM, program counter, stack, registers, random generator entropy source state, etc.) and for every possible such state make a node in a graph. When we execute the program one step from a given state, we get one edge to the next state's node. Deciding whether the program halts is then asking whether the path from the initial state reaches a node with no outgoing edges. It's also possible to build such graphs for programs that never halt. That's the basic idea behind TLA+.

So it's definitely possible to decide the halting problem for a bounded computer (any physical computer); it's just not practical in most cases, e.g. the amount of memory needed to store the graph of all states would be bigger than the amount of matter in the universe.
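A sketch of the state-graph construction on a made-up 8-state machine: one node per complete configuration, one outgoing edge per node, and halting decided by backward propagation from the sink nodes (roughly what an explicit-state model checker does; the machine itself is invented for illustration):

```python
def successor(s):
    # toy deterministic machine on configurations 0..7; 0 is a sink (halt)
    if s == 0:
        return None
    return s - 2 if s % 2 == 0 else (3 * s) % 8

# one node per configuration, one outgoing edge per node
graph = {s: successor(s) for s in range(8)}

# a configuration halts iff a sink is reachable from it:
# mark sinks, then repeatedly mark predecessors of marked nodes
halting = {s for s in graph if graph[s] is None}
changed = True
while changed:
    changed = False
    for s, t in graph.items():
        if s not in halting and t in halting:
            halting.add(s)
            changed = True
```

For this machine the even configurations all drain into the sink, while the odd ones sit on the cycles 1 -> 3 -> 1 and 5 -> 7 -> 5, so they never halt.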


I don’t think that works for the self recursion trick. It’s a logical problem more than a representation problem. Even if you had a machine to decide halting status quickly, it’d be broken; representing computation as a graph can’t help.

The inputs to a halting-decider TM form an infinite set (all TMs and all inputs to them), so that's another reason a finite graph can't represent it. Even something like integer addition isn't a finite graph if you enumerate all inputs.


Self recursion is not solvable if you assume the stack is unlimited, which is never the case in reality. The stack on real computers is usually limited to a fixed size (typically 1-8 MB by default). Another way to look at it: take all possible bits on a real computer, so all of RAM, all of storage, etc., and iterate over all possible values of them. There are 2^bits possible values; that is a large number, but it's still bounded. So it's possible to iterate over all possible states of any program on a bounded computer in bounded time. Hence it's possible to decide whether any program halts on any real computer in bounded time.

Sure, one can say the input is coming from a network stream; then what? For practical purposes we can limit the input size to N bits ... For theoretical purposes we can discuss whether there is a difference between halting and waiting for more input :)


> I've never properly learned brainfuck, so the answer to this might be trivial without needing to write a solver, but what's the shortest non-halting program you found?

It's a pretty trivial question in any language, it tends to just be while(true); or similar in that language.


If we're being pedantic, it's pretty trivial in any Turing-complete language (which Brainfuck is, since it's essentially just syntax over the idea of a Turing machine); there are definitely languages that can't express non-halting programs by design (like Coq, in which every function must be proven to terminate).


What is it in the lambda calculus?


(Lx.xx)(Lx.xx) is the best-known one.


And as it turns out, the smallest one too.

For combinatory logic with the usual {S,K} basis, it's S S K (S (S S K)).


What does 'lambda calculus' mean? I read that lambda refers to binding to variables - so is 'calculus' really just a way to refer to calculating generally? I suppose I only know the word contextually from the math class that shares the name.


Someone else gave a very good definition, but I'll add that historically, the Lambda Calculus was a system designed by Alonzo Church to get a formal notion of what "computation" means in mathematics. As in, a formally-defined set of rules for how a computation can be represented, and then be mechanically carried out to achieve a result. The need for such a system came about because of mathematicians wanting to define what it means for a function to be computable in a formal sense, and to be able to define and prove other theorems about computation. The other famous model of computation, which was invented at about the same time, was Alan Turing's "Turing machine", which he used to prove that the halting problem was not computable.

Just for a short taste, in the lambda calculus the base entity is an abstract function that can be applied to an argument and return a result. You are probably familiar with formulae like `f(x) = x`. In the lambda calculus formalism, this is written as `\lambda x.x`. However, it's important to note that there is no direct equivalent to something like `f(x) = x + 1` - addition and numbers are not a part of the lambda calculus directly. Instead, you can define numbers and then addition in terms of functions and variables.

Instead, the lambda calculus has rules for how function application should work to reduce complex formulae to simpler ones. For example, `(\lambda x.x) y` reduces to `y` [equivalent to f(x) = x; f(y) = ? => y]. Or, `(\lambda x.y) z` reduces to `y` [equivalent to f(x) = y; f(z) = ? => y]. And, if we go on to more complex expressions: `(\lambda x.x) (\lambda x.y) z` reduces to `y` by applying the reductions above successively (first we reduce the initial expression to `(\lambda x.x) y`, then we reduce this to just `y`).

It can be shown that this model of reducing the formulae of the lambda calculus is in fact equivalent in power to the model of the Turing machine writing symbols on its tape.
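The reduction rules above are small enough to implement directly. Here is a sketch of a normal-order beta-reducer, with terms as nested tuples and naive substitution (fine for the closed examples in this thread, not for arbitrary terms, since it does no capture avoidance):

```python
# Terms: ('var', name) | ('lam', name, body) | ('app', fun, arg)

def subst(term, name, value):
    """Replace free occurrences of `name` in `term` with `value` (naive)."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'lam':
        if term[1] == name:       # name is shadowed by this binder
            return term
        return ('lam', term[1], subst(term[2], name, value))
    return ('app', subst(term[1], name, value), subst(term[2], name, value))

def reduce_step(term):
    """One leftmost-outermost beta step, or None if in normal form."""
    if term[0] == 'app':
        fun, arg = term[1], term[2]
        if fun[0] == 'lam':
            return subst(fun[2], fun[1], arg)     # the beta rule
        r = reduce_step(fun)
        if r is not None:
            return ('app', r, arg)
        r = reduce_step(arg)
        if r is not None:
            return ('app', fun, r)
        return None
    if term[0] == 'lam':
        r = reduce_step(term[2])
        if r is not None:
            return ('lam', term[1], r)
    return None

def normalize(term, max_steps=100):
    """Reduce to normal form, or give up (None) after max_steps."""
    for _ in range(max_steps):
        nxt = reduce_step(term)
        if nxt is None:
            return term
        term = nxt
    return None        # possibly no normal form, e.g. (\x.xx)(\x.xx)

ident = ('lam', 'x', ('var', 'x'))                          # \x.x
omega = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))   # \x.xx
```

`normalize(('app', ident, ('var', 'y')))` reduces to the variable `y`, while `(\x.xx)(\x.xx)` reduces to itself forever and exhausts the step budget.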


"Lambda calculus is a notation for describing mathematical functions and programs. It is a mathematical system for studying the interaction of functional abstraction and functional application. It captures some of the essential, common features of a wide variety of programming languages."

- https://www.cs.cornell.edu/courses/cs3110/2008fa/recitations...


And the name of this very website we are talking on is derived from lambda calculus. The Y combinator is a fixed-point combinator in lambda calculus.


The meat of the paper is just this short passage:

> Proof of the Main Theorem. Using the standard model,

> let B be the set of programs that on input 0 either halt before repeating a state or fall off the tape before repeating a state. Clearly, B is polynomial time decidable, since we need only run a program p for at most n steps, where n is the number of states in p, to determine whether or not it is in B. It is equally clear that the halting problem is polynomial time decidable for programs p in B, since again we need only simulate p for n steps to know whether it halted or fell off. What remains is to prove that this behavior occurs with asymptotic probability one.
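The quoted membership test can be sketched directly: run the machine on a blank one-sided tape and stop as soon as it halts, falls off the left edge, or repeats a control state. (The transition-table format below is my own choice, not the paper's encoding.)

```python
def in_B(delta, n):
    """delta[(state, symbol)] = (write, move, next_state); a missing
    entry means halt. n is the number of control states. Returns True
    iff the machine, started on a blank one-sided tape, halts or falls
    off the left edge before repeating a control state."""
    tape, head, state, seen = {}, 0, 0, set()
    # n+1 iterations suffice: with n states, by then some state repeats
    for _ in range(n + 1):
        if state in seen:
            return False              # control state repeated
        seen.add(state)
        action = delta.get((state, tape.get(head, 0)))
        if action is None:
            return True               # halted
        write, move, state = action
        tape[head] = write
        head += move
        if head < 0:
            return True               # fell off the left edge
    return False                      # unreachable by pigeonhole, kept for safety
```

The run is cut off after at most n+1 checks, which is the linear-time bound the quoted paragraph relies on.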

The proof also makes use of this

> Lemma 2.5 (Polya [2], see also, e.g., [1])

> In the random walk with equal likelihood of moving left or right on a semi-infinite tape, beginning on the left-most cell, the probability of eventually falling off the left edge is 1.

This suggests that the result crucially depends on having a one-sided infinite tape rather than the two-sided infinite tape that's used in the Turing Machine busy beaver.

It would be interesting to determine whether the result also holds for that other computational model, i.e. what about the set of lambda calculus terms with or without a normal form?


I got confused here:

> Clearly, B is polynomial time decidable, since we need only run a program p for at most n steps, where n is the number of states in p, to determine whether or not it is in B.

Couldn't program p have an exponential number of states, e.g. 2^|p| distinct program states? Then running the program for O(2^|p|) steps is clearly not polynomial time decidable in the size of the instance. I thought polynomial time, for the decision algorithm, meant O(|p|^c) for some c. Am I mistaken about something?


In this case p is the standard encoding of a Turing Machine with n states, so |p| = n * 2 * (1 + 1 + log(n+1)), so necessarily n < |p|.


Yes, thanks, I figured that out later; they define the "state" as what in CS we would call the lines of code or control-flow blocks. What I'm curious about is that with our usual "modern programmer's" understanding of state, the state space is exponential in the size of the program; that's why we have model checking and software verification. Whereas this theorem is about programs in B, whose state space is linear in the length of the encoding of the program, because they're allowed to visit each Turing machine state at most once before halting. At first glance it seems like a very specialized class of programs to talk about.


State space refers to the number of states of not just the program counter, but more importantly, the data operated on by that program, which in case of TMs is the tape contents.

This theorem is not about programs B whose tape contents is linear in the length of the encoding of the program.


The criterion is that p in B must halt before returning to a Program Counter value it has already visited (what they define as repeating a Turing machine "state"). The number of such values is the length of the program text. So I have no idea why they are talking about polynomials; it looks linear, because the program counter has one value per line of code in the program text. If not, I would like to know where my misunderstanding is. Clearly the authors think the paragraph is a simple, obvious step not worth discussing in detail (they say running the program for n steps is sufficient, but they never outright say why the bound would be polynomial rather than linear).


Oh, you're right that the criterion for B seems to ignore tape contents. I agree that B is a very uninteresting subset of programs from a programming perspective...


Similarly, as the number of particles in a system increases, the less you have to worry about Heisenberg uncertainty bounds on your momentum/position measurements.

It’s a misreading of the halting problem to think of it as a ‘barrier’ of some sort. What it really means is that there are some computer programs where the only way to find out what they will do is to actually run them. There’s not a ‘better trick’ you can pull to get the answer. This is kind of intuitive, since we know there are problems we run into where the easiest way we can think of to solve them is to write a program and run it. The halting problem tells us that it’s plausible that that instinct is right - that for some programs, you really can’t do some other analysis of the program that tells you what it’s going to do. You have to just run it.

Like: take a random complex number with magnitude < 2. Keep squaring it and adding the original number. Will you ever get a number with magnitude > 2?

For some numbers you can take a shortcut and say yes or no. But for others the only way to find out is to keep doing it.
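The iteration described above, as a step-capped check (the cap means a None answer is only "didn't escape yet", which is exactly the point):

```python
def escapes(c, max_iter=1000):
    """Iterate z -> z*z + c starting from z = c. Return the number of
    iterations until |z| > 2, or None if that never happens within
    max_iter steps (the point may be in the Mandelbrot set)."""
    z = c
    for k in range(max_iter):
        if abs(z) > 2:
            return k
        z = z * z + c
    return None
```

For c = 1 the orbit is 1, 2, 5, ... and escapes almost immediately; for c = -1 the orbit cycles -1, 0, -1, 0, ... forever, so no finite amount of iterating will ever report an escape.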


> The halting problem tells us that it’s plausible that that instinct is right - that for some programs, you really can’t do some other analysis of the program that tells you what it’s going to do. You have to just run it.

This isn't quite right. The halting problem says that you can't just run it. If you just run it, you will not learn whether or not it halts after any finite amount of time.

Also I don't see how this is related to uncertainty bounds.


> If you just run it, you will not learn whether or not it halts after any finite amount of time.

If it halts within a finite amount of time, you will.


No you won't. Pick any finite time at which you will check: the program may not have stopped by then, yet may stop later. So ahead of time, even if the program will stop, you cannot make a plan to learn that in finite time. You're stuck letting it run possibly forever, because at each finite time you still cannot answer yes or no. Only if you've waited long enough, which can be arbitrarily far out, and it stops, can you answer.


> Only if you've waited long enough, which can be arbitrarily far out, and it stops, can you answer

So, we agree!


But that is not the only possibility. So you are missing the point.


They are not missing the point. The previous post was simply wrong. The halting problem is semi-decidable, so you can potentially find out whether a program halts.


Besides missing the point, this is also pedantic and quite boring.


Well, it depends how you take it. The fact that the halting problem is undecidable tells us that there is no algorithm to tell if an algorithm will ever halt for some input. This implies that the best we can do if we want to find out if an arbitrary algorithm A will halt for some input B is to run A with input B. We may still never find out, but we also have no better way.


I think you’re confusing computational irreducibility and the halting problem.


Computational irreducibility is one consequence of the halting problem being undecidable. If the halting problem were decidable, there couldn't be any computationally irreducible algorithms.


Right, these things are all corollaries of one another.


I think he might be referring to Rice's Theorem, which to be fair is reducible to the halting problem


> Similarly, as the number of particles in a system increases, the less you have to worry about Heisenberg uncertainty bounds on your momentum/position measurements.

I don't think this is correct. In fact, in certain systems, the Heisenberg uncertainty can get amplified as the number of particles increases - which is why we end up needing the addition of the Born rule in QM to predict classical observations, and can't just deduce them from the base equations. And also why we end up predicting a particle can go either left with some probability or right with one minus that probability, instead of being able to say "that particle will go right".


Yet you can generally catch a thrown ball.

The point of the analogy was that the halting problem - like the uncertainty principle - is a result that says that you can’t know certain things about systems - and with small systems in particular figuring that out can be as complex as the system itself. But they don’t prevent us from doing useful things with larger systems. It struck me as a similar result to this paper - though only by analogy, not like, in some deep way.


My point was more that Heisenberg uncertainty doesn't simply go away because of the size of the system, it is explicitly not an element of classical physics. If you apply the equations of QM to a system of any size, you'll see compounding uncertainty, not an attenuation.

The fact that classical objects have deterministic trajectories is not a consequence of statistics + Heisenberg uncertainty, it is explicitly in contradiction to Heisenberg uncertainty, requiring an extra element to be added to the theory (the Born rule).


What does this have to do with Heisenberg uncertainty?


> Similarly, as the number of particles in a system increases, the less you have to worry about Heisenberg uncertainty bounds on your momentum/position measurements.

This certainly makes intuitive sense, since it explains why we don't have trouble measuring both the speed and position of objects in everyday life with fairly high precision.


Unexpected Mandelbrot


I’ve never seen “asymptomatic density” referred to as “asymptomatic probability.”

Anyways, if anyone is interested in a more complete survey of similarly flavored computability/density results, this paper is good: https://faculty.math.illinois.edu/~jockusch/DensityCE_JML.pd...

There is even some pure combinatorics involved, for example I remember lemma 6.7 from my combinatorics class back in undergrad


Note: it's not asymptomatic, but asymptotic - without the ma. I can only wonder whether you've always been writing it like that, or if this is the toll that the pandemic had on you.


I think my autocorrect got the better of me there!


Maybe they just need a vaccination… uh I mean a vacation


In case anyone wants a quick tl;dr: as the length of a program (expressed as the number of states in a Turing machine) grows, the relative fraction of undecidable programs decreases. In other words, "most" (approaching "nearly all" as they get longer) Turing machines, out of the set of all possible Turing machines of that same size, are actually decidable. (Though this doesn't tell us about whether most real programs are decidable, as the programs humans write also come from a small subset of the universe of Turing machines. But we know that in practice, this is also true).

This relates nicely to the earlier discussion of SAT: while SAT is NP-complete, many instances can be solved in reasonable time. Similarly, we can verify that many actual programs will halt even if the problem is, in general, undecidable.


I think it would be pretty shocking if the opposite were true. I imagine most randomly generated Turing machines are pretty trivial.


Dunno. The real numbers, for example, are in the “most things are eldritch abominations” camp[1]: the reals are uncountable, the computable reals are a(n everywhere dense) countable set[2] (that is therefore[3]) of measure zero. Not that there’s a reason to expect these cases to be similar, just that weird things do happen.

[1] https://mathwithbaddrawings.com/2016/12/28/why-the-number-li...

[2] There’s a finite set of programs of a fixed length, thus (countable union of finite sets) a countable set of all programs, thus only at most a countable set of numbers they can compute.

[3] Every countable set has Lebesgue measure zero: cover point #n by a blot of measure ε / 2^n (that could even intersect other blots) to prove all of them have measure at most ε > 0, take ε as small as desired.
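The covering argument in [3], written out as a one-line calculation:

```latex
% Any countable set S = \{x_1, x_2, \dots\} has Lebesgue measure zero:
% cover x_n by an interval of length \varepsilon / 2^n, so
\mu(S) \;\le\; \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} \;=\; \varepsilon
\quad \text{for every } \varepsilon > 0, \text{ hence } \mu(S) = 0.
```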


I've made the argument many times before that the reals are certainly an interesting set of objects, but a poor abstraction for the concept of a number. We should work towards replacing them with a more well-behaved set without eldritch monsters such as uncomputable 'numbers' that you can't add or multiply with. E.g. https://en.wikipedia.org/wiki/Computable_number.


Computable functions do also have some weird behavior. For example: a) all functions are continuous, b) equality is not computable (maybe not so weird for anyone used to floating point), c) differentiation is not computable unless you work over complex numbers (but integration is!)


a) All functions from computable reals to computable reals are continuous. This makes sense, because an arbitrarily small input difference leading to an arbitrarily large output difference is not a good basis for computation. You can still have discontinuities for discrete computable maps.

b) Neither is equality computable for the reals. Nothing is lost here.

c) Differentiation is computable for the functions of computable reals if you have a computable rate of convergence.


How to do (c)? I tried to model the reals as lazily computed sequences of intervals with rational bounds, where each interval is a subset of the previous one, and I couldn't write a higher-order differentiator function: on every step, for a given input interval, the output interval has (in most cases) non-zero size, and the function being differentiated may contain oscillations quick enough that the derivative estimate stays unbounded no matter what you do. What do you mean by rate of convergence?



(I assume you refer to the sequence of reals that is used to compute the limit in the definition of derivative) But is there a way to somehow construct this modulus automatically for some functions? And if there isn't, then isn't requiring manual input of the modulus like just requiring manual input of the derivative itself? Can you typically find the modulus manually more easily than the derivative?


> But is there a way to somehow construct this modulus automatically for some functions?

In a sense. You would end up with https://en.wikipedia.org/wiki/Automatic_differentiation.

The number of functions you can construct in this manner is massive, along with their derivatives, but it is not an operator that, given an arbitrary computable function, gives you its derivative.
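A minimal forward-mode sketch of that idea using dual numbers, where each value carries its derivative along (only the operations needed for the example are implemented):

```python
class Dual:
    """A number paired with its derivative part: val + eps * d."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)
    __radd__ = __add__

    def __mul__(self, other):
        # product rule: (f*g)' = f*g' + f'*g
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)
    __rmul__ = __mul__

def derivative(f, x):
    """d/dx f at x, exact up to float rounding, no limits needed."""
    return f(Dual(x, 1.0)).eps
```

For f(x) = x*x + 3x the derivative is 2x + 3, so `derivative(lambda x: x * x + 3 * x, 2.0)` gives 7.0. This constructs derivatives for a huge class of functions built from the supported operations, but, as said above, it is not a derivative operator on arbitrary computable functions.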


Thank you.


There’s a tradeoff[1] throughout mathematics, not unlike the expressiveness—analyzability one in programming languages: either the set of things you’re working with has nice properties (e.g., for the reals, completeness) and supports nice constructions with no caveats (e.g., for the reals, limits); or each of the things in the set is individually nice and tractable (e.g. computable).

There isn’t a single universally appropriate choice here, and sometimes even the direction of the axis is not obvious (real numbers lack solutions for algebraic equations; complex numbers lack smooth functions nonzero only in a finite region), but usually a simple family of eldritch objects is easier to deal with than an byzantine clan of cuddly objects.

Could you do simple physics and more generally ordinary differential equations over the computable numbers? You probably could to some extent, but it’s unlikely you’d get the nice intuitive geometry you get over the reals (see e.g. Arnold’s ODE textbook). The (basic) geometry of surfaces parametrized by real coordinates is fairly straightforward (differential geometry); the geometry of surfaces parametrized by rational coordinates is famously mind-blowingly abstract (algebraic geometry). The (basic) theory of real or complex functions is mostly understandable (analysis); the theory of natural or rational functions is horrifying (number theory).

Also, as a sibling reply mentions, it’s not like computable numbers will solve every problem: even though arithmetic over them is decidable, order still isn’t, and computable transcendental functions aren’t exactly a cakewalk either.

It doesn’t mean the computable reals aren’t worth knowing about, but, well, we live in a world where mandatory education has violently rejected[2] every advance in mathematical thought since Otto von Bismarck left office, and reduced the ones it accepted to a desiccated husk[3], so there are a great many things in that category. (I cannot help but remember the SMBC patriotism vs nationalism comic[4]: “Now I am mathematics.”)

[1] https://ncatlab.org/nlab/show/dichotomy+between+nice+objects...

[2] http://www.ams.org/notices/201201/rtx120100031p.pdf

[3] https://www.maa.org/external_archive/devlin/devlin_03_08.htm...

[4] https://www.smbc-comics.com/comic/an-important-distinction


The notion of an uncountable set assumes certain axioms and classical logic. If instead one assumes the rules of constructive mathematics, then one cannot construct an uncountable set. Simplifying: only computable numbers can exist.


Note that there are multiple versions[1] of “constructive mathematics”, so for example what Bishop’s constructive real analysis book considers constructive is not what Markov’s “constructive = computable” tradition does.

You are also going to encounter other funny things. One of De Morgan's laws failing could seem troubling but ultimately theoretical; what, then, about the distinction between countable and constructively countable[3]? The set of all programs is countable; the set of all terminating programs is a subset of it, but can't constructively be proven to itself be countable, since that would imply a solution to the halting problem! (Neither can it constructively be proven not to be countable, as it is countable in a consistent extension, namely classical mathematics.) Thus you get the new and exciting notion of subcountable sets. And that's one of the mildest things that you're going to encounter in your shiny new theory of cardinals.

(Anybody who can explain to me what the hell constructive Hahn-Banach[4] is about gets a beverage of their choice if we are ever in the same city. Without HB, you cannot do any functional analysis worth a damn, and without that you can't really talk about Green's functions / fundamental solutions to linear PDEs, which are in turn a basic tool for any electrodynamics or quantum mechanics worth speaking about. Unless you want to forgo proof or the natural sciences, you need it, is my point.)

None of this is to say that I think intuitionistic logic is useless, mind you. It’s interesting and useful[5]! But so far it seems that you need to learn at least some logic (as a piece of mathematics and not mere terminology, so including scary words like “metalogic”) before you can really understand how it comes together.

[1] It seems categorical logic[2] says useful things about what the One True Definition should be? But I don’t understand it well enough to tell, and in any case the constructivist tradition predates not only it but category theory in general.

[2] https://mikeshulman.github.io/catlog/catlog.pdf

[3] https://ncatlab.org/nlab/show/constructive+mathematics

[4] https://www.cse.chalmers.se/~coquand/hahn1.pdf

[5] https://www.ams.org/journals/bull/2017-54-03/S0273-0979-2016...


Markov’s variety of constructive mathematics is just what Bishop did plus the assumption that the Church-Markov-Turing thesis holds. One does not need to assume that to recover any result of classical mathematics applicable in the real world. Also, all those weird results of Markov’s approach that apparently “contradict” classical mathematics are a consequence of the Church thesis; without it one cannot prove them. So Bishop’s approach is always compatible with classical mathematics.

As for Hahn-Banach, I can only cite the book “Techniques of Constructive Analysis” by Douglas S. Bridges:

For example, our proof of the Hahn-Banach theorem (Theorem 5.3.3) is, as it stands, a valid algorithmic proof of the classical Hahn-Banach theorem. Moreover (and this is one advantage of a constructive proof in general) our proof embodies an algorithm for the construction of the functional whose existence is stated in the theorem. This algorithm can be extracted from the proof, and, as an undeserved bonus, the proof itself demonstrates that the algorithm is correct, or, in computer science parlance, "meets its specifications". …

The Hahn-Banach theorem enables us to extend a normed linear functional, with an arbitrarily small increase in norm, from a subspace of a normed space to the entire space. This fundamental result has numerous applications throughout functional analysis.

In the constructive context we deal only with the extension of linear functionals on subspaces of a separable normed space. The standard classical proofs extending the theorem to nonseparable normed spaces depend on Zorn's lemma and are therefore nonconstructive. … The classical Hahn-Banach theorem says that we can extend a bounded linear functional v from Y to a functional u on the whole space X with exact preservation of norm. In general, as Exercise 5 shows, we cannot do this constructively. But, as we shall see, if we impose extra conditions on the norm of X, then we can make the extension norm-preserving.


> the distinction about countable and constructively countable? The set of all programs is countable; the set of all terminating programs is a subset of it, but can’t constructively be proven to itself be countable—after all, that would imply a solution to the halting problem!

I thought this had to do with the difference between being computable and computably enumerable.


tl;dr: Maybe the reals are far more mysterious. All we ever have a handle on are the countable sets.

[0] Looking back, the core of my confusion was this. I had thought: I can visualize what ℵ0 means; that’s just the infinity of integers. I can also visualize what C=2^ℵ0 means; that’s the infinity of points on a line. Those, therefore, are the two bedrocks of clarity in this discussion. By contrast, I can’t visualize a set of intermediate cardinality between ℵ0 and C. The intermediate infinity, being weird and ghostlike, is the one that shouldn’t exist unless we deliberately “force” it to.

Turns out I had things backwards. For starters, I can’t visualize the uncountable infinity of real numbers. I might think I’m visualizing the real line—it’s solid, it’s black, it’s got little points everywhere—but how can I be sure that I’m not merely visualizing the ℵ0 rationals, or (say) the computable or definable reals, which include all the ones that arise in ordinary math?

The continuum C is not at all the bedrock of clarity that I’d thought it was. Unlike its junior partner ℵ0, the continuum is adjustable, changeable—and we will change it when we build different models of ZFC. What’s (relatively) more “fixed” in this game is something that I, like many non-experts, had always given short shrift to: Cantor’s sequence of Alephs ℵ0, ℵ1, ℵ2, etc.

[0] https://scottaaronson.blog/?p=4974


My intuition tells me the opposite.

Suppose you have one undecidable statement, U, out of s possible statements. A random program of length n contains U at least once with probability 1-(1-1/s)^n. If the program is of infinite length, it contains U with probability 1. So I would have thought the chances of undecidability increase as the program gets longer.
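
That back-of-the-envelope model is easy to check numerically. A quick sketch (the values of s and n below are purely illustrative, not anything from the paper):

```python
def p_contains(n: int, s: int) -> float:
    # Probability that at least one of n independently, uniformly chosen
    # statements (out of s possibilities) is the particular statement U.
    return 1 - (1 - 1 / s) ** n

# The probability tends to 1 as the program grows.
for n in (10, 100, 1000):
    print(n, round(p_contains(n, s=50), 6))
```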

Clearly my intuition is wrong, per the paper, but I don't think the result is immediately obvious.


Think about it like this: there are infinitely many Turing machine state classes that are easily decidable. For example, a simple infinite loop or its equivalent. Most of these classes are also very trivial to construct, requiring only a few "instructions".

So if you make a random TM, it's almost guaranteed that it'll get eventually "trapped" in one of these easily decidable infinite loops.


Thanks, that's a perfect explanation


It entirely depends on the description language. In Python, the opposite is true for basically the reason you gave: an asymptotically positive constant fraction of valid Python programs will begin with code searching for a contradiction to ZFC followed by an exit statement. In the model where Turing machines are modeled by random graphs with out-degree two, it is no longer necessarily the case that the start state component is exactly some fixed undecidable subgraph; i.e. as more states are added the probability of finding the start state inside a particular induced subgraph shrinks to zero.


I think the chance of having an undecidable (group of) statements goes up, but the chance of executing it does not.

There isn't really a single undecidable statement; it would be a complex subroutine. My intuition would be that it's much more likely a halt or indefinite loop (both relatively simple) would appear before program control transfers to the undecidable part, in most cases.

I.e. if something undecidable requires 60 symbols in a row in the correct order, but halting requires one, then the probability of a halt at any given spot seems much higher on average.


A statement in a program can’t really be undecidable on its own. It’s the program that has that property.


A statement is just a subprogram. Whether it can be undecidable depends on whether a single program can be undecidable (and I can't see how that could be possible; undecidability is a property of the problem, i.e. of a decision procedure on a set of programs).


The difference with SAT is that for SAT we actually know the opposite is true - the instances that can be solved quickly are the "rare" kind.


I’m not sure the comparison to SAT is a good one because, in that case, we know of particular “phase transition” regions in which most problems aren’t clearly satisfiable or unsatisfiable. (Specifically, when the ratio of clauses to variables in 3SAT is about 4.2.)
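
For anyone wanting to play with that regime, here is a minimal sketch of the standard uniform random 3-SAT model (the ratio 4.2 is the one quoted above; the function name and defaults are made up):

```python
import random

def random_3sat(num_vars: int, ratio: float = 4.2, seed: int = 0):
    """Generate a random 3-SAT instance with round(ratio * num_vars) clauses.

    Each clause is a tuple of three nonzero ints over distinct variables;
    a negative int means the literal is negated.
    """
    rng = random.Random(seed)
    clauses = []
    for _ in range(round(ratio * num_vars)):
        vs = rng.sample(range(1, num_vars + 1), 3)  # 3 distinct variables
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in vs))
    return clauses

inst = random_3sat(100)
print(len(inst))  # 420 clauses for 100 variables
```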


It's an excellent comparison to SAT, because in both cases nobody cares about random instances at all.

For non-random instances the ratio of clauses to variables is utterly meaningless.


Also the same applies with the CAP theorem… it is mostly a theoretical result but in practice things go well the vast majority of the time!

Also look up Buridan’s Principle by Leslie Lamport. I emailed him and “argued” with him for a while after reading his paper, but in a theoretical sense and some practical subsets it is correct!


Interesting paper. I agree that he is probably technically correct, but I think he downplays the fact that all over physics people say "never" when they mean "with probability less than 10^-1000".

For example, in classical thermodynamics there is a nonzero probability that all the air molecules in a room will happen to travel to one side, violating many laws of fluid mechanics. In quantum mechanics the situation is even worse, where basically anything can happen with nonzero probability.


> basically anything can happen with nonzero probability.

It took me some time to realize that that's what "The Hitchhiker's Guide to the Galaxy" was about.


The simplex algorithm is similar: exponential time in the worst case, but in practice often closer to linear.


It's an interesting result, but the probability distribution seems to be doing a lot of the work. The distribution of programs a human is likely to encounter is very different from the set of all programs of length N uniformly distributed.


This contradicts Murphy’s theorem, that if something can go bad it will go bad with asymptotic probability one.


I would say it’s consistent since something going bad can often mean halting


No, Murphy says that despite having a vanishingly small probability, somebody will toss an undecidable problem into your non-halting oracle.


Well, if the program is operating a business-critical function, then having it halt is bad.


Is there a contest somewhere for the longest continuously-running computation?

It would be interesting to see how long we are capable of running even just a while true

The power might go out, some electronic component might fail, or the cooling system, etc

What is the longest that we are actually capable of keeping the most simple system running?


Does the century-old continuously running light bulb in a fire station count as a computation, if a while true does?


Life on Earth is an out-of-equilibrium process that has been running for ~3.5 billion years.


Probably Voyager 2. Running since August 20, 1977 - never been rebooted (because good luck doing that)


Maybe there's a clock on Earth with a microcontroller and some kind of UPS or battery backup or large capacitor backup?

You could argue that the timekeeping of TAI (https://en.wikipedia.org/wiki/International_Atomic_Time) could be seen as a distributed computation (where different entities are counting the same value and running manual and automated protocols to keep the value synchronized), and that has been running since 1972. It almost certainly hasn't been running continuously on any individual hardware device, but there have always been devices that represented its value and always been efforts to maintain continuity between calculations.

Although I guess the same logic could argue for civil timekeeping in any calendar, because people have actively tracked it as "the same computation" and cared about it for centuries or millennia. Maybe the distinctive argument for TAI is that it's presumably been continually represented digitally in programmable computers since 1972.


I think it would make more sense to measure the longest computation in the number of cycles executed rather than in seconds. If I'm not mistaken, Voyager 2 had a processor running at 4MHz. So a modern 2 GHz processor will execute more cycles in a couple months than a 4MHz processor in 50 years...
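
A back-of-the-envelope check of that comparison (the clock speeds are the ones quoted above; the durations are rough):

```python
SECONDS_PER_DAY = 86_400

# Voyager 2: a ~4 MHz processor running for ~50 years
voyager_cycles = 4_000_000 * 50 * 365 * SECONDS_PER_DAY

# A modern 2 GHz processor running for two months
modern_cycles = 2_000_000_000 * 60 * SECONDS_PER_DAY

print(f"{voyager_cycles:.2e} vs {modern_cycles:.2e}")  # ~6.31e+15 vs ~1.04e+16
```

So the two-month figure indeed already overtakes the 50-year total.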


The Oxford Electric Bell has been running since 1840.


I had an insight when learning about the Halting Problem. There is a class of programs for which the Halting Problem is solvable since you can use circuit analysis to see if well-formed loops halt. We should strive to write programs in that class since then we know that the execution, long as it may be, will yield a result.
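
One way to make that concrete (an illustrative sketch of a decidable-termination discipline, not the circuit analysis the comment alludes to): if every loop comes with a natural-number "variant" that strictly decreases on each iteration, termination is guaranteed, and a runtime check can enforce it. The `step`/`variant` interface here is hypothetical:

```python
def runs_to_completion(step, state, variant):
    # Run `step` until it returns None (halt), checking that `variant`
    # (a natural-number measure of the state) strictly decreases each
    # iteration. A strictly decreasing natural number cannot decrease
    # forever, so passing the check guarantees termination.
    m = variant(state)
    assert m >= 0
    while state is not None:
        state = step(state)
        if state is None:
            return True
        m2 = variant(state)
        assert 0 <= m2 < m, "variant must strictly decrease"
        m = m2
    return True

# Example: a countdown loop, with the counter itself as the variant.
print(runs_to_completion(lambda n: n - 1 if n > 1 else None, 10, lambda n: n))  # True
```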


The problem is that with probability 1 a random program will access an undefined memory position. Such programs halt by definition.

This makes the result a lot less interesting. Sure, random programs will typically halt before long, but with an illegal memory access (in the parlance of the article, they fall off the left edge of the Turing tape).


This is wrong, and directly and unambiguously contradicted by the linked paper.

"For the purposes of defining the halting problem H, one should specify whether it officially counts as halting or not if the head should happen to fall off the left edge of the tape. Although the truth of the main theorem will not depend on these details, provided we adopt a uniform answer, let us be definite and regard such computations as having not officially halted, as the halt state was not reached."


There is no such notion of “undefined memory region” for a Turing Machine


Some models of the Turing machine have an infinite tape in only one direction. Some also don’t require every state to have an exhaustive set of transitions. Those models generally consider going off the end of the tape or failing to find a transition to be a halting state.


The paper makes explicit how this is dealt with:

> For the purposes of defining the halting problem H, one should specify whether it officially counts as halting or not if the head should happen to fall off the left edge of the tape. Although the truth of the main theorem will not depend on these details, provided we adopt a uniform answer, let us be definite and regard such computations as having not officially halted, as the halt state was not reached.


It’s unclear to me why restricting ourselves to that model is useful here


Changing the model wouldn't change anything. A Turing machine is equivalent to any computational machine you want to put in there. It doesn't matter what the machine is. The Turing machine is the go-to because it was the first one used for these results.

Any other machine that can simulate a Turing machine can simulate failing in those same ways. You can't escape the problems of Turing completeness without losing it.


It does matter, the second and last sentence of the abstract is "The proof is sensitive to the particular computational models."


It’s the simplest model which can simulate all useful models of computation we have so far.

So, if you want to prove something about the limits of computation, a Turing machine is usually the right choice.


I think interaction nets [0] are actually a simpler model, and they can simulate Turing machines efficiently. I wish courses in computability/complexity theory were taught with interaction nets instead of Turing machines, as the programs written would be so much nicer and compose easily. They also have real-life uses in compiling functional languages. [1]

[0]: https://en.wikipedia.org/wiki/Interaction_nets [1]: https://github.com/HigherOrderCO/HVM


How is this simpler?

It seems to me a more complex lambda calculus.


Symmetric interaction combinators (an instance of interaction nets, like the 2-state 3-symbol Turing machine is a particular type of Turing machine) have only 3 agents and 3 rewriting rules. This makes them about as simple as lambda calculus with its 3 possible terms (variable, application, and abstraction) and α-equivalence, β-reduction, and η-reduction. The advantage of interaction nets is that each rewriting rule is local and "atomic", unlike the substitution step in β-reduction, which can take an arbitrarily long time as it is defined recursively.


I was referring to the instantiation of a TM as having a one sided tape.


It is true that some models define halting that way, but in this paper they explicitly do not.


No one talks about the fact that the halting problem has been solved for linear bounded automata.


Because it's immediately obvious that, for anything with a finite number of states, determining whether it runs forever is "trivial": run it until it either repeats a state or halts, one of which will always occur in finite time.
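
That argument is directly executable for any deterministic machine with finitely many configurations. A minimal sketch (the `step`/`state` interface is hypothetical; hashable states assumed):

```python
def halts(step, state):
    """Decide halting for a deterministic machine with finitely many states.

    `step` returns the next state, or None when the machine halts.
    By determinism, revisiting any state means it loops forever.
    """
    seen = set()
    while state is not None:
        if state in seen:
            return False  # revisited a configuration: infinite loop
        seen.add(state)
        state = step(state)
    return True

# A mod-3 counter never halts; a countdown does.
print(halts(lambda s: (s + 1) % 3, 0))               # False
print(halts(lambda s: s - 1 if s > 0 else None, 5))  # True
```

The catch, as the replies point out, is that `seen` can need as many entries as the machine has configurations, which is exponential in the memory size.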


Uh, yeah, but my PC with 96GB of memory has quite some possible states.

So, most likely, the real reason nobody is interested is that the numbers are too large.


Well the trick is knowing which states to keep track of.


If your computer had 4k of memory, it'd be too many states to keep track of


Right, why are there so few total languages then?


BRB, implementing an x86 interpreter with O(2^n) memory overhead.


That's the brute force approach.


I guess the alternative is just bounding the number of instructions and proving inside or outside your language that it is sufficient for your program.


Indeed, that's what bounded means.


Yes, but the point is that you are giving a constructive existence proof for that bound. You asked why there are so few total languages. The answer, as far as I'm aware, is either brute force on a fixed amount of memory or giving provable bounds, either by a decision procedure on a restricted language or by manually giving a proof in a more expressive language. Is that an answer to your (presumably rhetorical) question?


When you say no one talks about it, isn't that what Turing means when he says:

    The fallacy in this argument lies in the assumption that Beta is computable. It would be true if we could enumerate the computable sequences by finite means, but the problem of enumerating computable sequences is equivalent to the problem of finding out whether a given number is the D.N of a circle-free machine, and we have no general process for doing this in a finite number of steps.[1]
In an LBA can't we always enumerate the computable sequences by finite means?

[1] On computable numbers https://www.cs.virginia.edu/~robins/Turing_Paper_1936.pdf



