
If I were to design a processor, it would essentially execute programs written in a total functional language with corecursion. So you would be able to create Turing-complete programs, but most of the time the processor would be evaluating expressions in a primitive recursive subset that it can reason about and aggressively optimize.


Turing-complete but with clean subsets seems completely different from not using a Turing-complete input language at all.


In order to execute a total functional language, you have to basically split the processor into two. I'll call these two parts the evaluator and the executor.

* It's the evaluator's job to evaluate expressions. Expressions are primitive recursive so they always normalize.

* The executor's job is to talk to the outside world: it receives commands from the evaluator and provides data back to it. It also can't get stuck in an infinite loop; after receiving a command, it must eventually provide data to the evaluator.

The only way an infinite loop can happen is if the evaluator and the executor keep handing commands and data back and forth to each other forever.

When I said processor before, I was referring to the part that did the processing: i.e. the evaluator.
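
To make the split concrete, here's a rough sketch in Python of how the evaluator/executor interface could work (this is only my own guess at an encoding; the names Done, Command, greet and run_executor are made up, not part of any real design). The evaluator side only ever produces either a final value or a command plus a total continuation; the executor is the only place with real I/O and an unbounded loop.

    # Hypothetical sketch of the evaluator/executor split described above.
    # The evaluator side is pure and each of its steps terminates; the executor
    # performs the I/O and drives the loop, so non-termination can only come
    # from the two halves feeding each other forever.

    from dataclasses import dataclass
    from typing import Callable, Union

    @dataclass
    class Done:
        """The evaluator finished with a final value."""
        value: object

    @dataclass
    class Command:
        """The evaluator pauses and asks the executor to do something."""
        name: str                           # e.g. "print" or "read_line"
        payload: object                     # argument for the executor
        resume: Callable[[object], "Step"]  # total continuation, given a response

    Step = Union[Done, Command]

    def greet() -> Step:
        """A tiny 'program' for the evaluator: read a name, print a greeting.
        Each piece between commands is a trivially terminating expression."""
        return Command("read_line", None,
                       lambda name: Command("print", "hello, " + str(name),
                                            lambda _: Done(None)))

    def run_executor(step: Step) -> object:
        """The executor: the only component with real I/O and an unbounded loop."""
        while isinstance(step, Command):
            if step.name == "print":
                print(step.payload)
                response = None
            elif step.name == "read_line":
                response = input()
            else:
                raise ValueError("unknown command: " + step.name)
            step = step.resume(response)  # hand the response back to the evaluator
        return step.value

    if __name__ == "__main__":
        run_executor(greet())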


I'm interested in approaches like this too, but there are some practical considerations that seem to complicate it:

* from a systems point of view, an infinite loop is no worse than a routine that doesn't return after a reasonable amount of time. PR is a huge complexity class (it contains NEXP); just because a function always terminates doesn't mean it's efficient (or as efficient as you would like/need it to be).

* PR functions are somewhat easier to reason about than general recursive functions (no need to mess around with bottom and domain theory) but I haven't seen a lot of evidence that that translates to making them easier to aggressively optimize.

* In fact, I have heard that many PR functions have a more efficient GR equivalent, and I don't know of any way to automatically derive the GR version from the PR; and I expect (just on a hunch) that no perfectly general PR->GR conversion could exist.

* Granted, a function can be shown to be total without necessarily being PR, but then you have the burden of proving that it is total, and it seems inelegant to move that burden to hardware. Maybe it's not, maybe that's just "normal science" talking.

* In practice, if I want to run an interpreter for an existing TC programming language on this architecture, it has to treat the architecture as TC (i.e. conceptually break its separation between executor and evaluator) anyway.


You make some good points, especially that a GR function may be able to work more efficiently than a PR function; do you have an example or can you cite a reference for this?

As for optimization, I believe it may be possible to automatically parallelize a PR function more effectively than a GR one.


Unfortunately, it's hearsay to me; it came up in conversation (about computability and complexity) with a professor, and I took his word for it at the time; I've been meaning to ask him for a reference ever since, but never got around to it.

But at least one thing I can see is that in PR, you need to fix the upper bounds of the loops (/recursion depth) ahead of time, not "on the fly". If you're doing some kind of iterative approximation, you probably don't know what those bounds "should" be, because you're going to terminate on some other condition, like, the change in the value has become negligible. Your upper bound will be a worst-case estimate -- which you have to do extra work to compute -- and I don't see how it differs much, in practice, from a timeout, which has the advantage (again from a systems perspective) of being measured in clock time rather than cycles.
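
To make that concrete, here is a toy comparison (my own example, not something from that conversation) using Heron's method for square roots: the GR-style version stops when the change becomes negligible, while the PR-style version has to commit to a worst-case iteration count before the loop starts, which takes extra work to estimate and ends up playing much the same role as a timeout.

    # Hypothetical illustration of the bounds point above, using Heron's method
    # (Newton's method for square roots).
    import math

    def sqrt_general(x: float, eps: float = 1e-12) -> float:
        """GR-style: iterate until the change is negligible; no bound fixed up front."""
        guess = x if x > 1 else 1.0
        while True:
            nxt = 0.5 * (guess + x / guess)
            if abs(nxt - guess) < eps:
                return nxt
            guess = nxt

    def sqrt_bounded(x: float) -> float:
        """PR-style: the number of iterations is fixed before the loop starts.
        Heron's method roughly doubles the correct digits per step once it is
        close, so a crude worst-case bound derived from the input suffices --
        but computing it is extra work, and in practice it behaves like a timeout."""
        bound = max(8, int(math.log2(max(x, 2.0))) + 64)  # crude worst-case estimate
        guess = x if x > 1 else 1.0
        for _ in range(bound):
            guess = 0.5 * (guess + x / guess)
        return guess

    if __name__ == "__main__":
        for v in (2.0, 1e6, 0.25):
            print(v, sqrt_general(v), sqrt_bounded(v))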

Not sure about parallelization. PR doesn't suggest any advantages to me for that offhand, but then, I haven't thought about it.


W.r.t. parallelism, I'll copy here something I wrote elsewhere in this thread:

"I don't know for sure that this is possible yet, but I believe that the processor would be able to estimate the amount of work required to evaluate an expression. Using this ability, it would be able to automatically parallelize the evaluation of an expression by splitting it up into pieces of approximately equal size and then distributing them to sub processors."


The evaluator is the interesting part, of course. How do you stop it from getting stuck evaluating functions like this?

    def f():
        f()
If you can do general recursion, you can do infinite loops. If you can't do general recursion, I'm not sure the language will be very useful.


No, you can't do general recursion on the evaluator alone. But it would be possible to run any general recursive function across both parts, for example if you wanted to run software not designed for this architecture. The problem with this is that optimizations would only happen on a per-expression basis, so it wouldn't make very good use of the architecture.
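
One way to picture running a general recursive function across both parts (my own hypothetical encoding, nothing more): express the computation as a total step function over an explicit state, and let the executor loop until a final state appears. Only the individual steps get the evaluator's guarantees, which is exactly why the optimization ends up being per-expression.

    # Hypothetical sketch: a general-recursive computation (a Collatz-style
    # iteration, whose termination is famously not obvious) run "across both
    # parts". Each call to `step` is total and could be handled by the
    # evaluator; the unbounded looping lives entirely in the executor.
    from typing import Optional

    def step(n: int) -> Optional[int]:
        """Total step function: one Collatz step, or None when finished."""
        if n == 1:
            return None
        return n // 2 if n % 2 == 0 else 3 * n + 1

    def executor_loop(n: int) -> int:
        """Executor side: keeps feeding the current state back in. This loop has
        no termination guarantee -- that is the cost of emulating general
        recursion on such an architecture."""
        steps = 0
        while (nxt := step(n)) is not None:
            n = nxt
            steps += 1
        return steps

    if __name__ == "__main__":
        print(executor_loop(27))  # the classic long trajectory starting at 27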


What benefit would you get from this restriction at the hardware level? Don't all practical reduction algorithms work just fine on untyped terms as well?


I don't know for sure that this is possible yet, but I believe that the processor would be able to estimate the amount of work required to evaluate an expression. Using this ability, it would be able to automatically parallelize the evaluation of an expression by splitting it up into pieces of approximately equal size and then distributing them to sub processors.

This, and things like this, are much more difficult for untyped terms.
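
As a very rough sketch of what estimating the amount of work could mean (purely my own guess at a mechanism, not a description of the actual proposal): a static size measure over a small expression tree, used to hand chunks of roughly equal cost to workers.

    # Hypothetical sketch: estimate the cost of sub-expressions of a tiny
    # arithmetic AST and split them into roughly equal chunks for workers.
    # The point being illustrated: in a typed, total setting such an estimate
    # can be computed statically, which is harder to justify for untyped terms.
    from dataclasses import dataclass
    from typing import List, Union

    @dataclass
    class Lit:
        value: int

    @dataclass
    class Add:
        left: "Expr"
        right: "Expr"

    @dataclass
    class Mul:
        left: "Expr"
        right: "Expr"

    Expr = Union[Lit, Add, Mul]

    def cost(e: Expr) -> int:
        """Crude static work estimate: count the operations in the expression."""
        if isinstance(e, Lit):
            return 1
        return 1 + cost(e.left) + cost(e.right)

    def evaluate(e: Expr) -> int:
        if isinstance(e, Lit):
            return e.value
        l, r = evaluate(e.left), evaluate(e.right)
        return l + r if isinstance(e, Add) else l * r

    def split_evenly(exprs: List[Expr], workers: int) -> List[List[Expr]]:
        """Greedily assign sub-expressions so each worker gets a similar total cost."""
        bins: List[List[Expr]] = [[] for _ in range(workers)]
        loads = [0] * workers
        for e in sorted(exprs, key=cost, reverse=True):
            i = loads.index(min(loads))  # least-loaded worker so far
            bins[i].append(e)
            loads[i] += cost(e)
        return bins

    if __name__ == "__main__":
        parts = [Add(Lit(i), Mul(Lit(i), Lit(i + 1))) for i in range(10)]
        for i, chunk in enumerate(split_evenly(parts, 3)):
            print("worker", i, "cost", sum(cost(e) for e in chunk),
                  [evaluate(e) for e in chunk])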


It seems like you're putting a lot of stuff on silicon that better belongs in a compiler. It isn't clear to me how to make the silicon, say, check for the totality of an arbitrarily-large function in any reasonable manner; silicon prefers rather sharp bounds.

Plus anything you encode on the silicon is basically permanent. (Sure, you can tweak things with microcode, but only so much.)


> It seems like you're putting a lot of stuff on silicon that better belongs in a compiler.

That's debatable. Making the separation between hardware and software here allows future processors to make substantial changes to their internals.

> It isn't clear to me how to make the silicon, say, check for the totality of an arbitrarily-large function in any reasonable manner.

Actually, that part is easy. Essentially, you can define a typed functional language without recursion (or recursive definitions). The only looping that you do allow is the use of eliminators over inductive data types.
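
A toy version of that idea (just my own illustration; nat_rec plays the role of the eliminator): natural numbers as an inductive type, with the recursor as the only looping construct. Everything written against it does a fixed amount of work per constructor, so totality is evident from the shape of the program rather than from a separate proof.

    # Hypothetical illustration: Peano naturals as an inductive type, with a
    # single eliminator (recursor) as the only way to loop. Functions built
    # from nat_rec are total by construction because the recursion is structural.
    from dataclasses import dataclass
    from typing import Callable, TypeVar, Union

    A = TypeVar("A")

    @dataclass
    class Zero:
        pass

    @dataclass
    class Succ:
        pred: "Nat"

    Nat = Union[Zero, Succ]

    def nat_rec(base: A, step: Callable[[Nat, A], A], n: Nat) -> A:
        """The eliminator for Nat: one application of `step` per Succ constructor."""
        if isinstance(n, Zero):
            return base
        return step(n.pred, nat_rec(base, step, n.pred))

    # Addition and conversion to int written only in terms of the eliminator --
    # no general recursion in "user code".
    def add(m: Nat, n: Nat) -> Nat:
        return nat_rec(n, lambda _pred, acc: Succ(acc), m)

    def to_int(n: Nat) -> int:
        return nat_rec(0, lambda _pred, acc: acc + 1, n)

    def from_int(k: int) -> Nat:
        """Helper for building test data only."""
        n: Nat = Zero()
        for _ in range(k):
            n = Succ(n)
        return n

    if __name__ == "__main__":
        print(to_int(add(from_int(2), from_int(3))))  # 5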

> Plus anything you encode on the silicon is basically permanent.

The only thing that is really permanent is the interface. Future processors can change their internals, but the old programs should always work on new processors.


"Essentially, you can define a typed functional language without recursion (or recursive definitions). The only looping that you do allow is the use of eliminators over inductive data types."

Yes, but I said I don't see how you can check totality on an arbitrarily large function efficiently. If a total function has 65,537 cases to be checked, silicon is not the place to do it. If the silicon isn't checking it, then it's the compiler doing it anyhow.

The thing is, your processor may be arranged differently, even very differently, but on the whole it can't be doing more than modern processors are, or it will be slower than modern processors, in which case, why not just continue compiling onto them anyhow? History is full of "better" processors from various places that, by the time they came out, were already a factor of 3 or 4 (or more) slower than just running on x86.


A function is never arbitrarily large; it's made of a bounded number of sub-expressions, and those sub-expressions can be checked independently.
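
A sketch of what checking sub-expressions independently might look like (again just my own guess at a mechanism): every node of the expression tree gets a constant-work local check, and the whole-function check is simply the conjunction of those, so even the 65,537 cases mentioned above would just be 65,537 small, independent, parallelizable checks.

    # Hypothetical sketch of per-node checking: each sub-expression is validated
    # locally (here: "no general-recursion node anywhere"), so the checks are
    # constant work per node and could be farmed out to sub-processors.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        kind: str                                      # e.g. "lit", "prim", "elim", "fix"
        children: List["Node"] = field(default_factory=list)

    ALLOWED = {"lit", "prim", "elim"}  # "fix" (general recursion) is rejected

    def check_node(node: Node) -> bool:
        """Local check: looks only at this node, never at its children."""
        return node.kind in ALLOWED

    def all_nodes(root: Node) -> List[Node]:
        out, stack = [], [root]
        while stack:
            n = stack.pop()
            out.append(n)
            stack.extend(n.children)
        return out

    def check_function(root: Node) -> bool:
        """The whole-function check is just the conjunction of per-node checks."""
        return all(check_node(n) for n in all_nodes(root))

    if __name__ == "__main__":
        good = Node("elim", [Node("prim", [Node("lit"), Node("lit")])])
        bad = Node("elim", [Node("fix", [Node("lit")])])
        print(check_function(good), check_function(bad))  # True False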

CPUs are extraordinarily wasteful. The reason I think a processor like this might work better is that it could do a lot more work in parallel.


A CPU can do anything. Silicon is Turing-complete. The question is whether it can do it more efficiently than some rather well-tuned, if suboptimally-architected, processors. It's a high bar to leap, despite substantial rhetoric to the contrary. (I believe it to be leapable, but it's not easy.)


> CPUs are extraordinarily wasteful.

I've seen this claim before, and I've even accepted it at face value. But it seems to me that it isn't adequately supported by evidence. What evidence do we have that a substantially more efficient (therefore less wasteful) CPU architecture is possible?


cf. the discussion elsewhere in this thread: https://news.ycombinator.com/item?id=5559282



