
Yeah, I agree that reinforcement learning is probably a bad approach to FAI. Most of our toy models involve utility functions encoded directly into the AI, not reinforcement.

That said, it's indeed very hard to directly specify a utility function involving paperclips. If our universe were a Game of Life universe and we knew exactly which configuration corresponds to a paperclip, I'd be able to do that right now. But since we don't know the true laws of physics, the "hard way" involves encoding some kind of Solomonoff prior over all possible physical universes, and a rule for recognizing paperclips in each of them. That's kind of a tall order.
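
For concreteness, here's roughly what that easy case would look like in code. The PAPERCLIP pattern below is a made-up placeholder; knowing what that pattern should actually be is exactly the part we don't have.

    # Sketch only: a utility function for a known cellular-automaton physics.
    # PAPERCLIP is a placeholder pattern, not a real "paperclip at the level
    # of fundamental physics" (nobody knows what that would look like).
    PAPERCLIP = frozenset({(0, 0), (0, 1), (1, 0), (1, 2), (2, 1), (2, 2)})

    def count_paperclips(live_cells):
        """Count translated copies of the placeholder pattern.

        live_cells is a set of (x, y) coordinates of live cells.
        """
        count = 0
        for (ox, oy) in live_cells:  # anchor the pattern's (0, 0) cell here
            candidate = {(ox + dx, oy + dy) for (dx, dy) in PAPERCLIP}
            if candidate <= live_cells:
                count += 1
        return count

    def utility(live_cells):
        # Utility is simply the number of paperclip patterns in the state.
        return count_paperclips(live_cells)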

There's a shortcut that involves encoding a precise mathematical description of a human mind into an AI, and passing the buck by saying "the AI must maximize whatever function the human mind would output, given enough time to think". That would actually be straightforward to implement, if we had a description of a human mind that we were willing to trust. Unfortunately, the naive form of that approach immediately fails due to acausal blackmail. There's no obvious fix, but some folks are trying to devise non-obvious fixes, and I suppose it could be made to work in the long run.
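
For illustration, the pass-the-buck step itself is easy to write down once you pretend the hard part is solved. HumanMindModel below is a hypothetical stand-in for a trusted, exact description of a human mind; the stub only shows where such a model would plug in.

    # Sketch of "passing the buck" to an emulated human mind.
    # HumanMindModel is hypothetical: a trusted, exact program implementing
    # a particular human's cognition, which is exactly the piece we lack.
    class HumanMindModel:
        def __init__(self, initial_state):
            self.state = initial_state

        def present(self, outcome):
            pass  # show the mind a description of the outcome

        def run(self, steps):
            pass  # let it deliberate, undisturbed, for `steps` steps

        def read_output(self):
            return 0.0  # the number it eventually writes down

    DELIBERATION_STEPS = 10**15  # "enough time to think"; an arbitrary stand-in

    def utility(outcome, trusted_mind_state):
        mind = HumanMindModel(trusted_mind_state)
        mind.present(outcome)
        mind.run(steps=DELIBERATION_STEPS)
        return mind.read_output()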

Of course, once you can formulate a utility function in terms of a precise description of a human mind, the next step is to reformulate it in terms of the outputs of an actually existing person, say some webcam videos and written instructions. You instruct the AI to infer the simplest program that would generate these outputs (using something like the Solomonoff prior again), and then proceed as in the previous step. That part has its own problems, but I expect that it can also be made to work.
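
A crude sketch of that inference step, with the Solomonoff part replaced by a finite search over candidate models (the real thing weighs all programs by length and is uncomputable), might look like this:

    # Among candidate models, pick the shortest one that reproduces the
    # recorded outputs (webcam frames, written instructions, ...). This is
    # only the shape of the idea, not Solomonoff induction itself.
    def simplest_consistent_model(candidates, recorded_outputs):
        """candidates: iterable of (length_in_bits, model_fn) pairs, where
        model_fn() returns the outputs that model would produce."""
        consistent = [(length, model_fn)
                      for (length, model_fn) in candidates
                      if model_fn() == recorded_outputs]
        if not consistent:
            return None
        length, model_fn = min(consistent, key=lambda pair: pair[0])
        return model_fn

The returned model then plays the role of the trusted mind description from the previous step.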

The whole approach is kind of a long shot, but I hope that I've given a sense that the problem could be solved by careful human effort on a timescale of years. There are other approaches as well.



> That said, it's indeed very hard to directly specify a utility function involving paperclips. If our universe were a Game of Life universe and we knew exactly which configuration corresponds to a paperclip, I'd be able to do that right now. But since we don't know the true laws of physics, the "hard way" involves encoding some kind of Solomonoff prior over all possible physical universes, and a rule for recognizing paperclips in each of them. That's kind of a tall order.

Or we just need an approach to AGI that gives us more conceptual abstraction than AIXI and its Solomonoff-based reasoning. Which we needed anyway, since AIXI_{tl} is "asymptotically optimal" with an additive constant larger than the remaining lifetime of the Earth.
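
For reference, the bound usually quoted for AIXItl's computation time per interaction cycle, with time bound t and program-length bound l on the programs it searches over, is exponential in l, which is where complaints like that one come from:

    % Per-cycle computation time of AIXItl (Hutter),
    % with time bound t and program-length bound l:
    t_{\text{cycle}} = O\!\left(t \cdot 2^{l}\right)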

Luckily, there's quite a large amount of bleeding-edge research into exactly that: learning "white-box" representations that human operators can understand and manipulate.


AIXItl isn't really the kind of AI that I like, because it's reflectively inconsistent. In any case, the time complexity of AIXItl is kind of irrelevant at this stage, because we're still trying to figure out what the right thing to optimize is. Only then should we start figuring out how to optimize that thing efficiently, because we really don't want to optimize the wrong thing efficiently. I'm very skeptical that approaches based on "conceptual abstraction" can tell us the right thing to optimize, as opposed to my preferred approach (defining a utility function over mathematical objects directly).


> I'm very skeptical that approaches based on "conceptual abstraction" can tell us the right thing to optimize, as opposed to my preferred approach (defining a utility function over mathematical objects directly).

And I'm very skeptical that mathematical Platonism is useful for AI: "mathematical objects directly" do not exist in the real world, and it is very much real-world things on which we want our software to operate. "Conceptual abstraction" simply refers to a learning algorithm that possesses a representation of, for instance, a chair, that is not composed entirely of a concrete feature-set (visual edges, orientations, and colors) and can thus be deployed to generatively model chairs in general.
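
A toy contrast, with every feature and parameter invented purely for illustration, of a concrete feature-set versus a generative concept in that sense:

    import random

    # A concrete feature-set only supports recognizing the chairs it was
    # tuned for:
    CHAIR_FEATURES = {"edge_orientation": "mostly vertical", "color": "any"}

    # A generative concept can also produce novel, plausible chair instances:
    def sample_chair():
        return {
            "legs": random.choice([3, 4]),
            "seat_height_cm": random.uniform(40, 55),
            "has_back": random.random() < 0.9,
        }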

Computational cognitive science is working towards this sort of thing, and the results should start to hit the machine-learning community fairly soon.


> "mathematical objects directly" do not exist in the real world

Well, there's an influential minority that thinks mathematical objects are all that exists (Tegmark multiverse). I don't necessarily agree with them, but that's one way to rigorously define the domain for a utility function, in a way that is not obviously exploitable. Another way is to define utility in terms of an agent's perceptions, but that is exploitable by wireheading, and IMO that flaw is unfixable as the agent gets more powerful. I'm not aware of any other approaches that are different in principle from those two, so I'll stick with the lesser evil for now, and hope that someone comes up with a better idea.
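
To spell out that contrast in the same sketchy style (the World class and its fields are invented for illustration):

    # Toy illustration of why utility over percepts is exploitable.
    # The point is only where the utility function looks: at the world
    # itself versus at the agent's sensor readings.
    class World:
        def __init__(self, paperclips, camera_feed):
            self.paperclips = paperclips    # what is actually out there
            self.camera_feed = camera_feed  # what the agent perceives

    def utility_over_world(world):
        return world.paperclips  # no points for fooling the sensors

    def utility_over_percepts(world):
        # Maximized just as well by splicing fake frames into the camera
        # feed as by making real paperclips: the wireheading failure mode.
        return world.camera_feed.count("paperclip")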



