I'm not sure. I've beaten it a number of times, but only using tons of spoilers ...

still_grokking · on March 13, 2024

Of course one can beat Nethack. Many people did. That's not the point. The question was more: Without "spoilers" (a.k.a. "hardcoded heuristics and goals")?

(I never made progress because I didn't try hard enough. It became very boring after finding out that this game is random and quite nonsensical, and one can't come up with some strategy only by playing it often enough. In general I don't enjoy dice games. I prefer games where you can come up with some winning strategy by curious observation and logical thinking.)

Because of the nature of Nethack I don't think it's a good AI test as such.

Maybe it would be if one let the AI read spoilers / walkthroughs and then let it try playing. Such a test could then maybe probe the AI's text comprehension, and its ability to map the gained understanding to concrete actions. But just letting it play Nethack unprepared does not give any insights into the AI's capabilities, imho. It will just fail over and over again. Because it's (imho) impossible to beat Nethack without spoilers. You just can't extract the needed knowledge from playing. Even from playing it millions of times.

bubblyworld · on March 13, 2024

Relax, friend, I understand your question =) I'm just pointing out that it's not as random as you think. Mechanics in NetHack are quite predictable for the most part, but they are difficult to _discover_ without dying. Given that AIs can play hundreds of thousands of runs in the time it would take me to play one, I'm a bit more optimistic that they could learn the mechanics eventually.

I think the fundamental problem is that nobody knows how to do exploration-based reward functions effectively. Has Pitfall been solved by modern RL, for instance? As far as I know that's still an open problem (alongside getting to diamonds in Minecraft without hardcoded heuristics, and other things along the same line).

(edit: just in terms of evidence for the first claim, once I'd done a run successfully with spoilers I was able to beat the game again without looking anything up. So I think it's more a discovery problem than nethack being inherently random)

still_grokking · on March 13, 2024

I think my wording was bad. There are two kinds of "randomness" here at play and I didn't differentiate properly.

For me the game mechanics as such are "random". Because you can't discover them by just playing (imho).

At the same time the game is ruled by dice. (So even the best players will fail almost 50% of the time which is almost as random as tossing a coin, and strictly not skill based).

> Given that AIs can play hundreds of thousands of runs in the time it would take me to play one, I'm a bit more optimistic that they could learn the mechanics eventually.

My gut feeling says the opposite. How do you infer any kind of rules from almost random events? Especially if the "logic" behind the "non-random" parts is actually also quite made up and arbitrary (so in a sense also "random", even to people with reason).

An "exploration-based reward function" wouldn't be enough. Because this would assume that exploration has (more or less) deterministic outcomes. But given the dice in Nethack it actually does not! You can do "everything right" and still die in almost 50% of the cases. How to infer any meaningful "world model" from such events? Imho you can't.

(I can confirm that looking up spoilers will let you make progress in Nethack. That's why I think it's boring. I tried hundreds of times prior to looking up spoilers and didn't make any progress. But after biting the bullet and starting to read a walkthrough it was actually quite easy to reach some deeper levels. Until I hit the next invisible wall. Which would again require some out-of-band knowledge… I know that reading the next spoilers would also make this wall go away. But I've lost any interest in this game after finding out exactly this: It's impossible to play without a walkthrough; and with a full walkthrough it's actually considerably easy, and comes down to "just having luck". At this point I could just toss a coin to determine whether "I won". That's maximally boring. I don't like dice games; and the exploration part in Nethack leads nowhere because the world is arbitrarily made up. You can't discover the mechanics without already knowing them…)

bubblyworld · on March 13, 2024

Fair enough - the mechanics are certainly random in the sense that they involve dice rolls.

> So even the best players will fail almost 50% of the time which is almost as random as tossing a coin, and strictly not skill based

Even moderately experienced players will fail close to 100% of the time. So getting to an almost 50% success rate, to my mind, shows a great deal of skill! The difference between this and a coin toss is that two people, no matter how many times they have each respectively tossed a coin, will _still_ always get a 50% success rate.

> An "exploration-based reward function" wouldn't be enough. Because this would assume that exploration has (more or less) deterministic outcomes.

I don't think this is true any more for modern AIs such as AlphaGo and its successors, which learn distributions of possible outcomes rather than deterministic predictions. IIRC the latest versions can even self-play games like Poker to a superhuman level.

I think so long as you are able to sample a given mechanic enough times, you can build a decent estimate of the possible outcomes (and choose your behaviour accordingly). If there is any systematic deviation from pure randomness, enough data will reveal it!
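
To make that concrete, here is a minimal sketch of estimating one mechanic's success rate by sampling. The mechanic, its hidden probability `p_success` and the function names are all made up for illustration (nothing here comes from NetHack itself), and the interval uses the plain normal approximation:

```python
import math
import random

def sample_mechanic(rng, p_success):
    """One trial of a hypothetical mechanic with a hidden success probability."""
    return rng.random() < p_success

def estimate(rng, p_success, n):
    """Return (estimated success rate, half-width of an approximate 95% interval)."""
    wins = sum(sample_mechanic(rng, p_success) for _ in range(n))
    p_hat = wins / n
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, half_width

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (100, 10_000, 100_000):
    p_hat, half_width = estimate(rng, p_success=0.55, n=n)
    print(f"n={n:>7}: estimate {p_hat:.3f} +/- {half_width:.3f}")
```

The interval shrinks roughly with the square root of the sample count, so a small edge (55% instead of 50%) only becomes distinguishable from a coin after a few thousand trials, which is cheap for a bot and painful for a human.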

still_grokking · on March 14, 2024

> the mechanics are certainly random in the sense that they involve dice rolls.

Sorry, but that's still not what I meant.

Dice rolls are randomness in the usual meaning of this word. But "random" can also mean "arbitrary" and/or "illogical" things. Imho Nethack mechanics are also random in this sense. They don't make sense at all… :-)

> Even moderately experienced players will fail close to 100% of the time. So getting to an almost 50% success rate, to my mind, shows a great deal of skill!

OK, you definitely have a point here.

Still not my cup of tea, such games. (And this is strictly personal, and unrelated to the rest of the discussion). I just don't like games where it's very likely that I will lose despite "doing everything right". I have no problem with games mercilessly punishing even small mistakes. That's OK. But having an outcome that depends mostly on the whim of the RNG is just nothing I enjoy. When I "do everything right" I like to get rewarded appropriately for it. (Of course some level of randomness is still OK. But if the RNG kills you most of the time no matter what you do this is just too frustrating for me. So I'm clearly not the target audience for Nethack… :-D).

> I don't think this is true any more for modern AIs such as AlphaGo and its successors, which learn distributions of possible outcomes rather than deterministic predictions.

Sure. But how do you learn from a distribution where no matter what you do you will fail in, say, 99.9% of the cases?

> IIRC the latest versions can even self-play games like Poker to a superhuman level.

Do you have some links regarding this? I thought Poker is still one of the games where AIs don't play better than humans. OK, maybe it depends on the Poker variant. There are simpler and more difficult ones.

> If there is any systematic deviation from pure randomness, enough data will reveal it!

I would agree in general.

But now we're back to the initial question: Is there enough systematic deviation from pure randomness in Nethack? Given that even people who know all the mechanics, and know some good end-to-end strategies will fail in most cases (actually, like you said, in almost all cases). And given that it's (imho) impossible to come up with this knowledge about mechanics and strategy just by playing this game. I have my doubts.

bubblyworld · on March 15, 2024

> Do you have some links regarding this? I thought Poker is still one of the games where AIs don't play better than humans. OK, maybe it depends on the Poker variant. There are simpler and more difficult ones

Texas Hold'em, one of the most popular variants - have a look at DeepMind's Player of Games, and the general technique of Counterfactual Regret Minimisation. Both are recent advances, but poker is absolutely solved at a human professional level now.

I think one thing to keep in mind is that what you find illogical has very little bearing on whether a neural net can learn to do it. We humans come prebaked with specific priors in our brain (like cognitive biases) that AIs don't necessarily share. I'd be careful making sweeping statements about what is impossible and what isn't, personally. But I guess we'll see =)

> Sure. But how do you learn from a distribution where no matter what you do you will fail in, say, 99.9% of the cases?

You can actually test this yourself if you're interested - try to train a neural net to predict outcomes that are deterministic but with a 99.9% chance of random failure. If the net learns to succeed 0.1% of the time then your premise is false - it has successfully extracted the signal!
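
A rough sketch of that experiment, with one loud caveat: to keep it dependency-free it swaps the neural net for a plain count-based estimator, so it only shows that the signal is recoverable from the data, not how a net would learn it. The environment (`play`, four actions, one hidden `good_action`) is entirely hypothetical:

```python
import random

def play(rng, action, good_action, p_random_failure=0.999):
    """Deterministic rule (only good_action can ever work) masked by random failure."""
    if action != good_action:
        return False
    return rng.random() >= p_random_failure  # succeeds about 0.1% of the time

def learn_success_rates(rng, n_actions, good_action, trials_per_action):
    """Estimate each action's success rate purely from observed outcomes."""
    rates = []
    for action in range(n_actions):
        wins = sum(play(rng, action, good_action) for _ in range(trials_per_action))
        rates.append(wins / trials_per_action)
    return rates

rng = random.Random(0)  # fixed seed so the run is repeatable
rates = learn_success_rates(rng, n_actions=4, good_action=2, trials_per_action=200_000)
best = max(range(len(rates)), key=rates.__getitem__)  # action with the highest estimate
print("estimated success rates:", rates)
print("learner's pick:", best)
```

Even at 99.9% failure, the one action that can work stands out from the ones that never do once each has been tried often enough.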

eru · on March 14, 2024

> An "exploration-based reward function" wouldn't be enough. Because this would assume that exploration has (more or less) deterministic outcomes. But given the dice in Nethack it actually does not! You can do "everything right" and still die in almost 50% of the cases. How to infer any meaningful "world model" from such events? Imho you can't.

You can actually do that very well. When they train LLMs on text, the same prefix doesn't always lead to the same next token. And they handle that just fine.
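
A toy illustration of the idea, assuming a simple count-based bigram model as a stand-in (this is not how an LLM is trained, and the corpus is invented): the same prefix has several continuations, and the model just learns the distribution over them.

```python
from collections import Counter, defaultdict

def fit_next_token(corpus):
    """Count which token follows which, then normalise the counts to probabilities."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for prev, ctr in counts.items()
    }

# Made-up corpus: "the" is followed by "door" three times and "newt" once.
corpus = [
    "you kick the door",
    "you kick the door",
    "you kick the door",
    "you kick the newt",
]
model = fit_next_token(corpus)
print(model["the"])
```

The non-deterministic continuation is not an obstacle; it is exactly what ends up in the learned probabilities.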

Btw, Nethack isn't actually random: the dice use a pretty broken PRNG, and 'luck manipulation' is a thing. A computer might not actually care about the difference between spoiler-y tactics that are legible to humans, and PRNG manipulation.

(The state of the PRNG is a relatively small number of bits. Various actions can advance the PRNG, without causing any other change in the world. So you can basically make sure that you are always maximally lucky, if you can somehow recover the hidden PRNG state from the output of the program, and then model it.)
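
A minimal sketch of that recipe on a toy generator. Everything here is an assumption for illustration: the 16-bit LCG, its constants and the way a d20 is derived from it are stand-ins, not NetHack's actual PRNG or dice code.

```python
MOD = 1 << 16  # toy generator with only 16 bits of hidden state

def advance(state):
    """One step of a textbook linear congruential generator (toy constants)."""
    return (state * 25173 + 13849) % MOD

def roll_d20(state):
    """Advance once and derive a roll in 1..20 from the high bits."""
    state = advance(state)
    return state, (state >> 8) % 20 + 1

def observe(state, n):
    """Roll n times; return the final hidden state and the visible rolls."""
    rolls = []
    for _ in range(n):
        state, roll = roll_d20(state)
        rolls.append(roll)
    return state, rolls

def recover_states(rolls):
    """Brute-force all seeds; keep the end states consistent with the observed rolls."""
    candidates = set()
    for seed in range(MOD):
        end_state, produced = observe(seed, len(rolls))
        if produced == rolls:
            candidates.add(end_state)
    return candidates

def wasted_steps_until(state, target=20):
    """How many 'free' PRNG advances are needed before the next roll equals target."""
    for steps in range(MOD):
        if roll_d20(state)[1] == target:
            return steps
        state = advance(state)  # a harmless action that only burns one PRNG output
    return None

hidden_seed = 12345  # unknown to the attacker in a real setting
state_after, rolls = observe(hidden_seed, 10)
candidates = recover_states(rolls)
steps = wasted_steps_until(state_after)
print("observed rolls:", rolls)
print("candidate states:", len(candidates), "| wasted steps before a 20:", steps)
```

With a state this small the search is trivial; the point is only that once the hidden state is pinned down, "luck" becomes a matter of counting how many harmless actions to take before the roll that matters.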

CuriouslyC · on March 13, 2024

No doubt that person is very good at nethack, but even among very good players a 60% win rate is survivorship bias.