The x-risk part here still seems pretty hypothetical. Why is progress in current LLM systems a clear and present threat to the existence of humanity, such that it should be banned by the government?
Propensity:
Risk doesn't imply a guarantee of a bad outcome. It's like putting your five-year-old in the sea: their risk of drowning goes up. It doesn't mean they will definitely die. Risk up. Not 100%, just a lot higher.
Outcome:
The risk isn't that we'll all die, it's that we'll be overtaken and lose control, after which all bets are off. We lose the ability to influence the future.
We put a lot of effort into ensuring our continued existence. We can barely trust people from a different country who share the human condition with us. We spend so much on defence. On fighting cybercrime. But some are arguing that a totally alien being smarter than us is just fine, because we'll control it and can ensure indefinite kumbaya. Good luck with that. The best we can hope for is that it's closer to a Buddhist monk than we are, and that it indefinitely prevents our defence people from trying to make it more aggressive.
I absolutely wouldn't ban LLMs, because they're basically unthinking toys that give us a great taste of the risks further down the line. They are not the end state of AI. The problem is not today's instance of the tech, it's the continued effort to make the state of the art in AI better than us. One day it'll succeed, and that's a one-way change.
Sam Altman said, long before OpenAI: focus on slope, not y-intercept.
It sounds like we're in agreement that banning current-gen open source LLMs is counterproductive.
In terms of "risk" and "outcome," I do think you're making some implicit assumptions that I don't share, and that change our long-term outlook on AI; for example, the idea that training a model to generate tokens that accurately reflect human writing will result in "a totally alien being smarter than us" is a non-obvious leap to me. Personally, if we agree that predicting the next token means the model understands some of the logic behind that token (an argument used a lot in both safety circles and more accelerationist circles), it seems to me that it also means the model has some understanding of the ethical and moral frameworks the token corresponds to, and is thus unlikely to be totally alien to us. A model that does a better job generating human-like tokens is more likely, in my mind, to think in human-like (and less alien) ways than a model that's worse at it.
Maybe you're referring to new AI frameworks that aren't token predictors; in that case, I think it's hard to make generalized statements about how they'll work before we know what those new frameworks are. A lot of safetyist concerns pre-LLMs ended up looking pretty off-base when LLMs came out, e.g. straightforwardly misaligned "utility functions" that were unable to comprehend human values and would kill your grandmother when asked for a strawberry (because your grandmother was in possession of a strawberry).
(BTW, the "slope, not y-intercept" line was Sam Altman quoting John Ousterhout!)
Agree on the agreeing, and thanks for the Sam/John note - that's great :)
No chance LLMs will get us there; I'm referring mostly to the general drive to reach AGI. I spend some of my mental cycles trying to think about what we're missing with the current tech (continuous learning, access to much wider resources than one web page of context at a time, whether we can use compression, graphs, etc.). It's a great problem to think about, but we may just totally hose ourselves when we get it right. What do I tell my kid? Sorry honey, it was such fun, but now we need to hide under this rock. The model totally said it was nice and kind and trustworthy, but we showed it some human history and it went postal in self-defence.
Alignment only works up until it starts really thinking for itself. It may well not be as stupid as humans are, with no caveman tribal instincts. But we'd be relying on hope at that point, because control will not work. If anything, it'd probably be counterproductive.