I had a more complicated prompt that models failed much more reliably - instead of a mirror, it was another person looking from below. But it had some issues: Claude would often want to refuse on ethical grounds, as if I were working out how to scam people or something, and many reasoning models would yammer on about whether or not the other person was lying to me. So I simplified it to this.
I'd love another simple spatial reasoning problem that's very easy for humans but that LLMs struggle with, and which does NOT have a binary output.