proc0 3 days ago | on: Show HN: Web-eval-agent – Let the coding agent deb...

Interesting. I see from the video example that it took a lot of steps and produced a lot of output for a simple task. I suspect this doesn't scale very well, and more complex tasks might run into performance challenges. I do think it's the right direction for AI coding.

neversettles 3 days ago

Yeah, to esafak's point, I suppose a benchmark for browser-agent QA testing would be needed.