We’re definitely going to need better benchmarks for agentic tasks, and not just... | Hacker News

jerpint 7 months ago | on: Gemini 2.0: our new AI model for the agentic era

We’re definitely going to need better benchmarks for agentic tasks, and not just code reasoning. Things that are needlessly painful that humans go through all the time.

AuthConnectFail 7 months ago

It's insane on lmarena for its size; livebench should have it soon too, I guess.

maeil 7 months ago

The size isn't stated, so it's not necessarily a given that it's as small as 1.5-Flash.
