
If ARC-AGI were a good benchmark for "AGI", then MindsAI should effectively be blowing away current frontier models by an order of magnitude. I don't know what MindsAI is, but the post implies they're basically fine-tuning or using a very specific strategy for ARC-AGI that isn't really generalizable to other tasks.

I think it's a nice benchmark of a certain type of spatial/visual intelligence, but if you have a model or technique specifically fine-tuned for ARC-AGI, then it's no longer A"G"I.



Perhaps a benchmark could be a good approximate upper bound for something without being a good approximate lower bound for that thing?


I clarified in another post that I meant benchmarking standalone models, not ones fine-tuned for solving ARC.


I mean, there are a lot of tasks that frontier models excel at that many humans wouldn't be able to complete.



