
I thought summarizing papers/stories/emails/meetings was one of the touted use cases of LLMs?

What are the use cases where the expected performance is high?



I didn't notice that example. I doubt top-tier models have issues with that. I was referring more to Sabine's mentions of hallucinated citations and papers, which is an issue I also had 2 years ago but is probably solved by Deep Research at this point. She just has massive skill issues and doesn't know what she's doing.

>What are the use cases where the expected performance is high?

https://openai.com/index/introducing-chatgpt-pro/

o1-pro is probably at top-tier human-level performance on most small coding tasks and definitely at answering STEM questions. o3 is even better, but it hasn't been released except as the model powering Deep Research.

For example, o3 is top 200 on Codeforces: https://codeforces.com/blog/entry/137543



