
He might be referring to the data at https://lmarena.ai/

They conduct blind trials where users submit a prompt and vote on the "best answer".

Grok holds a very good position on its leaderboard.



In general, a quickly chosen "best answer" is perhaps not the best way to evaluate such output. People are on average very, very stupid. At the moment they receive an answer, they are also poorly situated to judge its quality, especially when it concerns subject matter they aren't intimately familiar with.

For instance, the lawyers who submitted briefs citing fake cases and fake precedents were presumably satisfied with the output when they received it. They were less satisfied when they were sanctioned thousands of dollars for presenting lies to a judge in place of the truth.
