
But DeepSeek hasn't claimed the figure everyone is touting for this particular R1 model, because that 5.6mn was apparently for DeepSeek's coder model.


The 5.6mn figure is for the base DeepSeek V3 model. Both instruction tuning and reasoning tuning of it have negligible cost in comparison.



