Is it faster for large models, or are the optimizations more noticeable with sma... | Hacker News


Gracana 3 months ago | on: Basic Facts about GPUs

Is it faster for large models, or are the optimizations more noticeable with small models? Seeing that the benchmark uses a 0.6B model made me wonder about that.

tough 3 months ago

I haven't tested it, but it's from a DeepSeek employee. I don't know whether it's used in production there or not!
