
Was it via gemma.cpp or some other library? I've seen a few people note that Gemma performance via gemma.cpp is much better than via llama.cpp; it's possible that the non-Google implementations are still not quite right.



I eval'd it with vLLM.

One thing I do suspect people are running into is sampling issues. With its 256K vocabulary, Gemma probably doesn't play well with Llama-style default sampling settings.

Many Chinese LLMs have a similar "default sampling" issue.
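
For reference, pinning the sampling parameters explicitly in vLLM looks roughly like this. A minimal sketch, not my exact setup; the model name and parameter values are illustrative, not Gemma's recommended settings:

    from vllm import LLM, SamplingParams

    # Assumed model for illustration; swap in whatever you're evaluating.
    llm = LLM(model="google/gemma-2-27b-it")

    # Set sampling explicitly instead of inheriting another model
    # family's defaults. These values are placeholders; check the
    # model card for what the authors actually recommend.
    params = SamplingParams(
        temperature=1.0,
        top_p=0.95,
        top_k=64,
    )

    outputs = llm.generate(["Why is the sky blue?"], params)
    print(outputs[0].outputs[0].text)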

But our testing was done with zero temperature and constrained single-token responses, so that shouldn't be an issue.
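
For the curious, a zero-temperature, constrained single-token eval in vLLM is roughly the sketch below. This is an assumption about the setup, not the exact harness; the model name and A/B/C/D choices are placeholders, and allowed_token_ids requires a reasonably recent vLLM version:

    from vllm import LLM, SamplingParams
    from transformers import AutoTokenizer

    model_id = "google/gemma-2-27b-it"  # assumed model for illustration
    tok = AutoTokenizer.from_pretrained(model_id)

    # Map each answer choice to its first token ID so decoding can be
    # restricted to exactly those tokens.
    choice_ids = [tok.encode(c, add_special_tokens=False)[0] for c in "ABCD"]

    llm = LLM(model=model_id)
    params = SamplingParams(
        temperature=0.0,               # greedy decoding, no sampling noise
        max_tokens=1,                  # single-token response
        allowed_token_ids=choice_ids,  # constrain output to the choices
    )

    out = llm.generate(["<question text>\nAnswer with A, B, C, or D:"], params)
    print(out[0].outputs[0].text)

With temperature at zero and the output space constrained to the answer tokens, top_p/top_k defaults never come into play, which is the point being made above.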



