
Today Databricks announced [0] a 6B-parameter model from EleutherAI fine-tuned on the Alpaca dataset. According to their CEO [1], training took 3 hours and cost $30. They didn't release details on how it was trained, but it was likely done with LoRA.

[0] https://www.databricks.com/blog/2023/03/24/hello-dolly-democ... [1] https://twitter.com/alighodsi/status/1639251347777388544
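For anyone curious what that kind of fine-tune looks like in practice, here's a rough sketch using Hugging Face transformers + peft. To be clear, this is just my guess at the general shape of it, not Databricks' actual recipe -- the hyperparameters, prompt format, and dataset id (tatsu-lab/alpaca) are all illustrative.

    # Sketch: LoRA fine-tune of GPT-J 6B on Alpaca-style instruction data.
    # Hyperparameters, prompt template, and dataset id are assumptions.
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              TrainingArguments, Trainer,
                              DataCollatorForLanguageModeling)
    from peft import LoraConfig, get_peft_model
    from datasets import load_dataset

    model_name = "EleutherAI/gpt-j-6B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Wrap the base model with low-rank adapters; only these small
    # matrices are trained, which is why it's so cheap.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # attention projections in GPT-J
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of the 6B weights

    # Alpaca-style records have instruction / input / output fields.
    dataset = load_dataset("tatsu-lab/alpaca", split="train")

    def tokenize(example):
        prompt = (f"### Instruction:\n{example['instruction']}\n\n"
                  f"### Response:\n{example['output']}")
        return tokenizer(prompt, truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="gptj-alpaca-lora",
            per_device_train_batch_size=4,
            num_train_epochs=1,
            fp16=True,
            logging_steps=50,
        ),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
        train_dataset=tokenized,
    )
    trainer.train()
    model.save_pretrained("gptj-alpaca-lora")

The point is that only the adapter matrices get gradients, so a single A100 for a few hours is plausible for 6B, which would line up with the ~$30 claim.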



Interesting. I wonder what the training cost was for:

https://huggingface.co/EleutherAI/gpt-neox-20b

Perhaps it’s in the paper…


They used the 6B GPT4-J, not the 20B. That's what's interesting: it's a smallish large language model :).


GPT-J, not GPT4-J.


There are also some LLaMA LoRAs that are trained on the Anthropic dataset specifically for chat:

https://huggingface.co/serpdotai

I haven't done any formal tests on this yet, but with llama-13b, the overall structure of its responses definitely becomes much more ChatGPT-like. It would be very interesting to see how the 65B model performs.
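In case it's useful, loading one of those adapters on top of the base model is only a few lines with peft. The adapter id below is a placeholder, not an actual serpdotai repo name, and the base-weights repo is also just an assumption:

    # Sketch: applying a chat LoRA adapter on top of llama-13b for inference.
    # Repo ids below are placeholders / assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_model = "huggyllama/llama-13b"        # assumed source of base weights
    adapter = "some-org/llama-13b-chat-lora"   # placeholder adapter id

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter)  # attaches the LoRA weights

    prompt = "Human: How do I fine-tune a small language model cheaply?\n\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200,
                            do_sample=True, temperature=0.7)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Swapping adapters in and out like this is cheap, so comparing the 13B and 65B versions side by side shouldn't be much work once the base weights are on disk.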


Let the revolution begin



