
This is the company behind RedPajama, which is a VERY big deal.

It's the most realistic open-source attempt at recreating the Facebook LLaMA LLM from scratch, in a way that supports commercial usage.

They released their full 2.6TB training dataset last month, and it's significant: https://simonwillison.net/2023/Apr/17/redpajama-data/

They've also started releasing new, commercially-usable openly licensed LLM models trained on that data. You can try one of those out here: https://huggingface.co/togethercomputer/RedPajama-INCITE-Ins...




I would also check out their 3B model. I tested it on launch with LoRA fine-tuning and found it to be surprisingly capable despite its size. I think a lot of people are skipping past testing it because it only has 3B params.

Edit: https://huggingface.co/togethercomputer/RedPajama-INCITE-Bas...
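For anyone curious what LoRA fine-tuning actually changes: the pretrained weights stay frozen and only a small low-rank update is trained, which is why it's cheap enough to run on a 3B model. Here's a minimal NumPy sketch of the idea (the dimensions, rank, and `alpha` here are made-up toy values, not RedPajama's actual config):

```python
import numpy as np

# LoRA: freeze the pretrained weight W, train only a low-rank update B @ A.
# Trainable params drop from d_out*d_in to r*(d_in + d_out).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16  # toy sizes for illustration

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init

def lora_forward(x):
    # Base path plus the scaled low-rank adapter path.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Because B starts at zero, the adapted layer initially matches the base model.
assert np.allclose(lora_forward(x), W @ x)
```

In practice you'd use a library like Hugging Face PEFT rather than hand-rolling this, but the zero-init of B is the key trick: fine-tuning starts exactly at the pretrained model and only drifts as the adapter learns.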


Well, MPT-7B is also commercially usable and openly licensed: https://www.mosaicml.com/blog/mpt-7b


Yeah, it's really promising. It's partially trained on that RedPajama data.




