It's the most realistic open source attempt at recreating the Facebook LLaMA LLM, from scratch, in a way that supports commercial usage.
They released their full 2.6TB training dataset last month, and it's significant: https://simonwillison.net/2023/Apr/17/redpajama-data/
They've also started releasing new, commercially usable, openly licensed LLMs trained on that data. You can try one of those out here: https://huggingface.co/togethercomputer/RedPajama-INCITE-Ins...
Edit: https://huggingface.co/togethercomputer/RedPajama-INCITE-Bas...