
>"The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process."

>"The Stack is a collection of source code from repositories with various licenses. Any use of all or part of the code gathered in The Stack must abide by the terms of the original licenses, including attribution clauses when relevant."

Does it have a view of which licenses can be mixed, or is it simply disallowed from crossing that boundary, offering only answers sourced entirely within the confines of one specific license? The latter raises some interesting scenarios and questions.



Permissively licensed would imply non-copyleft to me. That means only code under licenses like Apache or MIT would be allowed to be trained on, but not code under licenses like GPL.


It's my understanding that GPLv3 is perfectly fine to build a business on; you just have to make your code open source. There isn't anything wrong with that, and these days I would actually suggest it's an enormous positive for a business, as it allows people to trust the company much more.


Details are here: https://huggingface.co/datasets/bigcode/the-stack

There are 193 licenses in total. v1.0 of The Stack included MPL/EPL/LGPL whereas v1.1+ doesn't include them.
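
For illustration, here is a minimal sketch of what allowlist-style license filtering could look like. The field names (`licenses`, `path`), the sample records, and the short allowlist are all made up for the example; the real dataset's schema and its 193-entry list may well differ. Requiring every detected license to be on the list is my own conservative assumption, not necessarily what The Stack does.

```python
# Hypothetical, heavily shortened allowlist (the real list has 193 entries).
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def is_allowed(record, allowlist=PERMISSIVE):
    """Keep a record only if it has licenses and all of them are allowlisted.

    Assumption: `record["licenses"]` is a list of SPDX-style identifiers.
    """
    licenses = record.get("licenses", [])
    return bool(licenses) and all(lic in allowlist for lic in licenses)


# Made-up sample records, not taken from the dataset.
records = [
    {"path": "a.py", "licenses": ["MIT"]},
    {"path": "b.py", "licenses": ["GPL-3.0"]},
    {"path": "c.py", "licenses": ["MIT", "LGPL-3.0"]},  # mixed: rejected
    {"path": "d.py", "licenses": []},                   # unknown: rejected
]

kept = [r["path"] for r in records if is_allowed(r)]
print(kept)
```

Under this sketch, the v1.0 to v1.1 change would amount to removing MPL/EPL/LGPL identifiers from the allowlist and re-running the filter.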



