Hacker News

I remember reading a while back that they were training on repositories, and still do, to the point that I never wanted to use GitHub for anything other than submitting bug reports to projects.

Maybe the non-training only applies if you pay protection money? But then you run into the whole problem that if it's public, there's nothing stopping some other AI company that isn't MS from accessing the repository and training on it.



There's been a huge amount of speculation floating around that GitHub is training on private repos, but I've never seen anything credible.


Yeah, I generally expect big tech to be vacuuming up, storing, and analyzing as much data as they can, but for GitHub, training on private repos would be one of the riskiest things I can imagine. No way they are going to jeopardize their entire business to maybe get a little bit more data to train on.



That story appears to be about how, if a repo has accidentally been made public, various tools can still access cached information about that repo even after it has been made private again. It doesn't say anything about whether that data will then be used for training models.



