"The biggest models want to train on literally every piece of human-written text ever written"

They genuinely don't. There is a LOT of garbage text out there that they don't want. They want to train on every high quality piece of human-written text they can get their hands on (where the definition of "high quality" is a major piece of the secret sauce that makes some LLMs better than others), but that doesn't mean every piece of human-written text.



Even restricted to that narrower definition, the major commercial model companies wouldn't be able to afford to license all their high-quality human text.

OpenAI is Uber with a slightly less ethically despicable CEO.

It knows it's flouting the spirit of copyright law -- it's just hoping it can bootstrap quickly enough to make the question irrelevant.

If every commercial AI company that couldn't prove training-data provenance were bankrupted tomorrow, I wouldn't shed an ethical tear. Live by the sword, die by the sword.


Bold idea, requiring startups to proactively prove they have not broken the law. Should we apply it to all tech startups? Let’s see silicon startups prove they have not stolen trade secrets!



