
You clearly don't understand how machine learning works. If machine learning on copyrighted data becomes illegal, then much of our infrastructure will go down, because most of it uses machine learning. The first thing that will affect many people is probably Google Search.



I believe this is the core point of the lawsuit - is Copilot really creating code from what it learned (which happens to, by some weird glitch, mimic the source code) or is it just a big overfitting model that learned to encode and memorize a large number of answers and spit them out verbatim when prompted?

I think that losing this lawsuit has much more serious consequences for Copilot than just having to attribute its output to millions of potential copyright owners - it would mean the model behind it is essentially a failure.

Personal opinion: the real situation lies somewhere in the middle. From what I’ve seen, I think Copilot has some ability to actually generate code, or at least adapt and connect unrelated code pieces it remembers to respond to prompts - but I also believe it just “remembers” (i.e., has a close-to-lossless encoding of the input) how to do some operations and spits them out as part of the response to some prompts.

I hardly think the lawsuit will really explore this discussion, but it sounds like a great investigation into what DL models like transformers actually learn. For all I know, it might even give insight into how we learn. I have no reason to believe that humans don’t use the same strategy of memorising some operations and learning how to adjust them “at the edges” to combine them.
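The memorisation question above can be probed crudely. Here's a minimal sketch (the function names and the n-gram threshold are mine, purely illustrative, not any test used in the lawsuit) that measures how much of a generated snippet appears verbatim, as long exact n-grams, in a training corpus:

```python
# Toy check for verbatim memorisation: how many of the generated
# text's n-grams appear word-for-word in the training corpus?
# Names and the n=8 threshold are illustrative assumptions.

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def memorization_overlap(training_text, generated_text, n=8):
    """Fraction of the generated text's n-grams found verbatim in the
    training text. 1.0 suggests pure regurgitation; 0.0 means the
    output shares no long exact spans with the corpus."""
    train = ngrams(training_text.split(), n)
    gen = ngrams(generated_text.split(), n)
    if not gen:
        return 0.0
    return len(gen & train) / len(gen)

corpus = "def inverse_sqrt ( x ) : i = 0x5f3759df - ( i >> 1 ) return y"
copied = "def inverse_sqrt ( x ) : i = 0x5f3759df - ( i >> 1 ) return y"
novel = "fn fast_rsqrt ( v ) { let k = v * half ; magic shift trick }"

print(memorization_overlap(corpus, copied))  # 1.0: verbatim copy
print(memorization_overlap(corpus, novel))   # 0.0: no shared 8-grams
```

A real audit would of course work on tokenised code and a deduplicated corpus, but even this crude overlap score captures the distinction the parent comment draws between "close-to-lossless encoding" and genuinely recombined output.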


I don't think anybody will try to answer the philosophical question of whether what this machine does has anything to do with human reasoning.

In the end it's just a machine. It's not a person. So trying to anthropomorphize this case makes no sense from the get-go.

Looking at it this way (and I guess this is the right way to look at it from the law standpoint) Copilot is just a fancy database.

It's a database full of copyrighted work…

How this database (and its query system) works from the technical viewpoint isn't relevant. It just makes no difference, as by law machines aren't people. End of story.

But should the court (to my extreme surprise) rule that what MS did was "fair use", then the flood gates of "fairuseify through ML"[1] would be open. Given the history of copyright and other IP laws in the US, this just won't happen! The US won't ever accept that someone would be allowed to grab all Mickey Mouse movies, put them into some AI, and start to create new Mickey Mouse movies. That's unthinkable. Just imagine what this would mean. You could "launder" any copyrighted work just by uploading it to and re-querying it from some "ML-based database system". That would be the end of copyright. This just won't happen. MS is going to lose this trial. There is no other option.

The only real question is how severe their loss will be. They surely also used AGPLv3 code for training. Thinking this through to the end, with all consequences, would mean that large chunks of MS's infrastructure, and all supporting code - which means more or less all of Azure, which means more or less all of MS's software - would need to be offered in (usable!) source form to all users of Copilot. I think this won't happen. I expect the court to find a way to weasel out of this consequence.

[1] https://web.archive.org/web/20220121020414/fairuseify.ml/


Holy cow, you are right.




