By that logic, if you convert a copyrighted song or movie from one codec to anot...

dragonwriter · on July 21, 2023

It isn’t independently copyrightable.

It's a mechanical copy subject to the copyright on the original, though.

xxpor · on July 21, 2023

The song itself isn't output by the machine.

humanistbot · on July 21, 2023

Neither was the original training data, which was copyrighted books, art, etc.

dragonwriter · on July 21, 2023

> Neither was the original training data, which was copyrighted books, art, etc.

If the original training data is a copyrightable (derivative or not) work, perhaps eligible for a compilation copyright, the model weights might be a form of lossy mechanical copy of that work, and be both subject to its copyright and an infringing unauthorized derivative if it is.

If it's not, then I think, even before fair use is considered, the only violation would be the weights potentially infringing copyrights on original works. But I don’t think an incomplete copy automatically works for them the way it would for an aggregate; I’d think you'd have to demonstrate reproduction of the creative elements protected by copyright from individual source works to make the claim that it infringed them.

xxpor · on July 21, 2023

The output of the training though is unrecognizable.

SideburnsOfDoom · on July 21, 2023

Sometimes, the output is a recognisable plagiarism of a specific input.

If it isn't recognisable, then it's merely _distributed_ plagiarism: a million outputs, each of which is 0.0001% plagiarising each of a million inputs.

xxpor · on July 22, 2023

Does The War on Drugs plagiarize Bruce Springsteen?

SideburnsOfDoom · on July 22, 2023

Does The War on Drugs produce outputs on command, to prompts such as "a song in the style of Bruce Springsteen"?

Is The War on Drugs a VC-funded band replacement?

Are other future bands going to learn from The War on Drugs?

https://www.cbsnews.com/news/ai-stable-diffusion-stability-a...

https://www.documentjournal.com/2023/05/ai-art-generators-mo...

danShumway · on July 21, 2023

Correct that it would not be copyrightable, but you're missing the point.

A codec conversion is not copyrightable. The original song, which is still present enough in the conversion to affect whether the conversion can be distributed, is still copyrighted. But you don't get some kind of new copyright just because you did a conversion.

For comparison, if you take a public domain book off of Gutenberg and convert it from an EPUB to a KEPUB, you don't suddenly own a copyright on the result. You can't prevent someone else from later converting that EPUB to a KEPUB again. Copyright protects creative decisions, not mathematical operations.

So if there is a copyright to be held on model weights, that copyright would be downstream of a creative decision -- i.e., which data it was trained on and who owned the copyright of that data. However, this creates a weird problem -- if we're saying that the artifact of performing a mathematical operation on a series of inputs is still covered by the copyright of the components of that database, then it's somewhat tricky to argue that the creative decision of what to include in that database should be covered by copyright but that the copyrights of the actual content in that database don't matter.

Or to put it more simply, if the database's copyright status carries over to models, then that's kind of a problem, because most of the content of that training database is unlicensed third-party data that is itself copyrighted. It would absolutely be copyright infringement for OpenAI/Meta to distribute its training dataset unmodified.

AI companies are kind of trying to have their cake and eat it too. They want to say that model weights are transformed to such a degree that the original copyright of the database's contents doesn't matter -- i.e., it doesn't matter that the model was trained on copyrighted work. But they also want to claim that the database copyright does matter: because the model was trained on a collection where the decision of what to include was covered by copyright, the model weights are therefore copyrightable.

Well, which is it? If model weights are just a transformation of a database and the original copyrights still apply, then we need to have a conversation about the amount of copyrighted material that's in that database. If the copyright status of the database doesn't matter and the resulting output is something new, then no, running code on a GPU is not enough to grant you copyright and never really has been. Copyright does not protect algorithmic output, it protects human creative decisions.

Notably, even if the copyright of the database were enough to extend copyright to the final weights, and even if we ignore that this would imply the models themselves are committing copyright infringement with regard to the original data/artwork -- even in the best-case scenario for AI companies, that doesn't mean the weights are fully protected, because the only copyright a company can claim is based on the decision of what data they chose to include in the training set.

A phone book is covered by copyright if there are creative decisions about how that phone book was compiled. The numbers within the phone book are not. Factual information cannot be copyrighted. Factual observations cannot be copyrighted. So we have to ask the same question about model weights -- are individual model weights an artistic expression, or are they facts derived from a database that are used to produce an output? If they're not individually an artistic expression, well... it's not really copyright infringement to use a phone book as a data reference to build another phone book.