This is the key part, and it's not certain this happened. I'm not defending AI data gobbling, but if we truly and honestly want to fight big-AI use of content, we cannot just presume bad faith. OpenSubtitles.org has a large dataset that is "public". It is a dataset perfectly suitable for, intended for, and therefore used for, training and data analysis.
I've used it for data analysis.