> commonly distributed with torrents

This is the key part, and it's not certain that this happened. I'm not defending AI data gobbling, but if we truly and honestly want to fight big-AI use of content, we cannot just presume bad faith. OpenSubtitles.org has a large dataset that is "public". It is a dataset perfectly suitable for, intended for, and therefore used for, training and data analysis.

I've used it for data analysis.