> commonly distributed with torrents

This is the key part, and it's not certain this happened. I'm not defending AI data gobbling, but if we truly and honestly want to fight big-AI use of content, we cannot just presume bad faith. OpenSubtitles.org has a large dataset that is "public". It is a dataset perfectly suited to, intended for, and therefore used for training and data analysis.

I've used it for data analysis.
