
I'm confused by the discussion of multi-lingual corpora. Is it common in topic modeling to consider documents drawn from disjoint vocabularies, or is it just a kind of thought experiment?


Pretty common when you don't control the data source, or when the data comes from multilingual government agencies (for example, in Canada you may have your court case heard in French if you wish).



