
BC is a bit of a quiet haven for Cantonese culture. Check UBC, I know they do language preservation projects related to Cantonese.


Thanks for the tip. Their Cantonese program's website is here: https://cantonese.arts.ubc.ca/

Via the announcement for the "Language Archiving in the Digital Era" workshop https://cantonese.arts.ubc.ca/language-archiving-in-the-digi... ...

I found the "Corpus of Mid-20th Century Hong Kong Cantonese" https://hkcc.eduhk.hk/

In typical academic fashion, it's behind a login wall and doesn't offer an easy way to download the whole corpus. (Understandable, given that it's based on transcribing movies that are probably still copyright-protected, but annoying.) Also, no translations.

They mention 香港粵語語料庫 as a related project, but the link is dead. I found what appears to be the new website: http://compling.hss.ntu.edu.sg/hkcancor/

That corpus is CC-BY licensed (yay!) and puts the download page front-and-center, so I like it. There are no translations either, but recordings are included, so it might still be useful for a project of mine.

Thanks again!



