
Fingerprinting data to find when it's been copy-pasted is a neat application for invisible characters!

I've also found a lot of identical-looking characters (same glyph, different code points) when handling Chinese text. Note that Google Translate does not handle these correctly.

https://github.com/pingtype/pingtype.github.io/blob/master/r...

It's about the Kangxi Radicals Unicode block, compared with the CJK Unified Ideographs block. If you want me to write a blog post about it, please comment and I'll get around to it.
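
For anyone who wants to check this on their own text, here is a minimal Python sketch. It assumes the goal is to fold Kangxi Radicals (U+2F00 to U+2FDF) into the ordinary ideographs they resemble, using the compatibility mappings that Unicode defines and that NFKC normalization applies. The helper names `is_kangxi_radical` and `fold_kangxi` are made up for illustration and are not taken from the linked repository.

```python
import unicodedata

# Range of the Kangxi Radicals block (assumption: only this block is folded).
KANGXI_START, KANGXI_END = 0x2F00, 0x2FDF


def is_kangxi_radical(ch: str) -> bool:
    """True if the single character lies in the Kangxi Radicals block."""
    return KANGXI_START <= ord(ch) <= KANGXI_END


def fold_kangxi(text: str) -> str:
    """Replace Kangxi Radicals with their NFKC equivalents; leave everything else untouched."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_kangxi_radical(ch) else ch
        for ch in text
    )


radical = "\u2fcc"    # KANGXI RADICAL FROG
ideograph = "\u9efd"  # the ordinary CJK ideograph with the same shape

print(radical == ideograph)               # False: different code points
print(fold_kangxi(radical) == ideograph)  # True: NFKC maps one onto the other
```

Normalizing only the characters inside the block is a deliberate choice here: running NFKC over a whole string would also rewrite full-width letters, ligatures and similar characters, which may not be what you want.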



It's interesting to me, because I've seen this effect (while copy-pasting) and wondered why... if I hadn't been translating I would never have noticed.

Even though some are noticeably different:

⿌ 黾


Thanks for the encouragement - I'll write that blog post when I get a chance!

黾 is the simplified version of ⿌.

(In http://pingtype.github.io click Advanced > Regional, paste into the Simplified text box, then click "Simplified to Traditional")


I really like the fact that ⼚/⺁ and ⽰/⺬ and so on are separately encoded in a single block (technically, though, they aren't).
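
To see which block each of those characters actually sits in, a small lookup like the following may help. It is only a sketch: the block ranges are hard-coded for the three blocks relevant to this thread, and `block_of` is a hypothetical helper, not a standard-library function.

```python
# (start, end, name) for the three blocks discussed in this thread.
BLOCKS = [
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
    (0x2F00, 0x2FDF, "Kangxi Radicals"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def block_of(ch: str) -> str:
    """Return the block name for a single character, or 'other' if it is not in the table."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


# The two pairs from the comment above, written as escapes to avoid look-alike confusion.
for ch in ["\u2f1a", "\u2e81", "\u2f70", "\u2eac"]:
    print(f"U+{ord(ch):04X} -> {block_of(ch)}")
```

With these ranges, the first member of each pair lands in Kangxi Radicals and the second in CJK Radicals Supplement, which matches the "technically they aren't" caveat.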



