
I think the current approach is to just invent yet another "meta layer" of characters and declare that this particular sequence of bytes/code points/surrogate pairs/grapheme clusters/extended grapheme clusters/ZWJ sequences/whatever else you can think of has a special meaning and does not behave like you think it does. See also Henri Sivonen's essay on Unicode string length [1].
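To make the layering concrete, here is a rough Python sketch that counts the same string at three of those layers. The helper name `unit_counts` is made up for illustration, and the sample string is the facepalm emoji as I understand it from the essay (man facepalming with a medium-light skin tone, joined by a ZWJ). Counting grapheme clusters needs a segmentation library, so it is left out here.

```python
# One visible glyph built from five code points:
# U+1F926 (person facepalming), U+1F3FC (skin tone modifier),
# U+200D (zero width joiner), U+2642 (male sign), U+FE0F (variation selector-16).
FACEPALM = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"


def unit_counts(text: str) -> dict:
    """Hypothetical helper: measure the 'length' of text at several layers."""
    return {
        # Python's len() on str counts Unicode code points.
        "code_points": len(text),
        # Bytes once the text is encoded as UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units: two bytes each, surrogate pairs count as two units.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
    }


print(unit_counts(FACEPALM))
# {'code_points': 5, 'utf8_bytes': 17, 'utf16_code_units': 7}
```

Three different answers for something a user would call one character, and none of them is 1.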

So in a way, Unicode is already long past the point where you start inventing NATs and other hacks to buy time against the scarcity problem.

[1] https://hsivonen.fi/string-length/


