I think the current approach is to just invent yet another "meta layer" on top of characters: declare that this particular sequence of bytes/code points/surrogate pairs/grapheme clusters/extended grapheme clusters/ZWJ sequences/whatever else you can think of has a special meaning and does not behave the way you think it does. See also Henri Sivonen's essay on Unicode string length [1].
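To make the "layers" concrete, here is a small sketch in the spirit of the example Sivonen uses. It measures one user-perceived symbol (face palm + skin-tone modifier + ZWJ + male sign + variation selector) in three of those layers, using only Python's standard library. The function name `length_in_units` is made up for illustration. Counting extended grapheme clusters (which would give 1 here) needs a segmentation library such as the third-party `regex` module's `\X`, so it is left out.

```python
# One user-perceived symbol ("man facepalming: medium skin tone"),
# built from five code points glued together with a zero-width joiner (ZWJ).
facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

def length_in_units(s: str) -> dict[str, int]:
    """Hypothetical helper: count the same string in several Unicode layers."""
    return {
        # UTF-8 code units, i.e. bytes
        "utf8_bytes": len(s.encode("utf-8")),
        # UTF-16 code units; code points above U+FFFF need a surrogate pair
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # Code points, which is what Python's len() counts on a str
        "code_points": len(s),
    }

if __name__ == "__main__":
    for unit, n in length_in_units(facepalm).items():
        print(f"{unit}: {n}")
```

For this string it reports 17 bytes, 7 UTF-16 units and 5 code points, while a human would say the length is 1.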
So in a way, Unicode is already long past the time where you invent NATs and other hacks to buy you time with the scarcity problem.
[1] https://hsivonen.fi/string-length/