
In the last five years, I've not encountered a single valid use for character count.

1. If you're using character count to do memory-related things, you're introducing bugs. Not every character takes up the same amount of space (see: emoji).

2. If you're using character count to affect layout, you're introducing bugs. Not all characters are the same width. Characters can increase or decrease in size (see: ligatures, diacritics). Any proper UI library will give you a way to measure the size of text (see JS's measureText API).

3. Even static text changes. Unless you never plan to localize your application, pin it to one specific font (that you bundle with your app, because not all versions of the same font are made equal), and bring your own text renderer (because not all rendering engines support the same features), you're introducing bugs. The one exception is perhaps the terminal, but your support for Unicode in the terminal is probably poor anyway.
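To make point 1 concrete, here's a rough sketch in Python (my choice of language; `sizes` is a made-up helper name) comparing the code point count of a string with the bytes it actually occupies in two common encodings:

```python
def sizes(s: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for s."""
    return (
        len(s),                      # Python's len() counts code points
        len(s.encode("utf-8")),      # bytes when stored as UTF-8
        len(s.encode("utf-16-le")),  # bytes when stored as UTF-16 (no BOM)
    )

# Each sample is a single code point, yet the storage cost differs.
for sample in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    print(repr(sample), sizes(sample))
```

All four samples have a "character count" of 1, but the UTF-8 size ranges from 1 to 4 bytes, so sizing a buffer or a column from the count alone goes wrong as soon as non-ASCII text shows up.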

Even operations like taking a substring are fraught for human-readable strings. Besides worrying about truncating text in the middle of words or in punctuation (which should be a giant code smell to begin with), slicing a string is not "safe" unless you're considering all of the grammars of all of the languages you'll ever possibly deal with. It's unlikely, even if your string library perfectly supported Unicode, that you'd correctly take a substring of a human-readable string. It's better to design your application with this in mind.



I have some unicode string truncation code at work. It just mindlessly chops off any codepoints that won't fit in N bytes. No worrying about grammar, combining characters, multi-codepoint-emoji, etc.
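For anyone curious, this kind of truncation can be sketched roughly as below. This is not the actual work code, just a minimal Python illustration assuming the storage encoding is UTF-8; `truncate_utf8` is a hypothetical name.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Keep as many whole code points as fit in max_bytes of UTF-8."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The byte slice may end partway through a multi-byte sequence;
    # errors="ignore" drops those incomplete trailing bytes, so the
    # result is always valid UTF-8 no longer than max_bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

print(truncate_utf8("ab\U0001F600", 5))        # the emoji needs 4 bytes and is dropped
print(repr(truncate_utf8("e\u0301", 2)))       # the combining accent is dropped
```

It guarantees the byte bound and nothing else: as the second call shows, a base letter can survive while its combining mark is cut off.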

This is because the output doesn't have to be perfect, but it does absolutely positively have to have bounded length or various databases start getting real grumpy.


If you're chopping a diacritic off, you're changing meaning. If you're chopping an emoji off with a dangling ZWJ, you've potentially got an invalid character. Depending on the language and text, you might be completely changing the meaning of what you're storing.
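As a rough illustration of those two failure modes, here's a heuristic check in Python using only the standard `unicodedata` module. `cut_is_risky` is a name I made up, and this is deliberately not full grapheme cluster segmentation (that's what Unicode's UAX #29 defines); it only flags the two cases mentioned above.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, used to glue emoji sequences together

def cut_is_risky(original: str, cut: int) -> bool:
    """True if original[:cut] ends in a dangling ZWJ, or the cut splits
    a ZWJ sequence or separates a combining mark from its base."""
    if cut <= 0 or cut >= len(original):
        return False  # nothing kept, or nothing dropped
    kept_last = original[cut - 1]
    dropped_first = original[cut]
    return (
        kept_last == ZWJ                              # dangling joiner kept
        or dropped_first == ZWJ                       # sequence split before joiner
        or unicodedata.combining(dropped_first) != 0  # diacritic chopped off
    )

family = "\U0001F468\u200d\U0001F469"  # man + ZWJ + woman
print(cut_is_risky(family, 2))      # cut leaves a trailing ZWJ
print(cut_is_risky("e\u0301x", 1))  # cut drops the combining acute accent
print(cut_is_risky("abc", 1))       # plain ASCII, no issue flagged
```

A check like this can at least tell you when a truncation point damages the text, even if it can't tell you where a linguistically sensible cut would be.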

Your database might be grumpy otherwise, but that doesn't make arbitrary truncation correct. This is an issue with your schema; it doesn't mean truncation is the best solution.



