To quote a reply from the above StackOverflow thread:
"So, they added a snowman with snow AND a snowman without snow, so that the weather forecaster of this world can avoid the dull snowflake, but we will never get our missing superscript q‽"
I don't understand why Unicode must (should?) contain superscript and subscript glyphs at all. The declared goal of Unicode is to encode all characters used by all languages, past and present. Subscript and superscript are not used by any language as separate characters; they are a typesetting property. That should be solved by other means, not by character/glyph encoding.
Should Unicode include struck-out variants of ALL characters? Underlined? Double-underlined? Small-caps variants of all letters, for languages whose typographic tradition uses small caps?
And, BTW, what do you mean by "all letters"? Should Unicode contain sub/superscript variants of Hangul, or Devanagari, or the letters of hundreds of other non-Latin-alphabet languages? Must Unicode roughly triple in size, bar the hieroglyphic part (and why shouldn't hieroglyphs be sub/superscripted too)?
This is probably an edge case, but I work on lab software that uses chemical symbols, and having sub- and superscript characters saves lots of headaches. I can just store "CO₂" in a database, query it, and display it back as a simple string, or display values in scientific notation like 1.3×10³, without having to use any formatting.
But to be honest I'm not sure what the parent comment wants to see added, because at the moment having all the letters from A-Z, the numbers 0-9, and the plus, minus, and equals signs as both subscript and superscript seems to be enough.
Upper-case subscripts are missing, for one: I'm not allowed to talk about the normal force F_N in plain-text email. Superscript and subscript Greek letters would also be nice to have, e.g. in the context of relativity.
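You can check that gap directly from Python's stdlib, which carries the official Unicode character names (a quick sketch; the lookup strings are the standard's own names):

```python
import unicodedata

# The lower-case subscript n exists (U+2099)...
print(unicodedata.lookup("LATIN SUBSCRIPT SMALL LETTER N"))  # → ₙ

# ...but there is no upper-case counterpart, so F_N can't be written inline.
try:
    unicodedata.lookup("LATIN SUBSCRIPT CAPITAL LETTER N")
except KeyError:
    print("no subscript capital N in Unicode")
```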
Why not Devanagari then? This Europe-centric point of view bothers me.
Sure: As I mentioned in another comment, I'd add markers to enable arbitrary super and subscripting.
However, the question I responded to was asking what specifically people were missing in practice, and the examples I gave are things I personally would have used if they had been available.
> Should Unicode contain sub/superscript variants of Hangul or Devanagari or letters from hundreds of other non-Latin-alphabet languages?
Nope, you'd use markers similar to U+200E (LEFT-TO-RIGHT MARK) and U+200F (RIGHT-TO-LEFT MARK) that already exist to indicate text direction (which is also a typesetting property).
They are relevant because Unicode had to define bidirectional rendering: correct directionality cannot always be inferred automatically from the logical (abstract) characters. Unicode has no reason to define general text rendering, including subscripts and superscripts, so there is no reason for it to define control characters for them.
Unicode defines characters, their semantics and (very flexible) guidelines for rendering them. Unlike, say, bold, italic or super/subscripts, bidirectionality is an intrinsic property of those characters and can't be easily refactored.
Unicode specifically states that it doesn't define the semantics of characters. That would seriously interfere with its purpose of defining characters.
There are some notable exceptions, and they are acknowledged to be mistakes.
> Unicode specifically states that it doesn't define the semantics of characters.
The Unicode Standard explicitly says otherwise:
> Characters have well-defined semantics. These semantics are defined by explicitly assigned character properties, rather than implied through the character name or the position of a character in the code tables (see Section 3.5, Properties). [1]
> The Unicode Standard associates a rich set of semantics with characters and, in some instances, with code points. The support of character semantics is required for conformance; see Section 3.2, Conformance Requirements. [2]
To be fair, it refers to "character" semantics, which is more or less abstracted by character properties. It is not as if, for example, △ (U+25B3 WHITE UP-POINTING TRIANGLE) can only ever be used to denote triangles. But it has defined semantics in the sense that the character carries the properties expected of such a symbol.
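A concrete illustration of "semantics as assigned properties", via Python's `unicodedata` (a sketch): the standard tells you △ is a symbol with a particular name and category, not what any given author means by it.

```python
import unicodedata

ch = "\u25b3"  # △
print(unicodedata.name(ch))      # → WHITE UP-POINTING TRIANGLE
print(unicodedata.category(ch))  # → So  (Symbol, other)
```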
That's a cop out. You could equally say that new emojis shouldn't be added because you should use inline images for those. Or RTL markers shouldn't be added because you should use dedicated text styling for that.
There are a ton of places that don't support superscript markup.
> You could equally say that new emojis shouldn't be added because you should use inline images for those.
If emoji hadn't been allocated out of compatibility concerns, this would have been exactly my opinion from day 1. To be honest, I'm still not happy with the current emoji assignments and semantics. Not even the Unicode people are satisfied; there are numerous proposals for replacing emoji with something else (example keyword: QID emoji).
> RTL markers shouldn't be added because you should use dedicated text styling for that.
> There are a ton of places that don't support superscript markup.
Unlike most text attributes, bidirectionality is an intrinsic property of abstract characters and thus absolutely within Unicode's scope. Ideally you can't, and shouldn't, make an LTR character behave like an RTL character or vice versa. Bidi control characters only exist to correct the automatic rendering, and can be represented out of band (the Bidi specification is explicitly designed with this use case in mind [1]).
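That "intrinsic property" claim is directly inspectable: every character carries a Bidi_Class property, and the marks mentioned earlier (U+200E/U+200F) are themselves just characters with a strong directionality and no glyph. A quick stdlib sketch:

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

print(unicodedata.bidirectional(LRM))       # → L
print(unicodedata.bidirectional(RLM))       # → R
print(unicodedata.bidirectional("a"))       # → L   (Latin letters are intrinsically LTR)
print(unicodedata.bidirectional("\u0627"))  # → AL  (Arabic letters are intrinsically RTL)
```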
> You could equally say that new emojis shouldn't be added because you should use inline images for those.
Well, that's really a better solution. Or a Unicode character that allows you to set a pixel on a 256×256 grid and one to compose them. Strike that. Better not give anyone bad ideas.
Should we also have slanted, bold, semi-bold, light and underlined versions of every code point? Versions with/without serifs? For monospaced text? Those are all presentational matters. That we have super/subscripts in Unicode in the first place seems to have been just a hack to help terminal emulator software deal with obsolete encodings like ISO-8859-1: https://www.unicode.org/L2/L2000/00159-ucsterminal.txt
Those are intended for maths, not for formatted text. Variables in mathematics are usually a single character, so there is a great variety of ways to format the characters to create different symbols. Diacritical marks, underlines, etc. are also used for this.
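Indeed, the styled letters that do exist live in the Mathematical Alphanumeric Symbols block (U+1D400 onward), scoped to mathematical notation rather than general text styling. A quick check (sketch, stdlib only):

```python
import unicodedata

bold_A = "\U0001d400"    # 𝐀
italic_A = "\U0001d434"  # 𝐴
print(unicodedata.name(bold_A))    # → MATHEMATICAL BOLD CAPITAL A
print(unicodedata.name(italic_A))  # → MATHEMATICAL ITALIC CAPITAL A
```

The names themselves encode the intent: these are distinct mathematical symbols, not a bold/italic presentation of the letter A.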
Fair enough, but general formatting codes would overlap with what is already supported in rich-text formats like HTML or LaTeX. Unicode is a standard for encoding characters, it is not supposed to be a rich-text document format itself.
https://stackoverflow.com/questions/6638471/why-does-the-uni...