Twitter made the decision to count the code points of the normalized form of the text rather than the raw characters as they are sent (see: https://dev.twitter.com/basics/counting-characters). That was an application-layer design decision that aimed both to optimize for simplicity of the interface and to help the user out by finding the most efficient code point combinations to express the originally specified characters. They presumably made this choice because only the final number of code points counts toward a message's character usage, and because it puts a ceiling on the on-disk size of a 140-character message.
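As a rough illustration (not Twitter's actual code), here is roughly what counting code points after NFC normalization looks like in Python, using only the standard library's unicodedata module; the function name is mine:

```python
import unicodedata

def normalized_length(text: str) -> int:
    """Count code points after composing the text into NFC form."""
    return len(unicodedata.normalize("NFC", text))

# An accented character sent as 'e' plus a combining acute accent is two
# code points on the wire, but composes into one precomposed code point.
decomposed = "cafe\u0301"             # 5 code points as sent
print(len(decomposed))                # 5
print(normalized_length(decomposed))  # 4 after normalization
```

The point of the normalization step is exactly the gap between those two numbers: the user pays for the cheapest code point sequence that expresses what they typed, not for whatever sequence their keyboard or client happened to send.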
So, since Twitter already uses application logic to transform the data sent to them, they are only bound by the systemic biases baked into Unicode by choice. They could just as easily add other kinds of transformations on top of the character normalization code they already have. They could also optimize less aggressively for a message's size characteristics and instead count only the visible characters toward the character limit.
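A sketch of what "counting visible characters" could mean, assuming the third-party regex module (which supports the \X grapheme-cluster pattern); this is my illustration of the alternative, not anything Twitter does:

```python
import regex  # third-party module; stdlib `re` has no \X support

def visible_length(text: str) -> int:
    """Count user-perceived characters (grapheme clusters), not code points."""
    return len(regex.findall(r"\X", text))

# A flag emoji is two regional-indicator code points, and no amount of
# NFC normalization will shrink it, yet it renders as a single character.
flag = "\U0001F1FA\U0001F1F8"  # 🇺🇸
print(len(flag))               # 2 code points
print(visible_length(flag))    # 1 visible character
```

Under the current rules the flag costs two characters of the limit; under a visible-character rule it would cost one. Which rule is "correct" is a product decision, not something Unicode dictates.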
As a meta point, all software gets to make decisions about how it propagates the implementation details of the systems it sits on top of. Languages like Python and Java have garbage collection because their designers decided developers shouldn't have to think about the physical memory of a computer. Every time I don't have to manage pointers, memory allocation, and stride lengths, it's because someone decided my application shouldn't be heavily constrained by lower-level concerns.
There is no reason Twitter can't make the same kind of decision here by adding social concerns to how their platform functions, and no reason developers everywhere shouldn't build in a way that optimizes for non-technical concerns as well as for storage and compute.
