
Is UTF-8 the final character encoding for all future time?

Problem

It seems to me that Unicode is the "final" character encoding. I cannot imagine anything else replacing it at this point. I'm frankly confused about why UTF-16, UTF-32, etc. exist at all, not to mention all the non-Unicode character encodings (other than for legacy purposes).

In my system, I've hardcoded UTF-8 as the one and only supported character encoding for my database, my source code files, and any data I create or import into my system. My system internally works solely in UTF-8. I cannot imagine ever needing to change this, for any reason.

Is there a reason I should expect this to change at some point? Will UTF-8 ever become "obsolete" and be replaced by "UniversalCode-128" or something that also includes the alphabets of civilizations later discovered in nearby galaxies?

Solution

UTF-8 might not last forever, but you probably don't have to worry too much.

Two universal truths:

  • We can't predict the future.
  • Nothing lasts forever, especially in software.


But that doesn't mean the benefit of (trying to) future-proof your code always outweighs the cost.

Is UTF-8 likely to become obsolete any time soon?

I would say no. UTF-8 is widely used, which makes it harder to replace. Unicode also still has plenty of unallocated space, so there is unlikely to be a pressing need to replace it soon. Between 2010 and 2020, fewer than 40k characters were added. At that rate, it would take about 240 years to use up the remaining ~1 million unallocated code points. That is a lot sooner than I imagined, but still a long way off, and assuming the rate stays constant is quite an assumption.
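
The estimate above is a straight-line extrapolation. Below is a minimal Python sketch of that arithmetic, assuming the answer's rounded inputs (~1,000,000 unallocated code points, ~40,000 characters added per decade); `years_until_exhausted` is a hypothetical helper, not part of any library. With these rounded inputs it gives 250 years, the same ballpark as the ~240 quoted; the exact figure depends on which counts you plug in.

```python
import sys

# Unicode code points run from U+0000 to U+10FFFF; Python exposes the upper
# bound as sys.maxunicode (0x10FFFF on Python 3.3+). This is the total
# codespace, including surrogates and private-use code points.
CODESPACE = sys.maxunicode + 1  # 1,114,112 code points


def years_until_exhausted(unallocated: int, added_per_decade: int) -> float:
    """Hypothetical helper: years until `unallocated` code points are used up,
    assuming characters keep being added at a constant `added_per_decade`."""
    per_year = added_per_decade / 10
    return unallocated / per_year


# Rounded figures from the answer (assumptions, not exact Unicode statistics).
estimate = years_until_exhausted(unallocated=1_000_000, added_per_decade=40_000)
print(f"codespace: {CODESPACE:,} code points")  # codespace: 1,114,112 code points
print(f"~{estimate:.0f} years at the 2010-2020 rate")  # ~250 years at the 2010-2020 rate
```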

It also doesn't seem likely that it would need replacing because of a fundamental flaw in the encoding. Other standards or technologies may turn out to have exploitable security issues, but that seems less likely for a character encoding, which only specifies how characters are stored.

My guess is that if a need to replace it arises, it will be due to inefficiencies or constraints in new technology. Someone could develop a technology that rethinks how data is stored or loaded, which might make UTF-8 less than ideal or even unusable. But plenty of systems would still lack that technology for quite a few years.

Note that I didn't ask "are we likely to see a new character encoding any time soon?". Anyone can create a new standard, but that doesn't mean it will be widely adopted or that it will replace other standards.

How bad would it be for you if there's a new standard?

Probably not that bad.

Even if there is a new standard that's widely adopted, your system will likely keep functioning for the foreseeable future with little to no changes. There are a lot of legacy systems out there.

If your system doesn't support the new encoding, you may run into issues when users or other systems try to send you data in it. But your system could still use UTF-8 internally, even if this means some characters aren't supported (which might not be good, but won't necessarily break your system).
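
A common way to keep UTF-8 as the only internal encoding is to convert at the system boundary. The sketch below is an illustration under assumptions: `to_internal_utf8` is a hypothetical helper, and Latin-1 merely stands in for "some other encoding" since the hypothetical future one doesn't exist. `errors="replace"` is a standard Python codec error handler that turns undecodable input into U+FFFD instead of raising, which is one way "not supporting some characters" can degrade without breaking the system.

```python
def to_internal_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical boundary helper: decode incoming bytes using the encoding
    the sender declared, then re-encode as UTF-8 for internal storage.

    errors="replace" maps undecodable bytes to U+FFFD (REPLACEMENT CHARACTER)
    instead of raising, so unsupported input degrades rather than crashes.
    """
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


# Latin-1 input converts cleanly: byte 0xE9 is "é" in Latin-1.
print(to_internal_utf8(b"caf\xe9", "latin-1"))  # b'caf\xc3\xa9'
# The same bytes are not valid UTF-8, so 0xE9 becomes U+FFFD.
print(to_internal_utf8(b"caf\xe9", "utf-8"))  # b'caf\xef\xbf\xbd'
```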

Also, if UTF-8 were replaced for a reason other than running out of space (which, as noted above, seems unlikely any time soon), it could probably be extended to include any characters in the new encoding. You could then convert from one encoding to the other where required, and UTF-8 would still be usable.
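
One reason extension is plausible is that UTF-8's byte layout is not inherently tied to the U+10FFFF limit. The original definition (RFC 2279) allowed sequences of up to 6 bytes covering 31 bits; RFC 3629 later restricted UTF-8 to at most 4 bytes to match Unicode's codespace. The encoder below is a sketch of that bit layout, not a replacement for Python's built-in codec; `utf8_encode_code_point` and its `allow_legacy_forms` flag are hypothetical names, and the 5- and 6-byte output it can produce is not valid UTF-8 today.

```python
def utf8_encode_code_point(cp: int, allow_legacy_forms: bool = False) -> bytes:
    """Encode one code point using UTF-8's bit layout (illustrative sketch).

    By default this follows RFC 3629: code points up to U+10FFFF, at most
    4 bytes, surrogates rejected. allow_legacy_forms=True enables the older
    RFC 2279 layout (up to 6 bytes, 31 bits), which modern decoders reject.
    """
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if not allow_legacy_forms and 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:
        return bytes([cp])  # 1 byte: 0xxxxxxx
    # (exclusive upper bound, sequence length, lead-byte prefix)
    forms = [(0x800, 2, 0xC0), (0x10000, 3, 0xE0), (0x110000, 4, 0xF0)]
    if allow_legacy_forms:
        forms[-1] = (0x200000, 4, 0xF0)  # full 21 bits of the 4-byte form
        forms += [(0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC)]
    for limit, n, lead in forms:
        if cp < limit:
            out = bytearray(n)
            # Fill continuation bytes (10xxxxxx) from the end, 6 bits each.
            for i in range(n - 1, 0, -1):
                out[i] = 0x80 | (cp & 0x3F)
                cp >>= 6
            out[0] = lead | cp  # remaining high bits go in the lead byte
            return bytes(out)
    raise ValueError("code point out of range")


for ch in "Aé€🌍":
    print(f"U+{ord(ch):04X}", utf8_encode_code_point(ord(ch)).hex(" "))
```

The default path matches Python's own `str.encode("utf-8")` for valid code points; the legacy path only shows that the same scheme has room to grow if a future standard ever needed it.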

Unicode versus Unicode?

The differences between UTF-8, UTF-16 and UTF-32 seem minor compared to their differences from other (non-Unicode) encodings. They all encode the same set of characters, so it shouldn't be a huge issue if one replaces the other.

If one of the others were to become the widely adopted one, it would probably be trivial to convert between them where required and continue to use UTF-8 everywhere else.
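
To illustrate how cheap that conversion is, here is a small Python sketch that round-trips a string through each Unicode encoding form using only the standard codecs. The sample string is arbitrary; the `-le` codec variants are used so that no byte-order mark is added to the byte counts.

```python
text = "héllo 🌍"  # ASCII, a Latin-1 letter, and an emoji outside the BMP

# "-le" = little-endian without a byte-order mark, so the lengths below
# count only the encoded characters.
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    # Lossless round trip: decode from this form and re-encode as UTF-8.
    assert data.decode(codec).encode("utf-8") == text.encode("utf-8")
    print(f"{codec:>9}: {len(data)} bytes")
```

For this string the three forms take 11, 16 and 28 bytes respectively; the characters are identical, only the storage layout differs.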

Context

StackExchange Computer Science Q#127182, answer score: 35