Serhiy Storchaka writes:

 > Yes, I remember. I think that hybrid FSR-UTF16 (like FSR, but with
 > UTF-16 used instead of UCS4) is the better choice for CPython. I
 > suppose that with emoticons and other icon characters spreading over
 > the next 5 or 10 years, even English text will often contain astral
 > characters. And spending 4 bytes per character because a long text
 > contains one astral character looks too wasteful.

Why use something that complex if you don't have to? For the use case
you have in mind, just map them into private space. If you really want
to be aggressive, use surrogate space, too (anything that cares what a
scalar represents should be trapping on non-scalars; catch that
exception and look up the char -- dangerous, though, because such
exceptions are probably all over the place).
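
To make the private-space suggestion concrete, here is a minimal sketch of the idea. It is not anything CPython does: the class name AstralMapper and its encode/decode methods are made up for illustration. It assumes the input does not already use the BMP Private Use Area (U+E000..U+F8FF) and that a text contains no more than 6400 distinct astral characters, so every character fits in 16 bits after mapping.

```python
# Hypothetical illustration only: map astral (non-BMP) characters into the
# BMP Private Use Area so that every stored character fits in 16 bits.
PUA_START = 0xE000
PUA_END = 0xF8FF  # 6400 private-use code points in the BMP


class AstralMapper:
    """Per-application table assigning a PUA slot to each astral character."""

    def __init__(self):
        self._to_pua = {}    # astral character -> PUA stand-in
        self._from_pua = {}  # PUA stand-in -> astral character

    def _slot_for(self, ch):
        slot = self._to_pua.get(ch)
        if slot is None:
            code = PUA_START + len(self._to_pua)
            if code > PUA_END:
                raise OverflowError("no private-use slots left")
            slot = chr(code)
            self._to_pua[ch] = slot
            self._from_pua[slot] = ch
        return slot

    def encode(self, text):
        """Return text with every astral character replaced by its stand-in."""
        out = []
        for ch in text:
            code = ord(ch)
            if code > 0xFFFF:
                out.append(self._slot_for(ch))
            elif PUA_START <= code <= PUA_END:
                # Assumption of this sketch: the PUA is reserved for our use.
                raise ValueError("input already uses the private use area")
            else:
                out.append(ch)
        return "".join(out)

    def decode(self, text):
        """Restore the original astral characters."""
        return "".join(self._from_pua.get(ch, ch) for ch in text)


mapper = AstralMapper()
original = "hi \U0001F600 there \U0001F600"
packed = mapper.encode(original)
restored = mapper.decode(packed)
```

Here packed has the same length as original and contains only BMP code points, and decoding gives back the original text. The cost is that the table has to travel with the data, and anything that looks at the stand-in characters without going through decode will see private-use code points rather than the real ones.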