Antoine Pitrou writes:
 > On Thursday, 25 August 2011 at 02:15 +0900, Stephen J. Turnbull wrote:
 > > Antoine Pitrou writes:
 > >  > On Thu, 25 Aug 2011 01:34:17 +0900
 > >  > "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
 > >  > >
 > >  > > Martin has long claimed that the fact that I/O is done in terms of
 > >  > > UTF-16 means that the internal representation is UTF-16
 > >  >
 > >  > Which I/O?
 > >
 > > Eg, display of characters in the interpreter.
 >
 > I don't know why you say it's "done in terms of UTF-16", then. Unicode
 > strings are simply encoded to whatever character set is detected as the
 > terminal's character set.

But it's not "simple" at the level we're talking about!

Specifically, *in-memory* surrogates are properly respected when doing
the encoding: a surrogate pair is encoded as the single code point it
represents, and therefore such I/O is not UCS-2 or "raw code units".
This treatment is different from sizing and indexing of unicode objects,
where surrogates are not treated differently from other code points.
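To make the distinction concrete, here is a rough sketch of the two treatments. It is an illustration only, not the interpreter's actual code path: it runs on a current Python 3, where str is no longer stored as UTF-16, so the in-memory code units of a narrow build are simulated with a plain list of integers. The helper names (utf16_code_units, combine_surrogates, encode_surrogate_aware, encode_raw_units) are made up for this example.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text (simulates narrow-build storage)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def combine_surrogates(units):
    """Pair up high/low surrogates into single code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or a lone surrogate that is passed through as-is.
            code_points.append(unit)
            i += 1
    return code_points


def encode_surrogate_aware(units):
    """Encode to UTF-8 after pairing surrogates (the "I/O" treatment).

    A lone surrogate raises UnicodeEncodeError here, since strict UTF-8
    cannot represent it.
    """
    return "".join(chr(cp) for cp in combine_surrogates(units)).encode("utf-8")


def encode_raw_units(units):
    """Encode every code unit on its own (the "raw code units" treatment)."""
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


# U+10400 lies outside the BMP, so it needs a surrogate pair in UTF-16.
units = utf16_code_units("\U00010400")

print(len(units))                     # sizing sees two code units
print(encode_surrogate_aware(units))  # one 4-byte UTF-8 sequence
print(encode_raw_units(units))        # two 3-byte sequences, not valid UTF-8
```

The first print corresponds to what len() reported on a narrow build, the second to what the encoder wrote to the terminal. The third is what "raw code units" output would have looked like, which is not what happens.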