On Tue, Sep 30, 2008 at 7:04 PM, Steven D'Aprano <steve at pearwood.info> wrote: >> I believe on disk it uses UTF-16. > > Which is made up of bytes. There may be byte sequences that are illegal > UTF-16, but that's not what Martin said. I don't understand how there > can be UTF-16 sequences which don't correspond to some sequence of > bytes. How would they be represented in memory? Is this to do with the > endianness of the UTF-16 sequence? It has to do with the internal mapping between the ANSI and Unicode functions. On NT systems, CreateFileA will map the ANSI bytestring to a Unicode filename via the active code page, and call CreateFileW accordingly. The active code page cannot be set to something as useful as UTF-8, so given any actual code page (1252, 932, etc.) there are Unicode strings that cannot be represented with a bytestring provided to the ANSI function. -- Michael Urman
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4