On Apr 24, 2009, at 8:00 AM, Paul Moore wrote:

> However, it *does* agree with the reality of Windows file systems. The
> fundamental problem here is that there is a strong OS disparity - for
> Windows, the OS uses Unicode, for POSIX, the OS uses bytes.

It's unfortunately the case that this isn't *precisely* true. Windows uses arbitrary 16-bit sequences, just as Unix uses arbitrary 8-bit sequences. Neither one is required by the operating system to be properly encoded Unicode. The main difference is that there is already a widely accepted way to decode an improperly encoded 16-bit sequence with the utf-16 codec: simply leave the lone surrogates in place.

James
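
To illustrate the "leave the lone surrogates in place" approach, here is a minimal sketch. It assumes Python 3.4 or later, where the `surrogatepass` error handler works with the UTF-16 codecs. The byte string is a made-up example, not taken from any real file system.

```python
# Hypothetical ill-formed UTF-16-LE data: 'a', a lone high surrogate
# (U+D800) with no low surrogate after it, then 'b'.
raw = b"a\x00\x00\xd8b\x00"

# Strict decoding rejects the sequence because the surrogate is unpaired.
try:
    raw.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'surrogatepass' keeps the lone surrogate as a code point in the string,
# so the original 16-bit units can be recovered exactly on encoding.
text = raw.decode("utf-16-le", "surrogatepass")
roundtrip = text.encode("utf-16-le", "surrogatepass")
```

Here `text` carries the unpaired surrogate through unchanged, and encoding it with the same handler gives back the original bytes. That round trip is the property that matters for file names.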