On Tuesday, August 13, 2002, at 03:01, Guido van Rossum wrote:

> Looks like it isn't you: the filename somehow contains a character
> that's not in the Latin-1 subset of Unicode, and no encoding can fix
> that for you. I don't know why -- you'll have to figure out why your
> keyboard generates that character when you type o-umlaut.

No, it's the way the filesystem stores filenames, apparently. Or, at least, it's the way the filesystem APIs expose those filenames. Here's a session again (this time I'm using the terminal in UTF-8 mode):

>>> x = "fr\xc3\xb6r"
>>> os.listdir(".")
['.DS_Store']
>>> open(x, "w")
<open file 'frör', mode 'w' at 0x130838>
>>> os.listdir(".")
['.DS_Store', 'fro\xcc\x88r']
>>> os.path.exists('fro\xcc\x88r')
True
>>> os.path.exists("fr\xc3\xb6r")
True

If I create a file with an o-umlaut, it gets decomposed into an o and a combining umlaut.

[Jack goes off and wrestles his way through a gazillion websites with Unicode information]

If I understand the Unicode standard (according to unicode.org) correctly, this means that MacOS stores filenames in NFD normalized form, with all combining characters split out, and that this is the preferred normalized form. Am I correct here?

But even if NFC is the preferred normalized form (the documents I saw hinted that this may have been the case in previous Unicode standards :-), both the NFC and NFD renditions of this string are legal Unicode, aren't they? And if they are, then both should be converted to the same Latin-1 string, shouldn't they?

Do I misunderstand something, or is this a bug (limitation?) in the Unicode->Latin-1 encoder?

--
- Jack Jansen <Jack.Jansen@oratrix.com>   http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
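The two byte strings in the session above can be compared directly. What follows is a minimal sketch in modern Python 3, not the Python 2.2 of this thread (`unicodedata.normalize` only appeared in Python 2.3). It shows that the name handed to `open()` and the name returned by `os.listdir()` are canonically equivalent (NFC vs. NFD), and that Python's latin-1 codec maps code points one-to-one without normalizing, so the lone combining diaeresis U+0308 cannot be encoded. Composing to NFC first is one way to get a single Latin-1 result; `to_latin1` is a hypothetical helper for illustration, not a standard API.

```python
import unicodedata

# Byte strings copied from the interpreter session above (UTF-8 encoded).
nfd_bytes = b"fro\xcc\x88r"  # returned by os.listdir(): o + U+0308 COMBINING DIAERESIS
nfc_bytes = b"fr\xc3\xb6r"   # passed to open(): precomposed U+00F6

nfd_name = nfd_bytes.decode("utf-8")
nfc_name = nfc_bytes.decode("utf-8")

# Canonically equivalent, but different code point sequences.
assert nfd_name != nfc_name
assert unicodedata.normalize("NFC", nfd_name) == nfc_name
assert unicodedata.normalize("NFD", nfc_name) == nfd_name


def to_latin1(name: str) -> bytes:
    """Hypothetical helper: compose to NFC, then encode to Latin-1.

    The latin-1 codec does not normalize, so without the NFC step a
    decomposed name containing U+0308 raises UnicodeEncodeError.
    """
    return unicodedata.normalize("NFC", name).encode("latin-1")


if __name__ == "__main__":
    try:
        nfd_name.encode("latin-1")
    except UnicodeEncodeError as exc:
        print("direct encode of the NFD form fails:", exc.reason)
    print(to_latin1(nfd_name))  # b'fr\xf6r'
    print(to_latin1(nfc_name))  # b'fr\xf6r'
```

Normalizing before encoding keeps the codec itself a plain code point mapping; whether the conversion layer should do this implicitly is exactly the question raised above.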