Showing content from https://mail.python.org/pipermail/python-dev/2002-August/027766.html below:

[Python-Dev] PEP 277 (unicode filenames): please review

Jack Jansen Jack.Jansen@oratrix.com
Tue, 13 Aug 2002 23:14:30 +0200


On Tuesday, August 13, 2002, at 03:01, Guido van Rossum wrote:
>
> Looks like it isn't you: the filename somehow contains a character
> that's not in the Latin-1 subset of Unicode, and no encoding can fix
> that for you.  I don't know why -- you'll have to figure out why your
> keyboard generates that character when you type o-umlaut.

No, it's the way the filesystem stores filenames, apparently.
Or, at least, it's the way the filesystem APIs expose those
filenames. Here's a session again (this time I'm using the
terminal in utf-8 mode):

 >>> x = "fr\xc3\xb6r"
 >>> os.listdir(".")
['.DS_Store']
 >>> open(x, "w")
<open file 'frör', mode 'w' at 0x130838>
 >>> os.listdir(".")
['.DS_Store', 'fro\xcc\x88r']
 >>> os.path.exists('fro\xcc\x88r')
True
 >>> os.path.exists("fr\xc3\xb6r")
True

If I create a file with an o-umlaut it gets decomposed into an o
and a combining umlaut.
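
The decomposition can be reproduced without touching the filesystem. This is a minimal sketch in present-day Python 3 (not the interpreter used in the session above); the variable names are illustrative, and `unicodedata.normalize` only stands in for whatever the filesystem layer appears to be doing:

```python
import unicodedata

composed = "fr\u00f6r"  # o-umlaut as a single code point, U+00F6
decomposed = unicodedata.normalize("NFD", composed)

# NFD splits the o-umlaut into "o" plus U+0308 COMBINING DIAERESIS.
print([hex(ord(c)) for c in decomposed])

# UTF-8 bytes of each spelling, for comparison with the session above.
composed_utf8 = composed.encode("utf-8")      # the name passed to open()
decomposed_utf8 = decomposed.encode("utf-8")  # the name os.listdir() returned
print(composed_utf8, decomposed_utf8)
```

The two byte strings differ even though both render as the same name, which is why the listing shows something other than what was typed.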

[Jack goes off and wrestles his way through a gazillion websites
with Unicode information]

If I understand the unicode standard (according to unicode.org)
correctly this means that MacOS stores filenames in NFD
normalized form, with all combining characters split out, and
this is the preferred normalized form. Am I correct here?

But, even if NFC is the preferred normalized form (the documents
I saw hinted that this may have been the case in previous
Unicode standards:-): both NFC and NFD renditions of this string
are legal unicode, aren't they? And if they are then both should
be converted to the same latin-1 string, shouldn't they?

Do I misunderstand something, or is this a bug (limitation?)
in the unicode->latin-1 decoder?
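
The behaviour being asked about can be shown in a few lines. This is a sketch in present-day Python 3, so it says nothing definite about the 2002 codec; the variable names are made up for illustration. It assumes only that the Latin-1 codec maps code points one at a time and does no normalization itself, so the caller has to compose the string first:

```python
import unicodedata

nfc = "fr\u00f6r"   # precomposed o-umlaut
nfd = "fro\u0308r"  # "o" followed by U+0308 COMBINING DIAERESIS

latin1_from_nfc = nfc.encode("latin-1")  # works: U+00F6 exists in Latin-1

try:
    nfd.encode("latin-1")
    nfd_encodes_directly = True
except UnicodeEncodeError:
    # U+0308 has no Latin-1 equivalent, so the codec rejects the string.
    nfd_encodes_directly = False

# Composing to NFC first makes both spellings encode identically.
latin1_from_nfd = unicodedata.normalize("NFC", nfd).encode("latin-1")
print(latin1_from_nfc == latin1_from_nfd)
```

Under that assumption, both renditions are legal Unicode, but only the composed one survives a direct conversion to Latin-1.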
--
- Jack Jansen        <Jack.Jansen@oratrix.com>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
