[Mark Hammond]
> ...
> Where should the "real" documentation go?  It seems maybe we need a
> new sub-heading under the "6.1 - os -- Misc. OS Interface" - something
> like:
>
> 6.1.x - Unicode and the file system
>   - general discussion.
>   - Windows specific
>   - Mac specific should that appear.
>   - OS' with no special support (ie, "the rest")
>
> Does that make sense?

So far as it goes, yes.  I think the manual desperately needs a Unicode section for other reasons, though:  from traffic on c.l.py, it's clear that few people can figure out how to do *anything* with Unicode now unless their first name begins with "M" (Mark, Martin, Marc -- definitely not Skip <wink>).  There's no overview and there are no examples.  The primary string method doesn't even mention Unicode (here paraphrasing questions that pop up):

    encode([encoding[,errors]])
    Return an encoded version of the string.

What does "encoded version" mean?  Is that another string?  An encoding object of some sort?  Etc.

    Default encoding is the current default string encoding.

What's the "current default string encoding"?  How can I find out?  Can't even guess what *type* it has (string?  magic object?  little integer?).  If I don't want the default encoding, how do I specify a different one?  What are the possible values?  Again, can't even guess the type of the object that needs to be passed for encoding.

    errors may be given to set a different error handling scheme.
    The default for errors is 'strict', meaning that encoding errors
    raise a ValueError.  Other possible values are 'ignore' and
    'replace'.

So what do 'ignore' and 'replace' mean?  There's more left unsaid here than a single example could clarify, but there's not even an example -- so people stare at this wholly uncomprehending.
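For what it's worth, here is a minimal sketch of the kind of example the encode() entry lacks. It is not taken from the manual. It assumes the behavior of a current Python interpreter, where the encoding is named by a plain string, the result is a byte string, and the default encoding can be read with sys.getdefaultencoding(). The sample text and the variable names are made up for illustration.

```python
import sys

# Made-up sample: "cafe" with an accented final letter (one non-ASCII character).
text = u"caf\u00e9"

# The default encoding is reported as a plain string naming a codec.
default_name = sys.getdefaultencoding()

# encode() takes the codec name as a string and returns the encoded bytes.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

# 'strict' (the default) raises when the codec cannot represent a character.
try:
    text.encode("ascii")
    strict_failed = False
except ValueError:  # UnicodeError is a subclass of ValueError
    strict_failed = True

# 'ignore' silently drops the offending character;
# 'replace' substitutes a '?' for it.
ignored = text.encode("ascii", "ignore")
replaced = text.encode("ascii", "replace")

print(default_name, as_utf8, as_latin1, strict_failed, ignored, replaced)
```

The same text yields different byte sequences under different codecs, and the errors argument only matters when the chosen codec has no representation for some character.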
If they stumble into the unicode() builtin function (in a different part of the manual, neither referencing nor referenced by the .encode() method), it's no better:

    unicode(string[, encoding[, errors]])
    Decodes string using the codec for encoding.

What?  Hard to even guess what the function returns.  Maybe, from the name, a Unicode string?

    Error handling is done according to errors.

What?

    The default behavior is to decode UTF-8 in strict mode, meaning
    that encoding errors raise ValueError.

How do encoding errors arise from a function that *de*codes?

    See also the codecs module.

Which helps, but the relationship between the codecs module and the unicode() function isn't spelled out there either.

Look up "encoding" in the index, and you get pointers to base64, quoted-printable and the mimetypes module, which only confuses things more.

I don't expect you to fix this <wink>, I'm trying to get across that the Unicode docs need work even without new gimmicks.  If Fred agrees, I'm sure he'll think of a good place to put the new info too.

> I have made this change to Misc/NEWS.  Does this look OK
> (obviously once I know what to replace "[????]" with :)

Absolutely, and I don't even have to read it to say so <wink>:  once *something* is checked in, we're assured it won't get dropped on the floor come release time, and anyone who has any quibbles with it can check in changes.  It's not like checking in a NEWS item can break the std test suite or cause HP-UX to crash.

well-not-really-sure-about-the-latter-ly y'rs  - tim