I don't know that the Unicode docs need massive work, but the docs that are there simply don't answer the technical questions people have: they're too thin. Let's keep it simple. Contrast the Library manual's:

    unicode(string[, encoding[, errors]])
        Decodes string using the codec for encoding. Error handling is done according to errors. The default behavior is to decode UTF-8 in strict mode, meaning that encoding errors raise ValueError. See also the codecs module.

with Andrew's description (from http://www.amk.ca/python/2.0/):

    unicode(string [, encoding] [, errors])
        Creates a Unicode string from an 8-bit string. encoding is a string naming the encoding to use. The errors parameter specifies the treatment of characters that are invalid for the current encoding; passing 'strict' as the value causes an exception to be raised on any encoding error, while 'ignore' causes errors to be silently ignored and 'replace' uses U+FFFD, the official replacement character, in case of any problems.

The latter addresses several *fundamental* questions untouched by the former, like: what are the datatypes of the arguments and the result, what values does errors accept, and what do they mean? The first blurb answers some more, like: what's the default encoding, and which exception is raised? Neither is complete on its own, but the reference manual should have a complete answer to all such questions. It doesn't have to go on at great length. A round-trip example would be invaluable.

If Fred wanted to incorporate a brief overview too, a light rework of Andrew/Moshe's writeup would be an excellent start.
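
For what it's worth, here is a rough sketch of the kind of round-trip example meant above. It is not taken from either writeup, and it makes one loud assumption: it is written for Python 3, where str(bytes, encoding, errors) stands in for 2.0's unicode(string, encoding, errors). In Python 3 the exception raised in strict mode is UnicodeDecodeError, which is a subclass of ValueError. The variable names are made up for illustration.

```python
# Assumption: Python 3, where str(bytes, encoding, errors) plays the role
# that unicode(string, encoding, errors) plays in the text above.

# Round trip: text -> UTF-8 bytes -> text.
original = "caf\u00e9"                      # 'cafe' with an acute accent on the e
encoded = original.encode("utf-8")          # 8-bit data: b'caf\xc3\xa9'
decoded = str(encoded, "utf-8", "strict")   # back to a text string
assert decoded == original

# The byte 0xFF can never appear in valid UTF-8, so this input is malformed.
bad = b"ab\xffcd"

# 'strict' raises an exception on the first decoding error.
try:
    str(bad, "utf-8", "strict")
    strict_result = "no error"
except ValueError as exc:                   # UnicodeDecodeError subclasses ValueError
    strict_result = type(exc).__name__

# 'ignore' silently drops the offending byte.
ignored = str(bad, "utf-8", "ignore")

# 'replace' substitutes U+FFFD, the official replacement character.
replaced = str(bad, "utf-8", "replace")

print(strict_result)
print(repr(ignored))
print(repr(replaced))
```

Something this short answers the datatype question (8-bit data in, text out), shows what each errors value does, and shows which exception to catch.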