I've written some text on Unicode for the 2.2 article, but it's doubtful I actually understand what's going on. Can people who actually understand where Unicode has been please take a look at the following?

First, a short one, Mark Hammond's patch for supporting MBCS on Windows. I trust everyone can handle a little bit of TeX markup?

% XXX is this explanation correct?
\item When presented with a Unicode filename on Windows, Python will
now correctly convert it to a string using the MBCS encoding.
Filenames on Windows are a case where Python's choice of ASCII as the
default encoding turns out to be an annoyance.

This patch also adds \samp{et} as a format sequence to
\cfunction{PyArg_ParseTuple}; \samp{et} takes both a parameter and an
encoding name, and converts the parameter to the given encoding if it
turns out to be a Unicode string, or leaves it alone if it's an 8-bit
string, assuming it to already be in the desired encoding. (This
differs from the \samp{es} format character, which assumes that 8-bit
strings are in Python's default ASCII encoding and converts them to
the specified new encoding.)

(Contributed by Mark Hammond with assistance from Marc-Andr\'e
Lemburg.)

Second, the --enable-unicode changes:

%======================================================================
\section{Unicode Changes}

Python's Unicode support has been enhanced a bit in 2.2. Unicode
strings are usually stored as UCS-2, using 16-bit unsigned integers.
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers,
as its internal encoding by supplying
\longprogramopt{enable-unicode=ucs4} to the configure script.

In theory, a Python built to use UCS-4 could handle Unicode characters
from U-00000000 to U-7FFFFFFF. Being able to use UCS-4 internally is a
necessary step toward that, but it's not the only step, and in Python
2.2alpha1 the work isn't complete yet.
For example, the \function{unichr()} function still only accepts
values from 0 to 65535, and there's no \code{\e U} notation for
embedding characters greater than 65535 in a Unicode string literal.
All this is the province of the still-unimplemented PEP 261, ``Support
for `wide' Unicode characters''; consult it for further details, and
please offer comments and suggestions on the proposal it describes.

% ... section on decode() deleted; on firmer ground there...

\method{encode()} and \method{decode()} were implemented by
Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
were implemented by Fredrik Lundh and Martin von L\"owis.

\begin{seealso}

\seepep{261}{Support for `wide' Unicode characters}{PEP written by
Paul Prescod. Not yet accepted or fully implemented.}

\end{seealso}

Corrections? Thanks in advance...

--amk
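
A footnote to the UCS-2 versus UCS-4 discussion above, in case a concrete illustration helps reviewers: the sketch below shows why 65535 is the boundary for a 16-bit representation. It is only an illustration under stated assumptions. It is written for a current Python 3 interpreter (not 2.2alpha1), it uses the UTF-16 codec as a stand-in for 16-bit storage, and \code{utf16_units} is a hypothetical helper name, not anything from the patch.

```python
def utf16_units(code_point):
    """Return the 16-bit code units UTF-16 needs for one code point.

    Hypothetical helper for illustration only; assumes Python 3, where
    chr() accepts the full Unicode range.
    """
    data = chr(code_point).encode("utf-16-be")
    # Split the encoded bytes into big-endian 16-bit integers.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# 65535 is the largest value that fits in a single 16-bit unit.
print(utf16_units(0xFFFF))
# 65536 no longer fits; UTF-16 spends two units (a surrogate pair) on it.
print(utf16_units(0x10000))
```

A 32-bit representation stores either value in a single unit, which is the motivation for the UCS-4 build option.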