> First, a short one, Mark Hammond's patch for supporting MBCS on
> Windows.  I trust everyone can handle a little bit of TeX markup?
>
> % XXX is this explanation correct?
> \item When presented with a Unicode filename on Windows, Python will
> now correctly convert it to a string using the MBCS encoding.
> Filenames on Windows are a case where Python's choice of ASCII as
> the default encoding turns out to be an annoyance.
>
> This patch also adds \samp{et} as a format sequence to
> \cfunction{PyArg_ParseTuple}; \samp{et} takes both a parameter and
> an encoding name, and converts it to the given encoding if the
> parameter turns out to be a Unicode string, or leaves it alone if
> it's an 8-bit string, assuming it to already be in the desired
> encoding.  (This differs from the \samp{es} format character, which
> assumes that 8-bit strings are in Python's default ASCII encoding
> and converts them to the specified new encoding.)
>
> (Contributed by Mark Hammond with assistance from Marc-Andr\'e
> Lemburg.)

I learned something here, so I hope this is correct. :-)

> Second, the --enable-unicode changes:
>
> %======================================================================
> \section{Unicode Changes}
>
> Python's Unicode support has been enhanced a bit in 2.2.  Unicode
> strings are usually stored as UCS-2, as 16-bit unsigned integers.
> Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
> integers, as its internal encoding by supplying
> \longprogramopt{enable-unicode=ucs4} to the configure script.  When
> built to use UCS-4, in theory Python could handle Unicode characters
> from U-00000000 to U-7FFFFFFF.

I think the Unicode folks use U+, not U-, and the largest Unicode
character is "only" U+10FFFF.  (Never mind that the data type can
handle larger values.)

> Being able to use UCS-4 internally is
> a necessary step to do that, but it's not the only step, and in Python
> 2.2alpha1 the work isn't complete yet.  For example, the
> \function{unichr()} function still only accepts values from 0 to
> 65535,

Untrue: it supports range(0x110000) (in UCS-2 mode this returns a
surrogate pair).  Now, maybe that's not what it *should* do...

> and there's no \code{\e U} notation for embedding characters
> greater than 65535 in a Unicode string literal.

Not true either -- correct \U notation has been part of Python since
2.0.  It does the same thing as unichr() described above.

> All this is the
> province of the still-unimplemented PEP 261, ``Support for `wide'
> Unicode characters''; consult it for further details, and please offer
> comments and suggestions on the proposal it describes.
>
> % ... section on decode() deleted; on firmer ground there...
>
> \method{encode()} and \method{decode()} were implemented by
> Marc-Andr\'e Lemburg.  The changes to support using UCS-4 internally
> were implemented by Fredrik Lundh and Martin von L\"owis.
>
> \begin{seealso}
>
> \seepep{261}{Support for `wide' Unicode characters}{PEP written by
> Paul Prescod.  Not yet accepted or fully implemented.}
>
> \end{seealso}
>
> Corrections?  Thanks in advance...

If I were you, I would make sure that Marc-Andre and Martin agree with
me before adopting my comments above...

And thank *you* for doing this very useful write-up again!  (I'm doing
my part by writing up the types/class unification thing -- now mostly
complete at http://www.python.org/2.2/descrintro.html.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
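
A small interactive sketch of what the MBCS conversion amounts to at
the Python level; this assumes a Windows box whose ANSI code page is
cp1252 (the exact bytes depend on the system code page), and it only
shows the codec the patch applies to filenames, not the \samp{et}
plumbing itself:

    >>> fn = u"caf\u00e9.txt"
    >>> fn.encode("mbcs")   # Windows-only codec; what the patch uses for filenames
    'caf\xe9.txt'
    >>> fn.encode()         # the default ASCII encoding is no help here
    Traceback (most recent call last):
      ...
    UnicodeError: ASCII encoding error: ordinal not in range(128)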
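
And to make the unichr()/\U point concrete, a short transcript
assuming a UCS-2 (narrow) build of 2.2a1 behaving as described above;
on a UCS-4 build the length would be 1 and the repr a single \U escape:

    >>> s = unichr(0x10000)   # accepted -- not limited to 0..65535
    >>> len(s)                # a surrogate pair on a UCS-2 build
    2
    >>> s
    u'\ud800\udc00'
    >>> u'\U00010000'         # \U escape, around since 2.0; same result
    u'\ud800\udc00'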