Mark Hammond wrote:
>
> > Right. The idea with open() was to write a special version (using
> > #ifdefs) for use on Windows platforms which does all the needed
> > magic to convert Unicode to whatever the native format and locale
> > is...
>
> That works for open() - but what about other extension modules?
>
> This seems to imply that any Python extension on Windows that wants to pass
> a Unicode string to an external function can not use PyArg_ParseTuple() with
> anything other than "O", and perform the magic themselves.
>
> This just seems a little back-to-front to me. Platforms that have _no_
> native Unicode support have useful utilities for working with Unicode.
> Platforms that _do_ have native Unicode support can not make use of these
> utilities. Is this by design, or simply a sad side-effect of the design?
>
> So - it is trivial to use Unicode on platforms that don't support it, but
> quite difficult on platforms that do.

The problem is that Windows seems to use a completely different internal Unicode format than most of the rest of the world.

As I've commented in a different post, the only way to have PyArg_ParseTuple() perform auto-conversion is to allow it to return objects which are then garbage collected by the caller. The problem with this is error handling, since PyArg_ParseTuple() would have to keep track of all objects it created until the call returns successfully. An alternative approach is sketched below.

Note that *all* platforms will have to use this approach... not only Windows or other platforms with Unicode support.

> > Using parser markers for this is obviously *not* the right way
> > to get to the core of the problem. Basically, you will have to
> > write a helper which takes a string, Unicode or some other
> > "t" compatible object as name object and then converts it to
> > the system's view of things.
>
> Why "obviously"? What on earth does the existing mechanism buy me on
> Windows, other than grief that I can not use it?
Sure, you can :-) Just fetch the object, coerce it to Unicode and then encode it according to your platform's needs (PyUnicode_FromObject() takes care of the coercion part for you).

> > I think we had a private discussion about this a few months ago:
> > there was some way to convert Unicode to a platform independent
> > format which then got converted to MBCS -- don't remember the details
> > though.
>
> There is a Win32 API function for this. However, as you succinctly pointed
> out, not many people are going to be aware of its name, or how to use the
> multitude of flags offered by these conversion functions, or know how to
> deal with the memory management, etc.
>
> > Can't you use the wchar_t interfaces for the task (see
> > the unicodeobject.h file for details)? Perhaps you can
> > first transfer Unicode to wchar_t and then on to MBCS
> > using a win32 API?!
>
> Sure - I can. But can everyone who writes interfaces to Unicode functions?
> You wrote the Python Unicode support but don't know its name - pity the poor
> Joe Average trying to write an extension.

Hey, Mark... I'm not a Windows geek. How can I know which APIs are available and which of them to use? And that's my point: add conversion APIs and codecs for the different OSes which make the extension writer's life easier.

> It seems to me that, on Windows, the Python Unicode support as it stands is
> really internal. I can not think of a single time that an extension writer
> on Windows would ever want to use the "t" markers - am I missing something?
> I don't believe that a single Unicode-aware function in the Windows
> extensions (of which there are _many_) could be changed to use the "t"
> markers.

"t" is intended to return a text representation of a buffer-interface-aware type... this happens to be UTF-8 for Unicode objects -- what other encoding would you have expected?
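The two-step route suggested in this exchange (Unicode to wchar_t, then wchar_t to the platform's multibyte encoding) can be sketched portably as below. This is a minimal illustration under stated assumptions: wide_to_multibyte() is a hypothetical helper, the wchar_t buffer is assumed to have already been filled (for example by PyUnicode_AsWideChar()), and standard C wcstombs() stands in for the Win32 conversion call (on Windows that would presumably be WideCharToMultiByte()). It also shows the memory-management part Mark mentions: ask for the size first, allocate, then convert.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert a NUL-terminated wchar_t string (for
 * example one filled in by PyUnicode_AsWideChar()) into a freshly
 * malloc()ed multibyte string in the encoding of the current C locale
 * (LC_CTYPE).  Standard wcstombs() stands in here for the Win32
 * conversion call.  Returns NULL if some character cannot be
 * represented or memory runs out; the caller must free() the result. */
static char *wide_to_multibyte(const wchar_t *wide)
{
    assert(wide != NULL);

    /* First pass: with a NULL destination, wcstombs() only reports how
     * many bytes the conversion needs, not counting the terminating NUL. */
    size_t needed = wcstombs(NULL, wide, 0);
    if (needed == (size_t)-1)
        return NULL;                /* unrepresentable character */

    char *out = malloc(needed + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: convert for real; the extra byte holds the NUL. */
    wcstombs(out, wide, needed + 1);
    return out;
}
```

Call setlocale(LC_ALL, "") beforehand if the user's locale encoding is wanted. In the default "C" locale, which characters convert is implementation-defined, and plain ASCII is the safe case.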
> It still seems to me that the Unicode support works well on platforms with
> no Unicode support, and is fairly useless on platforms with the support. I
> don't believe that any extension on Windows would want to use the "t"
> marker - so, as Fred suggested, how about providing something for us that
> can help us interface to the platform's Unicode?

That's exactly what I've been talking about all along... there currently are PyUnicode_AsWideChar() and PyUnicode_FromWideChar() to interface to the compiler's wchar_t type. I have no problem adding more of these APIs for the various OSes -- but they would have to be coded by someone with Unicode skills on each of those platforms, e.g. PyUnicode_AsMBCS() and PyUnicode_FromMBCS() on Windows.

> This is getting too hard for me - I will release my windows registry module
> without Unicode support, and hope that in the future someone cares enough to
> address it, and to add a large number of LOC that will be needed simply to
> get Unicode talking to Unicode...

I think you're getting this wrong: I'm not arguing against adding better support for Windows.

The only way I can think of using parser markers in this context would be to have PyArg_ParseTuple() *copy* data into a given data buffer rather than only passing a reference to it. This would enable PyArg_ParseTuple() to apply whatever conversion is needed while still keeping the temporary objects internal.

Hmm, sketching a little:

"es#",&encoding,&buffer,&buffer_len
    -- could mean: coerce the object to Unicode, then
       encode it using the given encoding and then
       copy at most buffer_len bytes of data into
       buffer and update buffer_len to the number
       of bytes copied

This costs some cycles for copying data, but gets rid of the problems involved in cleaning up after errors. The caller will have to ensure that the buffer is large enough and that the encoding fits the application's needs.
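To make the proposed "es#" semantics concrete, here is a minimal sketch of just the copy-out step, written as a standalone C function so it runs without the Python headers. copy_encoded() and its return convention are hypothetical. The coerce-to-Unicode and encode steps are assumed to have already produced the encoded bytes. The sketch shows why the design avoids cleanup problems: the temporary encoded data stays inside the parser, and the caller only deals with its own buffer, so nothing needs releasing on an error path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the copy-out step behind the proposed "es#"
 * marker.  'encoded'/'encoded_len' stand in for the temporary encoded
 * string the parser would create internally; the caller only ever sees
 * its own buffer, so nothing has to be freed by the caller on error.
 *
 * On entry *buffer_len is the capacity of 'buffer'; on return it is the
 * number of bytes actually copied.  Returns 0 if the whole encoded
 * string fit, -1 if it had to be truncated (how truncation should be
 * reported is an assumption; the proposal leaves it open). */
static int copy_encoded(const char *encoded, size_t encoded_len,
                        char *buffer, size_t *buffer_len)
{
    assert(buffer != NULL && buffer_len != NULL);

    /* Copy at most *buffer_len bytes. */
    size_t n = encoded_len < *buffer_len ? encoded_len : *buffer_len;
    memcpy(buffer, encoded, n);

    /* Report how many bytes were copied. */
    *buffer_len = n;
    return n == encoded_len ? 0 : -1;
}
```

For example, with a 16-byte buffer and the 3-byte input "abc", the call returns 0 and sets the length to 3. With a 2-byte buffer it copies "ab", sets the length to 2 and returns -1. Plain truncation can also cut a multibyte sequence in half, which is one more reason the caller has to size the buffer for the chosen encoding.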
Error handling will be poor since the caller can't take any action other than to pass on the error generated by PyArg_ParseTuple().

Thoughts?

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/