Jack Jansen wrote:
> I posted a question on Unicode support in getargs.c last month (working
> on a different project), but now that I'm trying to support
> unicode-based APIs more seriously I find that it leaves even more to be
> desired. I'd like to help to fix this, but I need some direction on
> how things should be fixed.
>
> Here are some of the issues I ran into today:
> - Unicode objects have a companion string object, meaning that you can
> pass a unicode object to an "s" format and have the right thing happen.
> String objects have no such accompanying unicode object, and I think they
> should have. Right now you cannot pass a string object when the C
> routine expects a unicode object.

You can: parse the object and then pass it to PyUnicode_FromObject().

> - There is no unicode equivalent of "c", the single character.
> - "u#" does something useful, but something completely different from
> what "s#" does. More to the point, it probably does something
> dangerous, if I understand correctly. If I write a C routine with a
> "u#" format and the Python code passes a string object, the string object
> will be used as a buffer object and its binary contents will be interpreted
> as unicode. If the argument in question is a filename this will produce
> very surprising results :-)

True; "u#" does exactly the same as "s#": it interprets the input as a binary buffer.

> I'd like unicode objects to get a little more first-class citizenship,
> especially in the light of operating systems that are primarily (or
> exclusively) unicode based, such as Mac OS X or Windows CE, to sum things up.

You would be far better off using the Unicode API on the objects which are passed into the function rather than relying on the getargs parser to try to apply some magic to the input objects.

It might be worthwhile extending the parser markers a bit more, e.g. introducing "us#" to return Unicode objects much like "es#" returns strings...
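A rough Python-level model of the two behaviours discussed above may make the difference concrete. This is only a sketch under stated assumptions: `as_text()` and `reinterpret_buffer()` are hypothetical helpers, not part of the C API or of getargs.c. `as_text()` mirrors the suggested approach of accepting a generic object and converting it explicitly, decoding byte strings with a named codec (ASCII is assumed here). `reinterpret_buffer()` mimics what "u#" does with an 8-bit string, assuming a 2-byte, little-endian Py_UNICODE.

```python
def as_text(obj, encoding="ascii"):
    """Return obj as str, decoding byte strings with an explicit codec.

    Hypothetical helper modelling "accept a generic object, then convert
    it with the Unicode API" rather than relying on a format code.
    """
    if isinstance(obj, str):
        return obj
    if isinstance(obj, (bytes, bytearray)):
        # A named codec makes the conversion explicit; invalid input
        # raises UnicodeDecodeError instead of producing garbage.
        return bytes(obj).decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(obj).__name__)


def reinterpret_buffer(data):
    """Hypothetical model of the "u#" pitfall for an 8-bit string.

    Assumes a 2-byte, little-endian Py_UNICODE: the raw bytes are read as
    code units and the length is counted in whole units, so an odd
    trailing byte is silently dropped.
    """
    usable = len(data) - len(data) % 2
    # surrogatepass keeps lone surrogate code units instead of raising.
    return bytes(data[:usable]).decode("utf-16-le", errors="surrogatepass")


if __name__ == "__main__":
    print(repr(as_text(b"foo.txt")))  # 'foo.txt'
    print(repr(reinterpret_buffer(b"foo.txt")))
```

Under these assumptions the filename b"foo.txt" comes back from `reinterpret_buffer()` as three unrelated characters (the odd trailing byte is lost), whereas `as_text()` returns "foo.txt".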
I think we'd need some examples of use, though, before deciding what's the right way to do this ("es#" was implemented after a request by Mark Hammond to be able to handle Unicode file names for Win CE).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/