Paul Prescod wrote:
>
> "M.-A. Lemburg" wrote:
> >
> > ...
> >
> > IMHO, all auto-conversions should use the default encoding. The
> > main point here is not to confuse the user with even more magic
> > happening under the hood.
>
> I don't see anything confusing about having unicode-escape be the
> appropriate escape used for repr. Maybe we need to differentiate between
> lossless and lossy encodings. If the default encoding is lossless then
> repr could use it. Otherwise it could use unicode-escape.

Simply because auto-conversion should use one single encoding
throughout the code.

> Anyhow, why would it be wrong for Fredrick to hard-code an encoding in
> repr but right for me to hard-code one in minidom?

Because hardcoding the encoding into the core Python API touches
all programs. Hardcoded encodings should be userland options
wherever possible.

Besides, we're talking about __repr__, which is mainly a debug
tool and doesn't affect program flow or interfacing in any way.
The format used is a userland decision, and so is the encoding
used for it.

> Users should not need
> to comb through the hundreds of modules in the library figuring out what
> kind of Unicode handling they should expect. It should be as centralized
> as possible.

True.

> > If the programmer knows that he'll have to deal with Unicode
> > then he should make sure that the proper encoding is used
> > and document it that way, e.g. use unicode-escape for Minidom's
> > __repr__ methods.
>
> One of the major goals of our current Unicode auto-conversion
> "compromise" is that modules like xmllib and minidom should work with
> Unicode out of the box without any special enhancements. According to
> Guido, that's the primary reason we have Unicode auto-conversions at
> all.
>
> http://www.python.org/pipermail/i18n-sig/2000-May/000173.html
>
> I'm going to fight very hard to make basic Unicode support in Python
> modules "just work" without a bunch of internationalization knowledge
> from the programmer.

Great :-) The next big project ought to be getting the standard
lib to work with Unicode input. A good way to test-drive this
is to run Python with the -U option.

> __repr__ is pretty basic.
>
> > > the reason for this patch was to avoid forcing everyone to deal with
> > > this in their own code, by providing some kind of fallback behaviour.
> >
> > That's what your patch does; I don't see a reason to change it :-)
>
> If you're still proposing that I should deal with it in a particular
> module's domain-specific code then the patch isn't done yet!

You don't have to: a user who uses Latin-1 tag names will see
the output of __repr__ as Latin-1... pretty straightforward
if you ask me. If you want to make sure that __repr__ output
is printable everywhere, you should use an explicit lossless
encoding for your application. Again, this is a userland
decision which you'll have to make.

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
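
The lossless-versus-lossy distinction raised in this exchange can be illustrated with a small sketch. It is written for present-day Python 3, not the interpreter under discussion in the thread, and the `Tag` class and `round_trips` helper are hypothetical names made up for the illustration; they are not part of minidom or any patch mentioned above.

```python
class Tag:
    """Hypothetical stand-in for an element with a Unicode tag name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Explicit lossless encoding: unicode_escape maps every code point
        # to plain ASCII, so the result is printable on any terminal.
        escaped = self.name.encode("unicode_escape").decode("ascii")
        return "<Tag %s>" % escaped


def round_trips(text, encoding):
    """Return True if text survives an encode/decode cycle unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The codec cannot represent some character: lossy for this text.
        return False


name = "pr\u00e9fix\u20ac"  # a Latin-1 letter plus a euro sign outside Latin-1
print(repr(Tag(name)))
print(round_trips(name, "unicode_escape"))
print(round_trips(name, "latin-1"))
```

With this input, the escaped repr stays within ASCII, while Latin-1 cannot represent the euro sign at all. Whether a module should pick such an encoding itself or leave it to the application is exactly the point the thread argues about.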