On 13/01/14 09:19, Glenn Linderman wrote:
> On 1/13/2014 12:46 AM, Mark Shannon wrote:
>> On 13/01/14 03:47, Guido van Rossum wrote:
>>> On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>>> On 01/12/2014 06:16 PM, Ethan Furman wrote:
>>>>>
>>>>> If you do :
>>>>>
>>>>> --> b'%s' % 'some text'
>>>>
>>>> Ignore what I previously said. With no encoding the result would be:
>>>>
>>>> b"'some text'"
>>>>
>>>> So an encoding should definitely be specified.
>>>
>>> Yes, but the encoding is no business of %s or %. As far as the
>>> formatting operation cares, if the argument is bytes they will be
>>> copied literally, and if the argument is a str (or anything else) it
>>> will call ascii() on it.
>>
>> It seems to me that what people want from '%s' is:
>> Convert to a str then encode as ascii for non-bytes
>> or copy directly for bytes.
>
> Maybe. But it only takes a small tweak to the parameter to get what
> they want... a tweak that works in both Python 2.7 and Python
> 3.whatever-version-gets-this.
>
> Instead of
>
> b"%s" % foo
>
> they must use
>
> b"%s" % foo.encode( explicitEncoding )
>
> which is what they should have been doing in Python 2.7 all along, and
> if they were, they need make no change.
>
> Oh, foo was a Python 2.7 str? Converted to Python 3.x str, by default
> conversion rules? Already in ASCII? No harm.
> Oh, foo was a literal? Add b prefix, instead of the .encode("ASCII"),
> if you prefer.
>
>> So why not replace '%s' with '%a' for the ascii case and
>> with '%b' for directly inserting bytes.
>
> Because %a and %b don't exist in Python 2.7?

I thought this was about 3.5, not 2.7 ;)
'%s' can't work in 3.5, as we must differentiate between strings which
need to be encoded and bytes which don't.

>
>> That way, the encoding is explicit.
>
> The encoding is already explicit. If it is bytes encoded from str, that
> transformation had an explicit encoding. If it is "%s" % str(...), then
> there is no encoding, but rather a transformation into an ASCII
> representation of the Unicode code points, using escape sequences.
> Which isn't likely to be what they want, but see the parameter tweak
> above.
>
>> I think it is vital that the encoding is explicit in all cases where
>> bytes <-> str conversion occurs.
>
> Since it is explicit, you have no concerns in this area.
>
>
> Regarding the concern about implicit use of ASCII by certain bytes
> methods and proposed interpolations, I'm curious how many standard
> encodings exist that do not have an ASCII subset. I can enumerate a
> starting list, but if there are others in actual use, I'm unaware of
> them.
>
> EBCDIC
> UTF-16 BE & LE
> UTF-32 BE & LE
>
> Wikipedia: The vast majority of code pages in current use are supersets
> of ASCII <http://en.wikipedia.org/wiki/ASCII>, a 7-bit code
> representing 128 control codes and printable characters.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/mark%40hotpy.org
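
For reference, the cases argued over in this thread can be sketched roughly as follows. This is a minimal sketch, not anyone's proposal verbatim: it assumes Python 3.5 or later, where %-formatting on bytes ended up being available (PEP 461). The helper names interpolate and is_ascii_superset are hypothetical, and cp037 is used only as one representative EBCDIC code page.

```python
import string

# Assumes Python 3.5+ (bytes %-formatting, PEP 461).


def interpolate(value, encoding="ascii"):
    """Hypothetical helper: make the str -> bytes encoding explicit."""
    if isinstance(value, str):
        # The "parameter tweak" from the thread: encode before formatting.
        value = value.encode(encoding)
    return b"%b" % value


def is_ascii_superset(encoding):
    """Hypothetical check: does printable ASCII encode to the same bytes?"""
    sample = string.ascii_letters + string.digits + string.punctuation + " "
    try:
        return sample.encode(encoding) == sample.encode("ascii")
    except UnicodeEncodeError:
        return False


text = "some text"

# Bytes are copied literally; str must be encoded explicitly first.
explicit = b"%s" % text.encode("ascii")
literal = b"%s" % b"some text"

# %a inserts the ascii() representation, quotes and escapes included.
repr_like = b"%a" % text
escaped = b"%a" % "caf\u00e9"

# Passing a str straight to %s is rejected rather than guessed at.
try:
    b"%s" % text
    rejected = False
except TypeError:
    rejected = True

# Encodings from the list in the thread, plus two ASCII supersets.
supersets = {name: is_ascii_superset(name)
             for name in ("utf-8", "latin-1", "cp037", "utf-16-le", "utf-32-be")}
```

The superset check only compares printable ASCII in the encoding direction, so it is a quick probe rather than a proof that an encoding is ASCII-compatible.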