Recently, "Martin v. Loewis" <martin@v.loewis.de> said:
> When the discussion of tagging binary strings in source code came up,
> I started to look into the standard library which string literals
> would have to be tagged as byte strings, and which are really
> character strings.
>
> I found that the overwhelming majority of string literals in the
> standard Python library really denotes byte strings, if you ignore doc
> strings. Sometimes, it isn't obvious that they are binary strings,
> hence the smiley.

[leaving only one example in:]
>         version = "HTTP/0.9"
>         status = "200"
>         reason = ""
>
> Protocol elements, thus byte string.

I think you're taking it too far now. I think we should assume that
ASCII survives. If Python runs on an EBCDIC machine (does it?) I
assume that at some point the conversion of EBCDIC<->ASCII is handled
semi-transparently. Also, as these things are readable they should be
treated as such. It should be possible to do

    >>> print u"Funny reply to my "+unicode(version)+u" message"

especially when the "funny reply" bit is in Japanese.

What I would agree with, I think, is if we tag these strings as
"ascii". And that is also what the BDFL pronounced at some point:
Python sourcecode is ASCII, and if you put 8 bit characters in there
you're living dangerously. Only when octal or hex escapes appear in a
sourcecode string can it be anything other than ascii.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack        | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm