Greg Stein wrote:
>
> On Mon, 15 Nov 1999, M.-A. Lemburg wrote:
> > Guido van Rossum wrote:
> >...
> > > t# refers to byte-encoded data. Multibyte encodings are explicitly
> > > designed to be passed cleanly through processing steps that handle
> > > single-byte character data, as long as they are 8-bit clean and don't
> > > do too much processing.
> >
> > Ah, ok. I interpreted 8-bit to mean: 8 bits in length, not
> > "8-bit clean" as you obviously did.
>
> Hrm. That might be dangerous. Many of the functions that use "t#" assume
> that each character is 8-bits long. i.e. the returned length == the number
> of characters.
>
> I'm not sure what the implications would be if you interpret the semantics
> of "t#" as multi-byte characters.

FYI, the next version of the proposal now says "s#" gives you UTF-16 and
"t#" returns UTF-8. File objects opened in text mode will use "t#" and
binary ones use "s#".

I'll just use explicit u.encode('utf-8') calls if I want to write UTF-8
to binary files -- perhaps everyone else should too ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 45 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
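A small illustration of the two points in this exchange: once text is encoded as UTF-8, the byte length no longer equals the character count (Greg's concern about "t#"), and explicitly encoding before writing to a binary-mode file sidesteps the question (Marc-Andre's suggestion). This is a sketch in modern Python 3, where `str` is already Unicode; it does not model the 1999 proposal or the C-level "s#"/"t#" parser codes themselves, and the sample string and temporary file are arbitrary choices for the example.

```python
import os
import tempfile

# "Grüße": 5 characters, two of which are non-ASCII.
text = "Gr\u00fc\u00dfe"

# Explicit UTF-8 encoding, in the spirit of u.encode('utf-8').
encoded = text.encode("utf-8")

# In UTF-8 each of the two non-ASCII characters takes 2 bytes,
# so the byte length (7) differs from the character count (5).
char_count = len(text)
byte_count = len(encoded)

# Write the already-encoded bytes to a file opened in binary mode.
with tempfile.NamedTemporaryFile(mode="wb", delete=False) as f:
    f.write(encoded)
    path = f.name

# Read the bytes back and decode them to confirm the round trip.
with open(path, "rb") as f:
    round_trip = f.read().decode("utf-8")
os.remove(path)

print(char_count, byte_count, round_trip == text)  # prints: 5 7 True
```

Any code that assumes "one byte per character" on the encoded data would report 7 characters here instead of 5.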