On Tue, 02 May 2000 08:31:55 -0400, Guido van Rossum <guido@python.org>
wrote:

>> No automatic conversions between 8-bit "strings" and Unicode strings.
>>
>> If you want to turn UTF-8 into a Unicode string, say so.
>> If you want to turn Latin-1 into a Unicode string, say so.
>> If you want to turn ISO-2022-JP into a Unicode string, say so.
>> Adding a Unicode string and an 8-bit "string" gives an exception.
>
>I'd accept this, with one change: mixing Unicode and 8-bit strings is
>okay when the 8-bit strings contain only ASCII (byte values 0 through
>127). That does the right thing when the program is combining
>ASCII data (e.g. literals or data files) with Unicode and warns you
>when you are using characters for which the encoding matters. I
>believe that this is important because much existing code dealing with
>strings can in fact deal with Unicode just fine under these
>assumptions. (E.g. I needed only 4 changes to htmllib/sgmllib to make
>it deal with Unicode strings -- those changes were all getattr() and
>setattr() calls.)
>
>When *comparing* 8-bit and Unicode strings, the presence of non-ASCII
>bytes in either should make the comparison fail; when ordering is
>important, we can make an arbitrary choice e.g. "\377" < u"\200".

I assume 'fail' means 'non-equal', rather than 'raises an exception'?

Toby Dickenson
tdickenson@geminidataloggers.com
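For illustration only, here is a minimal sketch of the rule discussed in this thread, written in modern Python 3. It assumes `bytes` stands in for the 2000-era 8-bit string and `str` for the Unicode string. The helper names `concat` and `mixed_equal` are hypothetical, and `mixed_equal` adopts the "non-equal" reading of "fail" rather than raising an exception.

```python
# Hypothetical model of the proposed mixing rule (not the actual Python 2.0
# implementation): bytes plays the 8-bit string, str plays the Unicode string.

def concat(a, b):
    """Concatenate two strings; an 8-bit operand mixed with a Unicode one
    is accepted only if it is pure ASCII (byte values 0 through 127)."""
    if isinstance(a, bytes) and isinstance(b, str):
        a = a.decode("ascii")  # raises UnicodeDecodeError for any byte >= 128
    elif isinstance(a, str) and isinstance(b, bytes):
        b = b.decode("ascii")  # same check when the 8-bit operand is on the right
    return a + b


def mixed_equal(eight_bit, uni):
    """Compare an 8-bit string with a Unicode string under the assumed
    'fail means non-equal' reading: non-ASCII bytes give False, not an error."""
    try:
        return eight_bit.decode("ascii") == uni
    except UnicodeDecodeError:
        return False
```

Under these assumptions, `concat(b"abc", "def")` gives `"abcdef"`, `concat(b"\xff", "a")` raises, and `mixed_equal(b"\xff", "\xff")` is `False`. For comparison, Python 3 itself never coerces: `b"abc" + "def"` raises `TypeError`, and `b"abc" == "abc"` is simply `False`.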