> a. Does this really make sense for UTF-16?  It looks to me like a
> great way to induce bugs of the form "write a unicode literal
> containing 0x0A, then translate it to raw form by stripping the u
> prefix."

Of course not.  I don't expect anyone to put UTF-16 in their source
encoding cookie.  But should we bother making a list of encodings that
shouldn't be used?

> b. No editor is likely to implement correct display to distinguish
> between u"" and just "".

That's fine.  Given phase 2, the editor should display the entire file
using the encoding given in the cookie, despite that phase 1 only
applies the encoding to u"" literals.  The rest of the file is
supposed to be ASCII, and if it isn't, that's the user's problem.

> c. This definitely breaks Emacs coding cookie semantics.  Emacs
> applies the coding cookie to the whole buffer.  I don't see a way to
> lose offhand, but this is sufficiently subtle that I don't want to
> break my head trying to prove that you can't lose, either.

I wouldn't worry about that, see above.

> d. You probably have to deprecate ISO 2022 7-bit coding systems, too,
> because people will try to get the representation of a string by
> inputting a raw string in coded form.  This might contain a quote
> character.

Good point.  This sounds like a documentation issue at worst.

> e. This causes problems for UTF-8 transition, since people will want
> to put arbitrary byte strings in a raw string.

I'm not sure I understand.  What do you call a raw string?  Do you
mean an r"" literal?  Why would people want to use that for arbitrary
binary data?  Arbitrary binary data should *always* be encoded using
\xDD hex or \OOO octal escapes.

> But these will not be
> legal UTF-8 files, even though they have a UTF-8 coding cookie.
> People who are trying to do the right thing will have the rules
> changed again later, most likely.

If you're trying to do the right thing you shouldn't be putting
arbitrary binary data in any string literal.

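To illustrate the escape rule in the reply to point e., here is a
minimal sketch.  It assumes present-day Python 3 and its b"" bytes
literal, which postdates this thread; the names payload and
source_text are hypothetical and chosen only for the example.

```python
# Binary data written with \xDD hex and \OOO octal escapes only, so the
# source file stays pure ASCII whatever coding cookie it declares.
payload = b"\x00\xff\x0a\101"   # \101 is octal for 0x41, the letter "A"

# The value holds arbitrary bytes, including one above 0x7F ...
assert payload == bytes([0x00, 0xFF, 0x0A, 0x41])

# ... while the text of the literal itself is plain ASCII.
source_text = r'b"\x00\xff\x0a\101"'
source_text.encode("ascii")     # would raise UnicodeEncodeError otherwise
```

Because the literal contains only ASCII characters, re-encoding the
source file cannot change the bytes the program sees.
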
> This means that until editors reliably implement b. and similar
> features, developers must change coding systems to type raw strings
> and Unicode strings.

Sounds like a YAGNI to me.

--Guido van Rossum (home page: http://www.python.org/~guido/)