[Stephen J. Turnbull]
> [...] I feel that it is possible to support the users who want
> to use national encodings AND define the language in terms of a single
> coded character set, as long as that set is Unicode. The usual
> considerations of file system safety and standard C library
> compatibility dictate that the transformation format be UTF-8. (Below
> I will just write "UTF-8" as is commonly done.)
>
> My belief is that the proposal below has the same effect on most users
> most of the time as PEP 263, while not committing Python to indefinite
> support of a subsystem that will certainly be obsolete for new code in
> 5 years, and most likely within 2 (at least for people using open
> source and major vendor tools, I don't know what legacy editors people
> may be using on "big iron" and whatnot).

If your concern is that PEP 263 will bind us to indefinite support of
the encoding cookie feature, I propose to add a "sunset provision" to
the PEP, just as is commonly done to U.S. laws so that they expire
after a certain date.

I think it's a good idea to consider your hook proposal as an
implementation strategy for the PEP, but I believe it would be wise if
this were adopted as a standard feature rather than something users
need to configure explicitly.

You bring up one important point that AFAIK isn't addressed by the
PEP: when text is presented to the parser in the form of an 8-bit
string object, should an encoding cookie be honored if present? I'd
say yes. When a Unicode string is presented, encoding cookies should
be ignored, of course.

--Guido van Rossum (home page: http://www.python.org/~guido/)
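
For illustration only, here is a minimal sketch of the 8-bit versus Unicode distinction raised in the last paragraph of the message. It assumes a current Python 3 interpreter, which postdates this message, where compile() accepts both bytes and str. The names BYTE_SOURCE, TEXT_SOURCE and run are hypothetical and exist only for this demo; this is not the implementation discussed in the thread.

```python
# Hypothetical demo sources; the names here are illustrative only.

# An 8-bit source: the byte 0xE9 is only meaningful once the cookie
# says how to decode it (Latin-1 maps 0xE9 to U+00E9).
BYTE_SOURCE = b"# -*- coding: latin-1 -*-\nvalue = '\xe9'\n"

# An already-decoded Unicode source: it carries the same cookie, but
# contains U+20AC (euro sign), which Latin-1 cannot represent.
TEXT_SOURCE = "# -*- coding: latin-1 -*-\nvalue = '\u20ac'\n"


def run(source):
    """Compile and execute source, returning the value it binds."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["value"]


# Bytes input: the cookie is honored, so 0xE9 is decoded as Latin-1.
byte_result = run(BYTE_SOURCE)

# Text input: the cookie is ignored, so the euro sign passes through
# unchanged even though the cookie names an encoding that lacks it.
text_result = run(TEXT_SOURCE)
```

Under those assumptions, byte_result is the single character U+00E9 and text_result is the single character U+20AC, matching the rule proposed above: honor the cookie for 8-bit input and ignore it for Unicode input.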