> Sigh. In our company we use 'german' as our master language so
> we have string literals containing iso-8859-1 umlauts all over the place.
> Okay as long as we don't mix them with Unicode objects, this doesn't
> hurt anybody.
>
> What I would love to see, would be a well defined way to tell the
> interpreter to use 'latin-1' as default encoding instead of 'UTF-8'
> when dealing with string literals from our modules.

It would be better if this was supported for u"..." literals, so that
it was taken care of at the source code level completely. The running
program shouldn't have to worry about what encoding its source code
was!

For 8-bit literals, this would mean that if you had source code using
Latin-1, the literals would be translated from Latin-1 to UTF-8 by the
code generator. This would mean that len('ç') would return 2. I'm
not sure this is a great idea -- but then I'm not sure that using
Latin-1 in source code is a great idea either.

> The tokenizer in Python 1.6 already contains smart logic to get the
> size of TABs right (pasting from tokenizer.c):
>
>     /* Skip comment, while looking for tab-setting magic */
>     if (c == '#') {
>         static char *tabforms[] = {
>             "tab-width:",       /* Emacs */
>             ":tabstop=",        /* vim, full form */
>             ":ts=",             /* vim, abbreviated form */
>             "set tabsize=",     /* will vi never die? */
>             /* more templates can be added here to support other editors */
>         };
>         ..
>
> It wouldn't be too hard to add something there to recognize
> other "pragma" comments like for example:
>     #content-transfer-encoding: iso-8859-1
> But what to do with it? May be adding a default encoding to every string
> object? Is this bloat? Just an idea.

Before we go any further we should design pragmas. The current
approach is inefficient and only designed to accommodate
editor-specific magical commands.

I say it's a Python 1.7 issue.

--Guido van Rossum (home page: http://www.python.org/~guido/)
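
The len('ç') remark above can be illustrated with a small sketch. It is
written for a modern Python 3 interpreter, not the Python 1.6 under
discussion, and the helper name latin1_to_utf8 is hypothetical: it only
imitates the translation the code generator would perform on an 8-bit
literal, and is not part of any proposed implementation.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode an 8-bit Latin-1 byte string as UTF-8 (hypothetical helper)."""
    # Latin-1 maps every byte 0x00-0xFF to the code point of the same value,
    # so this decode step can never fail.
    return raw.decode("latin-1").encode("utf-8")


latin1_literal = b"\xe7"                  # 'ç' stored as one Latin-1 byte
utf8_literal = latin1_to_utf8(latin1_literal)

print(len(latin1_literal))                # 1: length as written in the source
print(len(utf8_literal))                  # 2: length after translation to UTF-8
print(latin1_to_utf8(b"abc") == b"abc")   # True: pure ASCII is unchanged
```

Only bytes above 0x7F grow, so ASCII-only literals keep their length while
literals containing umlauts or other accented characters get longer.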