Guido van Rossum wrote:
>
> > Sigh. In our company we use 'german' as our master language so
> > we have string literals containing iso-8859-1 umlauts all over the place.
> > Okay as long as we don't mix them with Unicode objects, this doesn't
> > hurt anybody.
> >
> > What I would love to see, would be a well defined way to tell the
> > interpreter to use 'latin-1' as default encoding instead of 'UTF-8'
> > when dealing with string literals from our modules.
>
> It would be better if this was supported for u"..." literals, so that
> it was taken care of at the source code level completely. The running
> program shouldn't have to worry about what encoding its source code
> was!

u"..." currently interprets the characters it finds as Latin-1
(this is by design, since the first 256 Unicode ordinals map to
the Latin-1 characters).

> For 8-bit literals, this would mean that if you had source code using
> Latin-1, the literals would be translated from Latin-1 to UTF-8 by the
> code generator. This would mean that len('ç') would return 2. I'm
> not sure this is a great idea -- but then I'm not sure that using
> Latin-1 in source code is a great idea either.
>
> > The tokenizer in Python 1.6 already contains smart logic to get the
> > size of TABs right (pasting from tokenizer.c): ...
>
> Before we go any further we should design pragmas. The current
> approach is inefficient and only designed to accommodate
> editor-specific magical commands.
>
> I say it's a Python 1.7 issue.

Good idea :-)

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
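
For readers following the len('ç') point above, here is a minimal sketch in present-day Python 3, not the 1.6-era interpreter discussed in the thread. It assumes `bytes` objects as a stand-in for the 8-bit string literals of the time; the variable names are illustrative only. It shows that Latin-1 byte values coincide with the first 256 Unicode ordinals, and that re-encoding a Latin-1 character as UTF-8 doubles its byte length.

```python
# 'ç' as a source file saved in ISO-8859-1 (Latin-1) would store it: one byte.
latin1_bytes = b"\xe7"

# Decoding as Latin-1 maps each byte value straight to the same Unicode ordinal.
text = latin1_bytes.decode("latin-1")
assert ord(text) == latin1_bytes[0] == 0xE7

# The same holds for every byte value 0..255.
assert all(bytes([i]).decode("latin-1") == chr(i) for i in range(256))

# Translating the literal to UTF-8 turns the single byte into two.
utf8_bytes = text.encode("utf-8")

print(len(latin1_bytes))  # 1
print(len(utf8_bytes))    # 2
```

The two-byte result is the reason an 8-bit literal translated to UTF-8 would report a length of 2 for a single visible character.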