While researching http://bugs.python.org/issue7327, I've come to the conclusion that trunk handles locales incorrectly with regard to Unicode. Fixing this would be the first step toward resolving this issue with float and Decimal locale-aware formatting.

The issue concerns the locale "cs_CZ.UTF-8", and the "thousands_sep" value (among others). The C struct lconv (in Linux) contains '\xc2\xa0' for thousands_sep. In py3k this is handled by calling mbstowcs (which is locale-aware) and then PyUnicode_FromWideChar, so the value is converted to u"\xa0" (non-breaking space).

But in trunk, the value is just used as-is. So when formatting a decimal, for example, '\xc2\xa0' is just inserted into the result, such as:

>>> format(Decimal('1000'), 'n')
'1\xc2\xa0000'

This doesn't make much sense, and causes an error when converting it to unicode:

>>> format(Decimal('1000'), u'n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/python/trunk/Lib/decimal.py", line 3609, in __format__
    return _format_number(self._sign, intpart, fracpart, exp, spec)
  File "/root/python/trunk/Lib/decimal.py", line 5704, in _format_number
    return _format_align(sign, intpart+fracpart, spec)
  File "/root/python/trunk/Lib/decimal.py", line 5595, in _format_align
    result = unicode(result)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)

I believe that the correct solution is to do what py3k does in locale, which is to convert the struct lconv values to unicode. But since this would be a disruptive change if universally applied, I'd like to propose that we only convert to unicode if the values won't fit into a str. So the algorithm would be something like:

1. call mbstowcs
2. if every value in the result is in the range [32, 126], return a str
3. otherwise, return a unicode

This would mean that for most locales, the current behavior in trunk wouldn't change: the locale.localeconv() values would continue to be str. Only for those locales where the values wouldn't fit into a str would unicode be returned.

Does this seem like an acceptable change?

Eric.

PS: Thanks to Mark Dickinson and others on irc and on the issue for helping in formulating this.
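
For illustration, the three-step rule proposed in the message can be sketched in pure Python. This is only a rough model under stated assumptions, not the actual change, which would live in the C locale module. The helper name convert_lconv_value is hypothetical, an explicit encoding argument stands in for the locale-aware mbstowcs call, and the sketch uses Python 3 types, so bytes plays the role of trunk's str and str plays the role of trunk's unicode.

```python
def convert_lconv_value(raw, encoding="utf-8"):
    """Hypothetical helper modelling the proposed conversion rule.

    raw      -- the bytes taken from a struct lconv field
    encoding -- assumed locale encoding; stands in for mbstowcs
    """
    # Step 1: decode the multibyte value (mbstowcs in the C code).
    text = raw.decode(encoding)

    # Step 2: if every code point is printable ASCII, keep the
    # narrow type (trunk's str, modelled here as bytes).
    if all(32 <= ord(ch) <= 126 for ch in text):
        return text.encode("ascii")

    # Step 3: otherwise return the wide type (trunk's unicode,
    # modelled here as Python 3 str).
    return text


# A comma separator stays narrow; the cs_CZ.UTF-8 separator from the
# message, '\xc2\xa0', becomes a single non-breaking space character.
comma = convert_lconv_value(b",")
nbsp = convert_lconv_value(b"\xc2\xa0")
```

With this rule an ordinary separator such as "," keeps its current type, and only values like the non-breaking space change type, which is the compatibility property the proposal is after.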