[Tony Meyer]
> ...
> (Glad you posted this - I was wading through the progress of
> marshalling (PyOS_snprintf etc) and getting rapidly lost).

It's the unmarshalling code that's relevant -- that just passes a string to
atof().

>> 1. When LC_NUMERIC is "german", MS C's atof() stops at the first
>>    period it sees.

> This is the case:
> """
> #include <locale.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main()
> {
>         float f;
>         setlocale(LC_NUMERIC, "german");
>         f = atof("0.1");
>         printf("%f\n", f);
> }
> """
>
> Gives me with gcc version 3.2 20020927 (prerelease):
> 0.100000

It's possible that glibc doesn't recognize "german" as a legitimate locale
name (so that the setlocale() call had no effect).

> Gives me with Microsoft C++ Builder (I don't have Visual C++ handy,
> but I suppose it would be the same):
> 0,00000
>
> The help file for Builder does say that this is the correct behaviour
> - it will stop when it finds an unrecognised character - here '.' is
> unrecognised (because we are in German), so it stops.

atof does have to stop at the first unrecognized character, but atof is
locale-dependent, so which characters are and aren't recognized depends on
the locale.
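
To make the "stops at the first unrecognized character" behaviour concrete, here is a minimal sketch in Python. It is a simplified model, not the C library's actual algorithm: the function name atof_model and its decimal_point parameter are made up for illustration, and it ignores exponents, hex floats, "inf"/"nan" and grouping characters. The only thing it models is that the radix character accepted depends on the locale.

```python
import re


def atof_model(text, decimal_point="."):
    """Simplified model of a locale-aware atof() (hypothetical helper).

    Parses the longest leading run made of an optional sign, digits and at
    most one `decimal_point`, and stops at anything else. Returns 0.0 when
    nothing numeric is recognized.
    """
    pattern = r"\s*([+-]?\d*(?:%s\d*)?)" % re.escape(decimal_point)
    prefix = re.match(pattern, text).group(1)
    try:
        # Normalize the locale's radix character to '.' for float().
        return float(prefix.replace(decimal_point, "."))
    except ValueError:
        return 0.0  # e.g. empty prefix, or a lone sign / radix character


print(atof_model("0.1", "."))   # 0.1  ("C"-style radix)
print(atof_model("0.1", ","))   # 0.0  (parsing stops at the '.')
print(atof_model("0,1", ","))   # 0.1  (',' is the radix here)
```

Under a ',' radix the string "0.1" yields 0.0 because only the leading "0" is consumed, which matches the Builder output quoted above.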

After I set locale to "german" on Win2K:

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, "german")
'German_Germany.1252'

MS tells me that the decimal_point character is ',' and the thousands_sep
character is '.':

>>> import pprint
>>> pprint.pprint(locale.localeconv())
{'currency_symbol': '',
 'decimal_point': ',',              HERE
 'frac_digits': 127,
 'grouping': [3, 0],
 'int_curr_symbol': '',
 'int_frac_digits': 127,
 'mon_decimal_point': '',
 'mon_grouping': [],
 'mon_thousands_sep': '',
 'n_cs_precedes': 127,
 'n_sep_by_space': 127,
 'n_sign_posn': 127,
 'negative_sign': '',
 'p_cs_precedes': 127,
 'p_sep_by_space': 127,
 'p_sign_posn': 127,
 'positive_sign': '',
 'thousands_sep': '.'}              AND HERE
>>>

Python believes that the locale-specified thousands_sep character should be
ignored, and that's what locale.atof() does.  It may well be a bug in MS's
atof() that it doesn't ignore the current thousands_sep character -- I don't
have time now to look up the rules in the C standard, and it doesn't matter
to spambayes either way (whether we load .001 as 0.0 or as 1.0 is a disaster
either way).

> Does this then mean that this is a Python bug?

That Microsoft's atof() doesn't ignore the thousands_sep character is
certainly not Python's bug <wink>.

> Or because Python tells us not to change the c locale and we (Outlook)
> are, it's our fault/problem?

The way we're using Python with Outlook doesn't meet the documented
requirements for using Python, so for now everything that goes wrong here is
our problem.  It would be better if Python didn't use locale-dependent
string<->float conversions internally, but that's just not the case (yet).

> Presumably what we'll have to do for a solution is just what Mark is
> doing now - find the correct place to put a call that (re)sets the c
> locale to English.

Python requires that the (true -- from the C library's POV) LC_NUMERIC
category be "C" locale.  That isn't English (although it looks a lot like it
to Germans <wink>), and we don't care about any category other than
LC_NUMERIC here.
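
As an illustration of forcing only LC_NUMERIC back to "C" around a sensitive operation, here is a minimal sketch using the standard locale module. The context manager name c_numeric_locale is hypothetical, and this is not necessarily how spambayes handles it. Note that setlocale() changes process-wide state, so this is only safe when no other thread is relying on the locale at the same time.

```python
import contextlib
import locale


@contextlib.contextmanager
def c_numeric_locale():
    """Temporarily force LC_NUMERIC to "C" (hypothetical helper).

    Only the LC_NUMERIC category is touched; every other category is left
    as the host application set it.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query, does not change it
    locale.setlocale(locale.LC_NUMERIC, "C")
    try:
        yield
    finally:
        # Put back whatever the host application had selected.
        locale.setlocale(locale.LC_NUMERIC, saved)


with c_numeric_locale():
    # Inside the block the C library's radix character is '.'.
    print(locale.localeconv()["decimal_point"])
```

Saving and restoring the previous setting, rather than leaving "C" in place, keeps the host application's own number formatting unchanged once the block exits.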