[Moshe Zadka]
> ...
> I'm starting to wonder what the tests really test: the language
> definition, or accidents of the implementation?

You'd be amazed (appalled?) at how hard it is to separate them.  In two
previous lives as a Big Iron compiler hacker, we routinely had to get our
compilers validated by a govt agency before any US govt account would be
allowed to buy our stuff; e.g.,

    http://www.itl.nist.gov/div897/ctg/vpl/language.htm

This usually *started* as a two-day process, flying the inspector to our
headquarters, taking perhaps 2 minutes of machine time to run the test
suite, then sitting around that day and into the next arguing about whether
the "failures" were due to non-standard assumptions in the tests, or
compiler bugs.  It was almost always the former, but sometimes that didn't
get fully resolved for months (if the inspector was being particularly
troublesome, it could require getting an Official Interpretation from the
relevant stds body -- not swift!).

(BTW, this is one reason huge customers are often very reluctant to move to
a new release:  the validation process can be very expensive and drag on for
months)

>>> def f():
...     global g
...     g += 1
...     return g
...
>>> g = 0
>>> d = {f(): f()}
>>> d
{2: 1}
>>>

The Python Lang Ref doesn't really say whether {2: 1} or {1: 2} "should be"
the result, nor does it say it's implementation-defined.  If you *asked*
Guido what he thought it should do, he'd probably say {1: 2} (not much of a
guess:  I asked him in the past, and that's what he did say <wink>).

Something "like that" can show up in the test suite, but buried under layers
of obfuscating accidents.  Nobody is likely to realize it in the absence of
a failure motivating people to search for it.  Which is a trap:  sometimes
ours was the only compiler (of dozens and dozens) that had *ever* "failed" a
particular test.
This was most often the case at Cray Research, which had
bizarre (but exceedingly fast -- which is what Cray's customers valued most)
floating-point arithmetic.  I recall one test in particular that failed
because Cray's was the only box on earth that set I to 1 in

    INTEGER I
    I = 6.0/3.0

Fortran doesn't define that the result must be 2.  But-- you guessed it
--neither does Python.  Cute:  at KSR, INT(6.0/3.0) did return 2 -- but
INT(98./49.) did not <wink>.

then-again-the-python-test-suite-is-still-shallow-ly y'rs  - tim
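
For anyone who wants to poke at the two examples above on a current
interpreter, here is a minimal sketch, not part of the original message.
The helper names dict_display_order and truncated_quotients are made up
for illustration.  It assumes a Python 3 new enough to have nonlocal, and
floats that are IEEE 754 doubles with correctly rounded division, which is
typical but is a property of the platform rather than a promise of the
language.  The third quotient is an extra case added here to show
truncation biting when the float result is not exact.

```python
def dict_display_order():
    """Evaluate {f(): f()} with a fresh counter and return the dict.

    Which call runs first decides whether the result is {1: 2} or {2: 1}.
    """
    counter = 0

    def f():
        nonlocal counter
        counter += 1
        return counter

    return {f(): f()}


def truncated_quotients():
    """Truncate a few float results toward zero, as I = 6.0/3.0 does in Fortran."""
    return {
        "6.0/3.0": int(6.0 / 3.0),
        "98.0/49.0": int(98.0 / 49.0),
        # Extra case (not from the post): the float result falls just below 8.
        "(0.1+0.7)*10": int((0.1 + 0.7) * 10),
    }


if __name__ == "__main__":
    print(dict_display_order())
    print(truncated_quotients())
```

On a recent CPython (3.5 or later, as far as I know) the first call gives
{1: 2}, i.e. the key is evaluated before the value, the reverse of the
session shown above.  On IEEE 754 hardware both exact quotients truncate
to 2, while the third expression truncates to 7.  A test that asserts any
of these is checking the platform's arithmetic at least as much as the
language definition.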