[MarkH]
> ...
> I based my assessment simply on my perception of what is likely to
> happen, not my opinion of what _should_ happen.

I based mine on what Guido was waiting for someone to say <wink>. We worry
too much about disagreeing here; different opinions are great! Guido will
squash the ones he can't stand anyway.

[about Aaron's pylint's lack of 1.5.2 smarts]
> ...
> Aaron agrees that a parser module based one would be better.

You can't beat a real parse, no argument there. Luckily, the compiler
parses too.

[Guido]
> What stands in the way?
>
> (a) There's so much else to do...

How did Perl manage to attract 150 people with nothing to do except hack
on Perl internals? "Wow, that code's such a mess I bet even *I* could get
something into it" <0.6 wink>.

> (b) *Someone* needs to design a good framework for spitting out
> warnings, and for giving the programmer a way to control which
> warnings to ignore. I've seen plenty of good suggestions here; now
> somebody should simply go off and come up with a proposal too good to
> refuse.

The response has been ... absent. Anyone doing this? I liked JimF's push
to make cmd-line options available to Python programs too. Somehow they
seem related to me.

> (c) I have a different agenda with CP4E -- I think it would be great
> if the editor could do syntax checking and beyond *while you type*,
> like the spell checker in MS Word. (I *like* Word's spell checker,
> even though I hate the style checker [too stupid], and I gladly put up
> with the spell checker's spurious complaints -- it's easy to turn off,
> easy to ignore, and it finds lots of good stuff.)
>
> Because the editor has to deal with incomplete and sometimes
> ungrammatical things, and because it has to work in real time (staying
> out of the way when you're frantically typing, catching up when your
> fingers take a rest), it needs a different kind of parser.

Different from what? Python's own parser, for sure. IDLE has at least two
distinct parsers of its own that have nothing in common with Python's
parser and little in common with each other. Using the horrid tricks in
PyParse.py, it may even be possible to write the kind of parser you need
in Python and still have it run fast enough.

For parsing-on-the-fly from random positions, take my word for it and
Barry's as insurance <wink>: the single most frequent question you need a
fast and reliable answer to is "is this character in a string?".
Unfortunately, it turns out that's also the hardest question to answer.
The next one is "am I on a continuation line, and if so, where's the
start?". Given rapid & bulletproof ways to answer those, the rest is
pretty easy.
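To make "is this character in a string?" concrete, here's a hypothetical
brute-force sketch -- emphatically not how PyParse does it, just the
dumbest thing that could possibly work, rescanning from the top of the
buffer and tracking quote state by hand:

    def in_string(text, index):
        # Return true if text[index] falls inside a string literal.
        # Toy version: knows ', ", ''' and """ plus backslash escapes,
        # but ignores comments, string prefixes, and the rule that
        # single-quoted strings can't span lines.
        i, quote = 0, None
        while i < index:
            ch = text[i]
            if quote:
                if ch == "\\":
                    i += 1                      # skip the escaped char
                elif text.startswith(quote, i):
                    i += len(quote) - 1         # step over the closer
                    quote = None
            elif ch in "'\"":
                if text.startswith(ch * 3, i):  # triple quote?
                    quote = ch * 3
                    i += 2
                else:
                    quote = ch
            i += 1
        return quote is not None

The expensive part isn't the scan, it's that nothing short of a scan is
trustworthy: every quote that came before can flip the answer. The real
trick (and the sweat in PyParse) is finding a safe place to *start*
scanning from, so you don't pay for the whole buffer on every keystroke.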
> But that's another project, and once the Python core has a warning
> framework in place, I'm sure we'll find more things that are worth
> warning about.

That was frequently predicted for various pylint projects too <wink>.

> I'm not always in agreement with Tim Peters when he says that Python
> is so dynamic that it's impossible to check for certain errors. It
> may be impossible to say *for sure* that something is an error, but
> there sure are lots of situations where you're doing something that's
> *likely* to be an error.

We have no disagreement there. What a compiler does must match the
advertised semantics of the language-- or its own advertised deviations
from those --without compromise. A warning system has no such constraint;
to the contrary, in the case of a semantic mess like Perl, most of its
value is in pointing out *legal* constructs that are unlikely to work the
way you intended.

> E.g. if the compiler sees len(1), and there's no local or global
> variable named len, it *could* be the case that the user has set up a
> parallel universe where the len() built-in accepts an integer
> argument, but more likely it's an honest mistake, and I would like to
> get a warning about it.

Me too. More: I'd also like to get a warning for *having* a local or
global variable named len! Masking the builtin names is simply bad
practice, and is also usually an honest mistake.

BTW, I was surprised that the most frequent gotcha among new Python users
at Dragon turned out to be exactly that: dropping a "len" or a "str" or
whatever (I believe len, str and list were most common) into their
previously working code-- because they just learned about that builtin
--and getting runtime errors as a result. That is, they already had a
local var of that name, and forgot. Then they were irked that Python
didn't nag them from the start (with a msg they understood, of course).
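For concreteness, a hypothetical quick-and-dirty version of that nag via
the tokenize module -- a toy, so the heuristics (and their holes) are
mine: it only catches the obvious binding forms, and it will gripe about
keyword arguments like f(list=1) too:

    import builtins, io, tokenize

    BUILTINS = {n for n in dir(builtins) if not n.startswith("_")}

    def warn_shadowed_builtins(source):
        # Flag "name =", "def name", "for name", "... as name" where
        # name is a builtin.  Misses fancier bindings (tuple unpacks,
        # function args); a real pass would walk a parse tree instead.
        toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
        for i, tok in enumerate(toks):
            if tok.type != tokenize.NAME or tok.string not in BUILTINS:
                continue
            prev = toks[i - 1].string if i else ""
            nxt = toks[i + 1].string if i + 1 < len(toks) else ""
            if prev == ".":
                continue                # attribute access, not a binding
            if nxt == "=" or prev in ("def", "class", "for", "as"):
                print("line %d: %r shadows the builtin"
                      % (tok.start[0], tok.string))

e.g. warn_shadowed_builtins("len = 10\n") complains about line 1, which is
exactly the Dragon gotcha above.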
> The hard part here is to issue this warning for len(x) where x is some
> variable or expression that is likely to have a non-sequence value
> (barring alternate universes); this might require global analysis
> that's hard or expensive enough that we can't put it in the core
> (yet). This may be seen as an argument for a separate lint...

Curiously, Haskell is statically type-safe but doesn't require
declarations of any kind -- it does global analysis, and has a 100%
reliable type inference engine (the language was, of course, designed to
make this true). Yet I don't think I've ever seen a Haskell program on the
web that didn't explicitly declare the type of every global anyway. I
think this is natural, too: while it's a PITA to declare the type of every
stinking local that lives for two lines and then vanishes, the types of
non-local names aren't self-evident: type decls really help for them.

So if anyone is thinking of doing the kind of global analysis Guido
mentions here, and is capable of doing it <wink>, I'd much rather they put
their effort into optional static type decls for Python2. Many of the same
questions need to be answered either way (like "what's a type?", and "how
do we spell a type?" -- the global analysis warnings won't do any good if
you can't communicate the substance of an error <wink>), and optional
decls are likely to have a bigger bang for the buck.

[Skip Montanaro]
> ...
> Perl's experience with -w seems to suggest that it's best to always
> enable whatever warnings you can as well.

While that's my position, I don't want to oversell the Perl experience.
That language allows so many goofy constructs, and does so many wild
guesses at runtime, that Perl is flatly unusable without -w for
non-trivial programs. Not true of Python, although the kinds of warnings
people have suggested so far certainly do seem worthy of loud complaint by
default.

> (More and more I see people using gcc's -Wall flag as well.)

If you have to write portable C++ code, and don't enable every warning you
can get on every compiler you have, and don't also turn on "treat warnings
as errors", non-portable code will sneak into the project rapidly. That's
my experience, over & over. gcc catches stuff MS doesn't, and vice versa,
and MetroWerks yet another blob, and platform-specific cruft *still* gets
in. It's irksome.

> Now, my return consistency stuff was easy enough to write in C for two
> reasons. One, I'm fairly comfortable with the compile.c code.

I don't anticipate dozens of people submitting new warning code. It would
be unprecedented if even two of us decided this was our thing. It would be
almost unprecedented if even one of us followed up on it <0.6 wink>.

> Two, adding my checks required no extra memory management overhead.

Really good global analysis likely requires again as much C code as
already exists. Luckily, I don't think putting in some warnings requires
that all conceivable warnings be implemented at once <wink>. For stuff
that complex, I'd rather make it optional and write it in Python; I don't
believe any law prevents the compiler from running Python code.

> Consider a few other checks you might conceivably add to the byte code
> compiler:
>
> * tab nanny stuff (already enabled with -t, right?)

Very timidly, yes <wink>. Doesn't complain by default, and you need -tt to
make it an error. Only catches 4 vs 8 tab size ambiguities, but that's
good enough for almost everyone almost all the time.

> * variables set but not used
> * variables used before set

These would be wonderful. The Perl/pylint "gripe about names unique in a
module" is a cheap approximation that gets a surprising percentage of the
benefit for the cost of a dict and an exception list.
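To show just how cheap that approximation is, here's a hypothetical toy
version -- one dict of counts, one exception list, and nothing resembling
real flow analysis (the names here are mine, not pylint's). A typo'ed
variable usually appears exactly once, which is how a single trick catches
both "set but not used" and "used before set":

    import io, keyword, tokenize

    EXCEPTIONS = {"self", "_"}      # names it's normal to see only once

    def gripe_about_unique_names(source):
        counts, first_line = {}, {}
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
                counts[tok.string] = counts.get(tok.string, 0) + 1
                first_line.setdefault(tok.string, tok.start[0])
        for name, n in sorted(counts.items()):
            if n == 1 and name not in EXCEPTIONS:
                print("line %d: name %r appears only once"
                      % (first_line[name], name))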
> If all of this sort of stuff is added to the compiler proper, I predict a
> couple major problems will surface:
>
> * The complexity of the code will increase significantly, making it
>   harder to maintain and enhance

The complexity of the existing code should be almost entirely unaffected,
because non-trivial semantic analysis belongs in a new subsystem with its
own code.

> * Fewer and fewer people will be able to approach the code, making it
>   less likely that new checks are added

As opposed to what? The status quo, with no checks at all? Somehow, facing
the prospect of *some* checks doesn't frighten me away <wink>. Besides, I
don't buy the premise: if someone takes this on as their project, worrying
that they'll decline to add new valuable checks is like MarkH worrying
that I wouldn't finish adding full support for stinking tabs to the common
IDLE/PythonWin editing components. People take pride in their hackery.

> * Future extensions like pluggable virtual machines will be harder
>   to add because their byte code compilers will be harder to integrate
>   into the core

If you're picturing adding this stuff sprayed throughout the guts of the
existing com_xxx routines, we've got different pictures in mind. Semantic
analysis is usually a pass between parsing and code generation,
transforming the parse tree and complaining about source it thinks is
fishy. If done in any conventional way, it has no crosstalk at all with
either the parsing work that precedes it or the code generation that
follows it. It's a pipe stage between them, whose output is of the same
type as its input. That is, it's a "pluggable component" in its own right,
and doesn't even need to be run. So potential for interference just isn't
there.

At present, Python is very unusual both in:

1) Having no identifiable semantic pass at all, parsing directly to byte
   code, and enforcing its few semantic constraints (like "'continue' not
   properly in loop") lumped in with both of those.

and

2) Having one trivial optimization pass-- 76 lines of code instead of the
   usual 76,000 <wink> --running after the byte code has been generated.

However, the sole transformation made here (distinguishing local from
non-local names) is much more properly viewed as being a part of semantic
analysis than as being "an optimization". It's deducing trivial info about
what names *mean* (i.e., basic semantics), and is called optimization here
only because Python didn't do it at first.

So relating this to a traditional compiler, I'd say that "optimize()" is
truly Python's semantic analysis pass, and all that comes before it is the
parsing pass -- a parsing pass with output in a form that's unfortunately
clumsy for further semantic analysis, but so it goes. The byte code is
such a direct reflection of the parse tree that there's really little
fundamental difference between them.

So for minimal disruption, I'd move "optimize" into a new module and call
it the semantic analysis pass, and it would work with the byte code. Just
as now, you wouldn't *need* to call it at all. Unlike now, the parsing
pass probably needs to save away some more info (e.g., I don't *think* it
keeps track of what all is in a module now in any usable way).

For Python2, I hope Guido adopts a more traditional structure (i.e.,
parsing produces a parse tree, codegen produces bytecode from a parse
tree, and other tree->tree transformers can be plugged in between them).
Almost all compilers follow this structure, and not because compiler
writers are unimaginative droids <wink>. Compile-time for Python isn't
anywhere near being a problem now, even on my creaky old 166MHz machine; I
suspect the current structure reflects worry about that on much older &
slower machines.

Some of the most useful Perl msgs need to be in com_xxx, though, or even
earlier. The most glaring example is runaway triple-quoted strings.
Python's "invalid token" pointing at the end of the file is maddeningly
unhelpful; Perl says it looks like you have a runaway string, and gives
the line number it thinks it may have started on. That guess is usually
correct, or points you to what you *thought* was the end of a different
string. Either way your recovery work is slashed. (Of course IDLE is even
better: the whole tail of the file changes to "string color", and you just
need to look up until the color changes!)
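A guess like Perl's is almost embarrassingly easy to make. A hypothetical
sketch: track triple-quote delimiters line by line, and if one is still
open at end-of-file, that's your suspect. (A ''' buried in a comment or in
an ordinary string will fool it, but a warning only has to be *usually*
right -- that's the whole point of warnings vs errors.)

    import re

    TRIPLE = re.compile(r"'''|\"\"\"")

    def runaway_string_start(source):
        # If source ends inside a triple-quoted string, return the line
        # number where that string probably started, else None.
        opener, open_line = None, None
        for lineno, line in enumerate(source.splitlines(), 1):
            for match in TRIPLE.finditer(line):
                if opener is None:
                    opener, open_line = match.group(), lineno
                elif match.group() == opener:
                    opener = None
        return open_line if opener else None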
> In addition, more global checks probably won't be possible (reasoning
> about code across module boundaries for instance) because the compiler's
> view of the world is fairly narrow.

As above, I don't think enough info is saved now even to analyze one
module in isolation.

> I think lint-like tools should be implemented in Python (possibly with
> the support of an extension module for performance-critical sections)
> which is then called from the compiler proper under appropriate
> circumstances (warnings enabled, all necessary modules importable, etc).

I have no objection to that. I do object to the barely conceivable getting
in the way of the plainly achievable, though -- the pylint tools out there
now, just like your return consistency checker, do a real service already
without any global analysis. Moving that much into the core (implemented
in Python if possible, recoded in C if not) gets a large chunk of the
potential benefit for maybe 5% of the eventual work. It's nice that Guido
is suddenly keen on global analysis too, but I don't see him volunteering
to do any work either <wink>.

> I believe the code would be much more easily maintained and extended.

If it's written in Python, of course.

> You'd be able to swap in a new byte code compiler without risking the
> loss of your checking code.

I never understood this one; even if there *were* a competing byte code
compiler out there <0.1 wink>, keeping as much as possible out of com_xxx
should render it a non-issue. If I'm missing your point and there is some
fundamental conflict here, fine, then it's another basis on which bytecode
compilers will compete.

more-concerned-about-things-that-exist-than-things-that-don't-ly y'rs
    - tim