<div class="gmail_quote"><div class="im">On Fri, Feb 10, 2012 at 1:05 PM, Brett Cannon <span dir="ltr">&lt;<a href="mailto:brett@python.org" target="_blank">brett@python.org</a>&gt;</span> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br><br><div class="gmail_quote"><div><div class="im">On Thu, Feb 9, 2012 at 17:00, PJ Eby <span dir="ltr">&lt;<a href="mailto:pje@telecommunity.com" target="_blank">pje@telecommunity.com</a>&gt;</span> wrote:<br></div><div class="im">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div>I did some crude timeit tests on frozenset(listdir()) and trapping failed stat calls. It looks like, for a Windows directory the size of the 2.7 stdlib, you need about four *failed* import attempts to overcome the initial caching cost, or about 8 successful bytecode imports. (For Linux, you might need to double these numbers; my tests showed a different ratio there, perhaps due to the Linux stdlib I tested having nearly twice as many directory entries as the directory I tested on Windows!)</div>
</div></blockquote></div></div><div class="im"><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote">
<div><br></div><div>However, the numbers are much better for application directories than for the stdlib, since they are located earlier on sys.path. Every successful stdlib import in an application is equal to one failed import attempt for every preceding directory on sys.path, so as long as the average directory on sys.path isn't vastly larger than the stdlib, and the average application imports at least four modules from the stdlib (on Windows, or 8 on Linux), there would be a net performance gain for the application as a whole. (That is, there'd be an improved per-sys.path entry import time for stdlib modules, even if not for any application modules.)</div>
</div></blockquote><div><br></div></div><div>Does this comment take into account the number of modules required to load the interpreter to begin with? That's already like 48 modules loaded by Python 3.2 as it is.</div>
</div></div></blockquote><div><br></div><div>I didn't count those, no. So, if they're loaded from disk *after* importlib is initialized, then they should pay off the cost of caching even fairly large directories that appear earlier on sys.path than the stdlib. We still need to know about NFS and other ratios, though... I still worry that people with more extreme directory sizes or slow-access situations will run into even worse trouble than they have now.</div>
</div></blockquote><div><br></div><div>It's possible. No way to make it work for everyone. This is why I didn't worry about some crazy perf optimization.</div><div>&nbsp;</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div class="im">
</div></blockquote><div><br></div></div><div>Wow. That means it'd always be a win for pre-stdlib sys.path entries, because any successful stdlib import equals a failed pre-stdlib lookup. (Of course, that's just saving some of the overhead that's been *added* by importlib, not a new gain, but still...)</div>
</div></blockquote><div><br></div><div>How so? import.c does a listdir() as well (this is not special to importlib).</div><div>&nbsp;</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div class="im">
</div></blockquote><div><br></div></div><div>Not quite sure what you mean here. The directory stat is used to ensure that new files haven't been added, old ones removed, or existing ones renamed. Changes to the files themselves shouldn't factor in, should they?</div>
</div></blockquote><div><br></div><div>Changes in any fashion to the directory. Do filesystems atomically update the mtime of a directory when they commit a change? Otherwise we have a potential race condition.</div><div>
 </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div class="im">
</div></blockquote><div><br></div></div><div>Again, I'm not sure how this relates. Automatic code reloaders monitor individual files that have been previously imported, so the directory timestamps aren't relevant.</div>
<div><br></div></div></blockquote><div><br></div><div>Don't care about automatic reloaders. I'm just asking about the case where the mtime granularity is coarse enough to allow for a directory change, an import to execute, and then another directory change to occur all within a single mtime increment. That would leave the set cache out of date.</div>
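<div>A minimal sketch of the staleness concern above, not code from this thread: <code>DirectoryCache</code> and <code>demo</code> are hypothetical names, and the cache simply re-lists the directory whenever its mtime differs from the one recorded. Coarse mtime granularity is simulated by resetting the directory's mtime with <code>os.utime</code> after a second change, which assumes a platform that allows setting directory timestamps.</div>

```python
import os
import tempfile


class DirectoryCache:
    """Hypothetical helper: a directory listing validated by the directory mtime."""

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None
        self._entries = frozenset()

    def __contains__(self, name):
        # Stat first, then list, so a refresh never records an mtime newer
        # than the listing it belongs to.
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            self._entries = frozenset(os.listdir(self.path))
            self._mtime_ns = mtime_ns
        return name in self._entries


def demo():
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "a.py"), "w").close()
        cache = DirectoryCache(d)
        seen_a = "a.py" in cache  # first lookup fills the cache

        before = os.stat(d)
        open(os.path.join(d, "b.py"), "w").close()
        # Simulate a second change landing inside the same mtime tick:
        # put the directory mtime back to the value the cache recorded.
        os.utime(d, ns=(before.st_atime_ns, before.st_mtime_ns))
        stale = "b.py" not in cache  # cache cannot see the new file

        # Once the mtime visibly advances, the cache re-lists the directory.
        os.utime(d, ns=(before.st_atime_ns, before.st_mtime_ns + 10 * 10**9))
        fresh = "b.py" in cache
        return seen_a, stale, fresh
```

<div>In this sketch the stale lookup is a false negative: the file exists but the cached set says it does not, which is the case a failed-open fallback cannot catch.</div>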
<div>&nbsp;</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div></div><div>Of course, I could be confused here. Are you saying that if somebody makes a new .py file and saves it, that it'll be possible to import it before it's finished being written? If so, that could happen already, and again caching the directory doesn't make any difference.</div>
<div><br></div><div>Alternately, you could have a situation where the file is deleted after we load the listdir(), but in that case the open will fail and we can fall back... heck, we can even force resetting the cache in that event.</div>
<div class="im">
<div><br></div><div>Having said all of this, implementing this idea would be trivial using importlib if you don't try to optimize the __pycache__ case. It's just a question of whether people are comfortable with the semantic change to import. This could also be made into something that was in importlib for people to use when desired if we are too worried about semantic changes.</div>
</div>
</blockquote><div><br></div><div>You can do that if you want; obviously I don't want to bother, since it won't make it into Python 2.7. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>&nbsp;</div></blockquote></div>
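<div>The fallback mentioned earlier in the thread (a file deleted after the listdir() result was cached, so the open fails and the cache is reset) might look roughly like the sketch below. It is illustrative only: <code>load_with_fallback</code>, <code>demo</code>, and the plain dict used as the cache are assumptions, not part of import.c or importlib.</div>

```python
import os
import tempfile


def load_with_fallback(directory, name, cache):
    """Return the file's bytes, or None if it is absent.

    `cache` is a hypothetical dict mapping a directory path to a
    frozenset of the entry names seen by the last listdir() call.
    """
    entries = cache.get(directory)
    if entries is None:
        entries = cache[directory] = frozenset(os.listdir(directory))
    if name not in entries:
        return None  # answered from the cache, no stat or open needed
    try:
        with open(os.path.join(directory, name), "rb") as f:
            return f.read()
    except FileNotFoundError:
        # The listing was stale: forget it so the next lookup re-lists.
        cache.pop(directory, None)
        return None


def demo():
    cache = {}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mod.py")
        with open(path, "wb") as f:
            f.write(b"x = 1\n")
        first = load_with_fallback(d, "mod.py", cache)   # cached and read
        os.remove(path)                                  # deleted behind the cache
        second = load_with_fallback(d, "mod.py", cache)  # open fails, cache reset
        return first, second, d in cache
```

<div>This only covers stale positive entries; a file added after the listing was cached still needs some invalidation signal such as the directory mtime.</div>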