This is a multi-part message in MIME format. --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ) Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 7BIT [Guido] >>> But if I had to do it over again, I wouldn't have added walk() in the >>> current form. [Neil Schemenauer] >> I think it's the perfect place for a generator. [Guido] > Absolutely! So let's try to write something new based on generators, > make it flexible enough so that it can handle pre-order or post-order > visits, and then phase out os.walk(). I posted one last night, with a bug (it failed to pass the topdown flag through to recursive calls). Here's that again, with the bug repaired, sped up some, and with a docstring. Double duty: the example in the docstring shows why we don't want to make a special case out of sum([]): empty lists can arise naturally. What else would people like in this? I really like separating the directory names from the plain-file names, so don't bother griping about that <wink>. It's at least as fast as the current os.path.walk() (it's generally faster for me, but times for this are extremely variable on Win98). Removing the internal recursion doesn't appear to make a measureable difference when walking my Python tree, although because recursive generators require time proportional to the current stack depth to deliver a result to the caller, and to resume again, removing recursion could be much more efficient on an extremely deep tree. The biggest speedup I could find on Windows was via using os.chdir() liberally, so that os.path.join() calls weren't needed, and os.path.isdir() calls worked directly on one-component names. I suspect this has to do with that Win98 doesn't have an effective way to cache directory lookups under the covers. Even so, it only amounted to a 10% speedup: directory walking is plain slow on Win98 no matter how you do it. The attached doesn't play any gross speed tricks. --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ) Content-type: text/plain; name=walk.py Content-transfer-encoding: 7BIT Content-disposition: attachment; filename=walk.py def walk(top, topdown=True): """Directory tree generator. For each directory in the directory tree rooted at top (including top itself, but excluding '.' and '..'), yields a 3-tuple dirpath, dirnames, filenames dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. If optional arg 'topdown' is true or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top down). If topdown is false, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom up). When topdown is true, the caller can modify the dirnames list in-place (e.g. via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, or to impose a specific order of visiting. Modifying dirnames when topdown is false is ineffective, since the directories in dirnames have already been generated by the time dirnames itself is generated. Caution: if you pass a relative pathname for top, don't change the current working directory between resumptions of walk. Example: from os.path import join, getsize for root, dirs, files in walk('python/Lib/email'): print root, "consumes", print sum([getsize(join(root, name)) for name in files]), print "bytes in", len(files), "non-directory files" if 'CVS' in dirs: dirs.remove('CVS') # don't visit CVS directories """ import os from os.path import join, isdir try: names = os.listdir(top) except os.error: return exceptions = ('.', '..') dirs, nondirs = [], [] for name in names: if name not in exceptions: if isdir(join(top, name)): dirs.append(name) else: nondirs.append(name) if topdown: yield top, dirs, nondirs for name in dirs: for x in walk(join(top, name), topdown): yield x if not topdown: yield top, dirs, nondirs --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ)--
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4