[Noah Spurrier] > I like your version; although I used a different name to avoid > confusion with os.path.walk. Who's confused <wink>? I agree it needs some other name if something like this gets checked in. > Note that os.listdir does not include the special entries '.' and '..' > even if they are present in the directory, so there is no need > to remove them. Oops -- that's right! This is a code divergence problem. There's more than one implementation of os.path.walk in the core, and the version in ntpath.py (which I started from) still special-cases '.' and '..'. I don't think it needs to. > Tim Peters> for path in walk(root): > Tim Peters>Or it could look like > Tim Peters> for top, names in walk(root): > Tim Peters>or > Tim Peters> for top, dirnames, nondirnames in walk(root): > > I like the idea of yielding (top, dirs, nondirs), but > often I want to perform the same operations on both > dirs and nondirs so separating them doesn't help that case. I think (a) that's unusual, and (b) it doesn't hurt that case either. You can do, e.g., for root, dirs, files in walk(...): for name in dirs + files: to squash them together again. > This seems to be a situation where there is no typical case, > so my preference is for the simpler interface. > It also eliminates the need to build two new lists from > the list you get from os.listdir()... Sorry, I'm unmovable on this point. My typical uses for this function do have to separates dirs from non-dirs, walk() has to make the distinction *anyway* (for its internal use), and it's expensive for the client to do the join() and isdir() bits all over again (isdir() is a filesystem op, and at least on my box repeated isdir() is overwhelmingly more costly than partitioning or joining a Python list). > In fact, I prefer your first suggestion (for path in walk(root)), but > that would require building a new list by prepending the > basepath to each element of children because os.listdir does not > return full path. What about that worries you? I don't like it because I have some directories with many thousands of files, and stuffing a long redundant path at the start of each is wasteful in the abstract. I'm not sure it really matters, though -- e.g., 10K files in a directory * 200 redundant chars each = a measly 2 megabytes wasted <wink>. > So finally in this example, I just went with returning the basepath > and the children (both files and directories). > > Following Tom Good's example I added an option to > ignore symbolic links. Not all Python platforms have symlinks, of course. The traditional answer to this one was that if a user wanted to avoid chasing those on a platform that supports them, they should prune the symlink names out of the fnames list passed to walk's func callback. The same kind of trick is still available in the generator version, although it was and remains painful. Separating the dirs from the non-dirs for the caller at least reduces the expense of it. > It would be better to detect cycles or at least prevent going > higher up in the tree. > ... > This example still allows you to prune the search > in Breadth first mode by removing elements from > the children list. That is cool. > for top, children in walk('C:/code/python', depthfirst=False): > print top, children > if 'CVS' in children: > children.remove('CVS') I'm finding you too hard to follow here, becuase your use of "depthfirst" and "breadthfirst" doesn't follow normal usage of the terms. Here's normal usage: consider this tree (A's kids are B, C, D; B's kids are E, F; C's are G, H, I; D's are J, K): A B C D E F G H I J K A depth-first left-to-right traversal is what you get out of a natural recursive routine. It sees the nodes internally in this order: A B E F C G H I D J K In a preorder DFS (depth first search), you deliver ("do something with" -- print it, yield it, whatever) the node before delivering its children. Preorder DFS in the tree above delivers the nodes in order A B E F C G H I D J K which is the same order in which nodes are first seen. This is what I called "top down". In a postorder DFS, you deliver the node *after* delivering its children, although you still first see nodes in the same order. Postorder left-to-right DFS in the tree above delivers nodes in this order: E F B G H I C J K D A This is what I called "bottom up". A breadth-first search can't be done naturally using recursion; you need to maintain an explicit queue for that (or write convoluted recursive code). A BFS on the tree above would see the nodes in this order: A B C D E F G H I J K It can be programmed like so, given a suitable queue implementation: queue = SuitableQueueImplementation() queue.enqueue(root) while queue: node = queue.dequeue() for child in node.children(): queue.enqueue(child) Nobody has written a breadth-first traverser in this thread. If someone wants to, there are again preorder and postorder variations, although only preorder BFS falls naturally out of the code just above. The current os.path.walk() delivers directories in preorder depth-first left-to-right order, BTW. > for name in children: > fullpath = os.path.join(basepath, name) > if os.path.isdir (fullpath) and not (ignorelinks and > os.path.islink(fullpath)): Despte what I said above <wink>, I expect the ignorelinks argument is a good idea. > for (next_basepath, next_children) in walktree > (fullpath, depthfirst, ignorelinks): > yield next_basepath, next_children Note that there's no need to pull apart 2-tuples and paste them together again here; for x in walktree(...): yield x does the same thing.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4