Steven Bethard wrote: > Delaney, Timothy C (Timothy) <tdelaney at avaya.com> wrote: > >>The perennial "how do I remove duplicates from a list" topic came up on >>c.l.py and in the discussion I mentioned the java 1.5 LinkedHashSet and >>LinkedHashMap. I'd thought about proposing these before, but couldn't >>think of where to put them. It was pointed out that the obvious place >>would be the collections module. >> >>For those who don't know, LinkedHashSet and LinkedHashMap are simply >>hashed sets and maps that iterate in the order that the keys were added >>to the set/map. I almost invariably use them for the above scenario - >>removing duplicates without changing order. >> >>Does anyone else think it would be worthwhile adding these to >>collections, or should I just make a cookbook entry? > > > I guess I'm -0 on this. > > Though I was the one that suggested that collections is the right > place to put them, I'm not really certain how much we gain by > including them. I too would only ever use them for removing > duplicates from a list. But if we're trying to provide a solution to > this problem, I'd rather see an iterable-friendly one. See a previous > thread on this issue[1] where I suggest something like: > > def filterdups(iterable): > seen = set() > for item in iterable: > if item not in seen: > seen.add(item) > yield item > > Adding this to, say, itertools would cover all my use cases. And as > long as you don't have too many duplicates, filterdups as above should > keep memory consumption down better. > I am -1 on adding LinkedHash*. While I can understand wanting to get rid of duplicates easily and wanting a good solutoin, Steven's snippet of code shows rolling your own solution is easy. Plus this can even be simplified down to a one-liner using itertools already:: itertools.ifilterfalse(lambda item, _set=set(): (item in _set) or (_set.add(item) and False), iterable) I don't think it is the prettiest solution, but it does show that coming up with one is not hard nor restricted to only a single solution that requires knowing some Python idiom (well, mine does for the default arg to the lambda, but Steven's doesn't). The last thing I want to happen is for Python's stdlib to grow every possible data structure out there like Java seems to have. I don't ever want to have to think about what *variant* of a data structure I should use, only what *type* of data structure (and no, I don't count collections.deque and Queue.Queue an overlap since the latter is meant more for thread communication, at least in my mind). -Brett
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4