mal wrote:

> > > "one, two and three".tokenize([",", "and"])
> > > -> ["one", " two ", "three"]
> > >
> > > I like this method -- should I review the code and then check it in ?
> >
> > -1.  method bloat.  not exactly something you do every day, and
> > when you do, it's a one-liner:
> >
> > def tokenize(string, ignore):
> >     [word for word in re.findall("\w+", string) if not word in ignore]
>
> This is not the same as what .tokenize() does: it cuts at each
> occurrence of a substring rather than at words as in your example.

oh, I didn't see the spaces.

splitting on all substrings is even easier (but perhaps a bit more
obscure, at least when written on one line):

    def tokenize(string, seps):
        return re.split("|".join(map(re.escape, seps)), string)

Cheers /F
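[Editor's note: a minimal, self-contained sketch of the re.split approach from the post above. The function name tokenize and the sample input come from the thread; the printed result is what re.split actually returns for that input, which keeps the surrounding whitespace.]

    import re

    def tokenize(string, seps):
        # Build an alternation of the escaped literal separators
        # and split the string at every occurrence of any of them.
        pattern = "|".join(map(re.escape, seps))
        return re.split(pattern, string)

    print(tokenize("one, two and three", [",", "and"]))
    # -> ['one', ' two ', ' three']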