Fredrik Lundh wrote:
>
> mal wrote:
> > Gustavo Niemeyer submitted a patch which adds a tokenize-like
> > method to strings and Unicode:
> >
> >   "one, two and three".tokenize([",", "and"])
> >   -> ["one", " two ", "three"]
> >
> > I like this method -- should I review the code and then check it in ?
>
> -1. method bloat. not exactly something you do every day, and
> when you do, it's a one-liner:
>
> import re
>
> def tokenize(string, ignore):
>     return [word for word in re.findall(r"\w+", string)
>             if word not in ignore]

This is not the same as what .tokenize() does: it cuts at each
occurrence of a substring rather than extracting words as in your
example (although I must say that list comprehension looks cool ;-).
See the sketch below the signature for the difference.

> > PS: Haven't gotten any response regarding the .decode() method yet...
> > should I take this as "no objections" ?
>
> -0. method bloat. we don't have asfloat methods on integers and
> asint methods on strings either...

Well, we already have .encode(), which interfaces to PyString_Encode(),
but there is no Python API for getting at PyString_Decode(). That is
what .decode() is for.

Depending on the codecs you use, these two methods can be very useful,
e.g. for "fixing" line endings or hexifying strings (see the codec
example below). The codec concept can be used for far more applications
than just converting to and from Unicode.

About rich method APIs in general: I like having rich method APIs,
since they make life easier (you don't have to reinvent the wheel
every time you want a common job done). IMHO, too many methods can
never hurt, but I'm probably alone with that POV.

--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                          http://www.egenix.com/
Python Software:                       http://www.lemburg.com/python/
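
For illustration, here is a rough sketch of the cut-at-substrings
semantics described above. The function name and the exact whitespace
handling are assumptions on my part; Gustavo's actual patch may treat
the pieces differently (note that this naive version keeps the leading
space in ' three', whereas the quoted example shows "three"):

    def tokenize_like(s, separators):
        # Cut s at every occurrence of any separator substring,
        # keeping whatever text lies between the cuts.
        parts = [s]
        for sep in separators:
            parts = [piece for part in parts
                     for piece in part.split(sep)]
        return parts

    >>> tokenize_like("one, two and three", [",", "and"])
    ['one', ' two ', ' three']

Note that this cuts blindly at substrings, so "sand castle" would also
be split at the embedded "and" -- quite different from the word-based
re.findall() one-liner, which extracts \w+ runs and filters them.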
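To show the codec machinery going beyond Unicode, here is a small
example of the hexifying use case using the standard hex_codec
bytes-to-bytes codec, written for today's Python 3 codecs module
(in later Python 2.x this was spelled data.encode('hex')):

    import codecs

    data = b"hello world"

    # Hexify: bytes -> ASCII hex digits, via the standard hex_codec.
    hexed = codecs.encode(data, "hex_codec")
    print(hexed)  # b'68656c6c6f20776f726c64'

    # And back again -- the decode direction the .decode() method exposes.
    assert codecs.decode(hexed, "hex_codec") == data

Fixing line endings the same way would require a small custom codec;
no standard one exists for that, so it is left as a hypothetical here.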