On Tue, Jul 02, 2002 at 03:31:03PM -0400, Tim Peters wrote:
> I would have guessed you had a more vivid imagination <wink>. It's
> precisely because the id has been guaranteed that a program may not care to
> save a reference to an interned string. For example,
>
> """
> _ids = map(id, map(intern, "if then elif else".split()))
> TOKEN_IF, TOKEN_THEN, TOKEN_ELIF, TOKEN_ELSE, TOKEN_NAME = range(5)
> id2token = dict(zip(_ids, range(4)))
> del _ids
>
> def tokenvector(s):
>     return [id2token.get(id(intern(word)), TOKEN_NAME)
>             for word in s.split()]
>
> print tokenvector("if this is the example, then what's the question?")
> """
>
> This works reliably today to classify tokens. I'm not certain I'd care if
> it broke, but we have to consider that it hasn't been difficult to write
> code that would break.

Ironically, this code is actually slower than using the strings
themselves as keys (interned or not). But I get the point.

> > Now for something a bit more radical:
> >
> > Why not make interned strings a type? <type 'istr'> could be an
> > un-subclassable subclass of string. intern would just be an
> > alias for this type. No two istr instances are equal unless they are
> > identical. I guess PyString_CheckExact would need to be changed to
> > accept either String or InternedString.
>
> What would the point be? That is, instead of "why not?", why? As to "why
> not?", there's something about elevating what's basically an optimization
> hack to a type that makes me squirm.

Change the name from 'istr' to 'symbol' and add a mild case of language
envy and you'll see why ;-)

	Oren
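For comparison, here is a minimal Python 3 sketch of the alternative Oren mentions: key the lookup table on the strings themselves instead of on id() of interned strings. Nothing in it depends on object identity or on an interned string staying alive after `del _ids`. The name KEYWORDS is illustrative, and this has not been benchmarked here. One plausible reason for the speed difference is that CPython's dict lookup checks for an identical key before calling __eq__ and str caches its hash, so an equal string key is already cheap, while the id-based version pays for an intern() and an id() call per word.

```python
# Python 3 sketch: classify tokens by keying on the strings themselves,
# so nothing depends on id() values or on interned strings staying alive.
TOKEN_IF, TOKEN_THEN, TOKEN_ELIF, TOKEN_ELSE, TOKEN_NAME = range(5)

# Hypothetical table name: maps each keyword string straight to its code.
KEYWORDS = dict(zip("if then elif else".split(), range(4)))


def tokenvector(s):
    """Return a token code for each whitespace-separated word in s."""
    return [KEYWORDS.get(word, TOKEN_NAME) for word in s.split()]


print(tokenvector("if this is the example, then what's the question?"))
# prints [0, 4, 4, 4, 4, 1, 4, 4, 4]
```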
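To make the istr proposal concrete, here is a hypothetical pure-Python approximation in Python 3. It is not how CPython implements interning, and every name in it (istr, _table) is illustrative. The constructor returns one shared instance per distinct value, __init_subclass__ rejects subclasses ("un-subclassable"), and istr-to-istr equality is plain identity ("no two istr instances are equal unless they are identical"). Comparison with an ordinary str still goes by value.

```python
class istr(str):
    """Hypothetical interned-string type: one shared instance per value."""

    # value -> the unique istr instance; never cleared, so instances live forever
    _table = {}

    def __new__(cls, value=""):
        value = str(value)
        try:
            return cls._table[value]
        except KeyError:
            obj = super().__new__(cls, value)
            cls._table[value] = obj
            return obj

    def __init_subclass__(cls, **kwargs):
        # Runs when someone tries to subclass istr; refuse, per the proposal.
        raise TypeError("istr cannot be subclassed")

    def __eq__(self, other):
        if isinstance(other, istr):
            # Two istr instances are equal only if they are the same object.
            return self is other
        # Against a plain str (or anything else), fall back to str semantics.
        return str.__eq__(self, other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ clears the inherited __hash__, so restore str's hash
    # to keep istr usable as a dict key interchangeably with plain str.
    __hash__ = str.__hash__


a = istr("if")
b = istr("i" + "f")
print(a is b, a == "if")
# prints True True
```

Because the table is never cleared, an istr is never freed, which matches the id guarantee Tim relies on above; a design that let unused symbols be reclaimed would reopen exactly the lifetime question his example raises.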