[/F, upon the reinvention of substring descriptors]
> ...
> a) bad memory behaviour if you slice small strings out
> of huge input strings -- which may surprise newbies.

Experts too.  Dragon has gobs of code that copies little strings via
loops in Java and C++, because Java's and MFC's descriptor-based string
classes routinely keep a megabyte string alive after you've sliced out
the 3 bytes <0.5 wink> you needed.  Last year my group finally wrote its
own string classes, to just copy the damn things.  Performance
improvement was significant (both space & time).

Boehm's "cords"/"ropes" (he's the primary author of both pkgs JC
mentioned) were specifically designed to support efficient random &
repeated editing of giant mutable strings -- agree with Guido that it's
an overall major loss for pedestrian uses.

Heck, why not implement strings as giant B-trees like the Tcl text
widget does <wink>.

> b) harder to interface to underlying C libraries -- the
> current string implementation guarantees that a Python
> string is also a C string (with a trailing null).

c) For apps that use oodles of short strings, the space overhead of
maintaining descriptors exceeds that of making copies.  A buddy in Sun's
Java development group tells me Java is despised for this by Major
Players in the DB world; so don't be surprised if Java eventually drops
the descriptor idea too (or, more Java-like, introduces 5 new flavors of
strings <0.7 wink>).

So there's no pure win here.  Python's current scheme is at least
predictable, and by everyone, with finite effort.  Agree you have a
particular good but limited use for it, though, and Greg's suggestion of
using buffer objects under the covers is almost certainly "the right"
idea.
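
To make point (a) concrete, here is a minimal sketch of the two slicing
strategies.  It uses a modern Python `memoryview` as a stand-in for a
substring descriptor, which is an assumption made for illustration and
not the buffer-object design discussed in this thread; the sizes and
variable names are likewise made up.

```python
import sys

# A "huge input string": one megabyte of zero bytes (size is illustrative).
big = bytes(1_000_000)

# Descriptor-style slice: a memoryview slice shares big's buffer and
# holds a reference to the whole object, so nothing is copied.
view = memoryview(big)[10:13]

# Copy-style slice: ordinary slicing builds an independent 3-byte object.
copy = big[10:13]

# The view keeps the entire megabyte reachable through its .obj attribute.
print(view.obj is big)

# The copy has the same contents but its footprint is tiny by comparison.
print(view.tobytes() == copy)
print(sys.getsizeof(copy) < sys.getsizeof(big))
```

As long as `view` is alive, `big` cannot be freed, even if every other
reference to it is dropped; `copy` has no such tie to the original.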