Glenn Linderman writes:

 > We can either artificially constrain ourselves to minor tweaks of
 > the legal conforming bytestreams,

It's not artificial. Having the internal representation be the same as a standard encoding is very useful for a large number of minor usages (urgently saving buffers in a text editor that knows its internal state is inconsistent, viewing strings in the debugger, PEP 393-style space optimization is simpler if text properties are out-of-band, etc).

 > or we can invent a representation (whether called str or something
 > else) that is useful and efficient in practice.

Bring on the practice, then.

You say that a bit to identify lone surrogates might be useful or efficient. In what application? How much time or space does it save?

You say that a bit to cache a property might be useful or efficient. In what application? Which properties? Are those properties a set fixed by the language, or would some bits be available for application-specific property caching? How much time or space does that save? What are the costs to applications that don't want the cache? How is the bit-cache affected by PEP 393?

I know of no answers (none!) to those questions that favor introduction of a bit-cache representation now. And those bits aren't going anywhere; it will always be possible to use a "wide" build and change the representation later, if the optimization is valuable enough.

Now, I'm aware that my experience is limited to the implementations of one general-purpose language (Emacs Lisp) of restricted applicability. But its primary use *is* in text processing, so I'm moderately expert. *Moderately*. Always interested in learning more, though. If you know of relevant use cases, I'm listening! Even if Guido doesn't find them convincing for Python, we might find them interesting at XEmacs.
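
To make the "lone surrogate bit" question concrete, here is a minimal sketch of what such a bit would cache, assuming Python 3 str semantics, where any code point in U+D800..U+DFFF inside a str counts as a lone surrogate. The names has_lone_surrogates and FlaggedText are hypothetical, invented for illustration; nothing like them is proposed in the thread. The flag is kept out-of-band, so the text itself stays an ordinary str.

```python
def has_lone_surrogates(text: str) -> bool:
    """Return True if any code point lies in the surrogate range U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


class FlaggedText:
    """Hypothetical wrapper: computes the flag once and stores it beside the text."""

    def __init__(self, text: str) -> None:
        self.text = text
        # One O(n) scan at construction; later checks just read the attribute.
        self.lone_surrogates = has_lone_surrogates(text)


# Undecodable bytes become lone surrogates under the surrogateescape handler.
raw = b"caf\xff"
decoded = raw.decode("utf-8", "surrogateescape")
flagged = FlaggedText(decoded)
```

All the cached bit saves is repeating that linear scan, so whether it pays off depends on how often an application asks the question about the same string.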