"Martin v. Löwis", 24.01.2011 21:17: > I have been thinking about Unicode representation for some time now. > This was triggered, on the one hand, by discussions with Glyph Lefkowitz > (who complained that his server app consumes too much memory), and Carl > Friedrich Bolz (who profiled Python applications to determine that > Unicode strings are among the top consumers of memory in Python). > On the other hand, this was triggered by the discussion on supporting > surrogates in the library better. > > I'd like to propose PEP 393, which takes a different approach, > addressing both problems simultaneously: by getting a flexible > representation (one that can be either 1, 2, or 4 bytes), we can > support the full range of Unicode on all systems, but still use > only one byte per character for strings that are pure ASCII (which > will be the majority of strings for the majority of users). > > You'll find the PEP at > > http://www.python.org/dev/peps/pep-0393/ After much discussion, I'm +1 for this PEP. Implementation and benchmarks are pending, but there are strong indicators that it will bring relief for the memory overhead of most applications without leading to a major degradation performance-wise. Not for Python code anyway, and I'll try to make sure Cython extensions won't notice much when switching to CPython 3.3. Martin, this is a smart way of doing it. Stefan