On 24/08/2011 11:22, Glenn Linderman wrote:
>>> c) mostly ASCII (utf8) with clever indexing/caching to be efficient
>>> d) UTF-8 with clever indexing/caching to be efficient
>> I see neither a need nor a means to consider these.
>
> The discussion about "mostly ASCII" strings seems convincing that there
> could be a significant space savings if such were implemented.

Antoine's optimization in the UTF-8 decoder has been removed. Removing it
doesn't change the memory footprint; creating the Unicode object is just
slower.

When you decode a UTF-8 string:

- "abc" string uses "latin1" (8 bits) units
- "aé" string uses "latin1" (8 bits) units <= cool!
- "a€" string uses UCS2 (16 bits) units
- "a\U0010FFFF" string uses UCS4 (32 bits) units

Victor
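
The four cases listed above can be modelled with a minimal sketch. It is not the decoder's actual C code: `unit_width` is a hypothetical helper, and the sketch assumes the unit size is chosen purely from the highest code point in the decoded string (below 0x100, below 0x10000, or above).

```python
def unit_width(text: str) -> int:
    """Return the bytes per unit (1, 2 or 4) implied by the widest character.

    Hypothetical helper: it models the selection rule described in the
    message, it does not inspect the interpreter's real storage.
    """
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:      # fits in latin1 units
        return 1
    if highest <= 0xFFFF:    # fits in UCS2 units
        return 2
    return 4                 # needs UCS4 units


# The same four strings as in the list, decoded from UTF-8 bytes.
samples = [
    b"abc",
    "a\u00e9".encode("utf-8"),      # "a" followed by e-acute
    "a\u20ac".encode("utf-8"),      # "a" followed by the euro sign
    "a\U0010FFFF".encode("utf-8"),  # "a" followed by the last code point
]

for raw in samples:
    decoded = raw.decode("utf-8")
    print(ascii(decoded), unit_width(decoded) * 8, "bits per unit")
```

The point of the second case is that a non-ASCII character such as "é" still fits in 8-bit units, so only characters above U+00FF force a wider representation.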