Jean-Claude Wippler wrote:
>
> Greg Stein wrote:
> [MAL:]
> > > The downside of using UTF16: it is a variable length format,
> > > so iterations over it will be slower than for UCS4.
> >
> > Bzzt. May as well go with UTF-8 as the internal format, much like Perl
> > is doing (as I recall).
>
> Ehm, pardon me for asking - what is the brief rationale for selecting
> UCS2/4, or whatever it ends up being, over UTF8?

UCS-2 is the native format on major platforms (meaning straight
fixed length encoding using 2 bytes), i.e. interfacing between
Python's Unicode object and the platform APIs will be simple and
fast.

UTF-8 is compact for ASCII users, but imposes a performance
hit for the CJK (Asian character sets) world, since UTF8
uses *variable* length encodings.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    51 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
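
To make the size trade-off above concrete, here is a rough sketch in present-day Python 3 (not the implementation under discussion). The helper name encoded_sizes and the sample strings are made up for illustration; the "-le" codec variants are used only so that no byte order mark is counted.

```python
# Illustration only: compare how many bytes the same text needs in
# UTF-8 (variable length) versus UTF-16 and UTF-32 code units.
# For characters in the Basic Multilingual Plane, UTF-16 takes exactly
# 2 bytes per character, which matches the fixed-length UCS-2 layout.

def encoded_sizes(text):
    """Return the encoded byte length of text per encoding (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

ascii_text = "Python"   # 6 ASCII characters
cjk_text = "日本語"      # 3 CJK characters, all in the BMP

print(encoded_sizes(ascii_text))  # {'utf-8': 6, 'utf-16': 12, 'utf-32': 24}
print(encoded_sizes(cjk_text))    # {'utf-8': 9, 'utf-16': 6, 'utf-32': 12}
```

ASCII text is half the size in UTF-8, while the CJK sample needs 3 bytes per character in UTF-8 against 2 in the fixed-length form. With a fixed 2-byte layout, character n also sits at byte offset 2*n, whereas UTF-8 has to be scanned from the start to find it.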