Walter Dörwald wrote:
> On 04.10.2005 at 04:25, jepler at unpythonic.net wrote:
>
>>As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very
>>slow compared to encoding or decoding with utf-8.  Here I'm working with 53k of
>>data instead of 53 megs.  (Note: this is a laptop, so it's possible that
>>thermal or battery management features affected these numbers a bit, but by a
>>factor of 3 at most)
>>
>>$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "u.encode('utf-8')"
>>1000 loops, best of 3: 591 usec per loop
>>$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
>>1000 loops, best of 3: 1.25 msec per loop
>>$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
>>100 loops, best of 3: 13.5 msec per loop
>>$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('iso8859-1')"
>>100 loops, best of 3: 13.6 msec per loop
>>
>>With utf-8 encoding as the baseline, we have
>>    decode('utf-8')       2.1x as long
>>    decode('mac-roman')  22.8x as long
>>    decode('iso8859-1')  23.0x as long
>>
>>Perhaps this is an area that is ripe for optimization.
>
> For charmap decoding we might be able to use an array (e.g. a tuple,
> or an array.array?) of codepoints instead of a dictionary.
>
> Or we could implement this array as a C array (i.e. gencodec.py would
> generate C code).

That would be a possibility, yes. Note that the charmap codec
was meant as a faster replacement for the old string transpose
function. Dictionaries are used for the mapping to avoid having to
store huge (largely empty) mapping tables - it's a memory-speed
tradeoff.

Of course, a C version could use the same approach as
the unicodedatabase module: that of compressed lookup
tables...

    http://aggregate.org/TechPub/lcpc2002.pdf

genccodec.py anyone ?

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 04 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
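
To make the dictionary-versus-array idea from the thread concrete, here is a minimal sketch in modern Python 3 (the timings above were taken on Python 2). Everything in it is hypothetical illustration: the names `decoding_map`, `decoding_table`, `decode_with_dict` and `decode_with_table` and the tiny sample table are assumptions, not what gencodec.py generates and not how the C charmap codec is written.

```python
# Toy single-byte charmap for illustration; NOT the table gencodec.py produces.
# Dictionary form: byte value -> code point, holding only the defined bytes.
decoding_map = {byte: byte for byte in range(128)}
decoding_map[0x80] = 0x00C4  # assumed sample entry: byte 0x80 -> U+00C4

# Array form: one slot per possible byte value; None marks an undefined byte.
decoding_table = tuple(decoding_map.get(byte) for byte in range(256))


def decode_with_dict(data):
    """Decode bytes by looking each byte up in the dictionary."""
    chars = []
    for pos, byte in enumerate(data):
        codepoint = decoding_map.get(byte)
        if codepoint is None:
            raise UnicodeDecodeError("toy-charmap", data, pos, pos + 1,
                                     "character maps to <undefined>")
        chars.append(chr(codepoint))
    return "".join(chars)


def decode_with_table(data):
    """Decode bytes by indexing the 256-entry tuple directly."""
    chars = []
    for pos, byte in enumerate(data):
        codepoint = decoding_table[byte]
        if codepoint is None:
            raise UnicodeDecodeError("toy-charmap", data, pos, pos + 1,
                                     "character maps to <undefined>")
        chars.append(chr(codepoint))
    return "".join(chars)
```

Both functions return the same text for the same input; only the lookup structure differs. For a single-byte codec the array costs a fixed 256 slots, so the "huge (largely empty) table" concern applies mainly to the encoding direction, where the keys are code points. In pure Python the two loops may well time similarly; the speedup discussed in the thread would come from doing the indexed lookup inside a C loop.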