> Anyone can test.
>
> $ ./python -m timeit -s 'enc = "latin1"; import codecs; d =
> codecs.getdecoder(enc); x = ("\u0020" * 100000).encode(enc)' 'd(x)'
> 10000 loops, best of 3: 59.4 usec per loop
> $ ./python -m timeit -s 'enc = "latin1"; import codecs; d =
> codecs.getdecoder(enc); x = ("\u0080" * 100000).encode(enc)' 'd(x)'
> 10000 loops, best of 3: 28.4 usec per loop
>
> The results are fairly stable (±0.1 µsec) from run to run. It looks
> like a funny thing.

This is not surprising. When decoding Latin-1, the codec needs to
determine whether the string is pure ASCII or not; if it is not, it must
be all Latin-1 (it cannot be anything outside Latin-1). For a pure ASCII
input it has to scan the entire string looking for a non-ASCII
character, and since there is none, it ends up inspecting every byte
before copying the data. In your second example, the first character is
already above 127, so the search for the maximum character can stop
immediately, and the string needs to be traversed only once for the
copy.

Try '\u0020' * 999999 + '\u0080': it is a non-ASCII string, but it
should take about the same time as the pure ASCII string, because the
scan has to go almost to the end before it finds the non-ASCII
character.

Regards,
Martin
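A minimal way to run that last check, assuming the same build and setup
as in the quoted commands (absolute timings will of course differ per
machine and build):

$ ./python -m timeit -s 'enc = "latin1"; import codecs; d = codecs.getdecoder(enc); x = ("\u0020" * 999999 + "\u0080").encode(enc)' 'd(x)'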