On 28/08/2011 23:06, "Martin v. Löwis" wrote:
> On 28.08.2011 22:01, Antoine Pitrou wrote:
>>
>>> - the iobench results are between 2% acceleration (seek operations),
>>>   16% slowdown for small-sized reads (4.31 MB/s vs. 5.22 MB/s) and
>>>   37% for large-sized reads (154 MB/s vs. 235 MB/s). The speed
>>>   difference is probably in the UTF-8 decoder; I have already
>>>   restored the "runs of ASCII" optimization and am out of ideas for
>>>   further speedups. Again, having to scan the UTF-8 string twice
>>>   is probably one cause of slowdown.
>>
>> I don't think it's the UTF-8 decoder, because I see an even larger
>> slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").
>
> Those haven't been ported to the new API yet. Consider, for example,
> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
> is a 25% speedup for PEP 393.

If I understand correctly, the performance now depends heavily on which
characters are used? A pure ASCII string is faster than a string with
characters from the ISO-8859-1 charset? Is the same true for BMP
characters vs. non-BMP characters?

Do these benchmark tools use only ASCII characters, or also some
ISO-8859-1 characters? Or, better, different Unicode ranges in
different tests?

Victor
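
[One way to probe that last question is to time the decoder directly on
strings drawn from each range. The snippet below is only a rough sketch
(the sample strings, sizes, and repetition count are made up for
illustration; it is not part of iobench), but it separates the ASCII,
ISO-8859-1, BMP, and non-BMP cases that the question raises:

    # Rough sketch: time UTF-8 decoding for strings drawn from different
    # Unicode ranges, to see how much the target representation matters.
    import timeit

    SIZE = 4096  # code points per test string (mirrors the 4096-unit read test)

    samples = {
        "ASCII":      "a" * SIZE,
        "ISO-8859-1": "\xe9" * SIZE,         # e-acute, 1-byte representation
        "BMP":        "\u20ac" * SIZE,        # euro sign, 2-byte representation
        "non-BMP":    "\U0001d11e" * SIZE,    # musical symbol, 4-byte representation
    }

    for name, text in samples.items():
        data = text.encode("utf-8")
        n = 10000
        seconds = timeit.timeit(lambda: data.decode("utf-8"), number=n)
        mb = len(data) * n / (1024.0 * 1024.0)
        print("%-10s %8.1f MB/s" % (name, mb / seconds))

Running it on an interpreter with and without the PEP 393 changes would show
whether the slowdown tracks the character range of the input.]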