Paul Moore <p.f.moore <at> gmail.com> writes:
> > As I pointed out, utf-8, utf-16 and latin1 decoders have already been
> > optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly
> > fast (1GB/s here). The dataset for iobench isn't pure ASCII though,
> > and that's why it's not as fast.
>
> Ah, thanks. Although you said your data was 95% ASCII, and you're
> getting decode speeds of 250MB/s. That's a 75% slowdown for 5% of the
> data! Surely that's not right???

If you look at how utf-8 decoding is implemented (in unicodeobject.c),
it's quite obvious why it is so :-) There is a (very) fast path for
chunks of pure ASCII data, and a (fast, but not blazingly fast)
fallback for non-ASCII data.

Please don't think of it as a slowdown... It's still much faster than
2.x, which manages 130MB/s on the same data.

Regards

Antoine.
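[Editor's note: for illustration, here is a minimal C sketch of the
ASCII fast-path idea described above. This is not CPython's actual
decoder; the function name and the word-at-a-time masking detail are
assumptions made for the example.]

    /* Sketch of an ASCII fast path for a UTF-8 decoder: scan the input
     * one machine word at a time, and as long as every byte in the word
     * has its high bit clear (pure ASCII), copy the bytes straight
     * through.  Only the remainder needs the full UTF-8 state machine. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Mask with the high bit of every byte set: 0x8080808080808080. */
    #define HIGH_BITS UINT64_C(0x8080808080808080)

    /* Copy leading ASCII bytes from src to dst; returns the number of
     * bytes handled.  The caller falls back to a general UTF-8 decoder
     * for whatever remains. */
    static size_t
    ascii_fast_path(const unsigned char *src, size_t len, unsigned char *dst)
    {
        size_t i = 0;

        /* Word-at-a-time loop: bail out as soon as any byte >= 0x80. */
        while (i + sizeof(uint64_t) <= len) {
            uint64_t chunk;
            memcpy(&chunk, src + i, sizeof(chunk));  /* alignment-safe load */
            if (chunk & HIGH_BITS)
                break;                               /* non-ASCII byte found */
            memcpy(dst + i, src + i, sizeof(chunk));
            i += sizeof(chunk);
        }
        /* Mop up remaining bytes one at a time until a non-ASCII byte. */
        while (i < len && src[i] < 0x80) {
            dst[i] = src[i];
            i++;
        }
        return i;
    }

In a real decoder this fast path is interleaved with the multi-byte
UTF-8 handling, so mostly-ASCII data also pays for the transitions
between the two paths, which is consistent with the 250MB/s vs. 1GB/s
figures quoted above.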