2009/1/28 Antoine Pitrou <solipsis at pitrou.net>:
> When writing large chunks of text (4096, 1e6), bookkeeping costs become
> marginal and encoding costs dominate. 2.x has no encoding costs, which
> explains why it's so much faster.

Interesting. However, it's still "slower" in terms of perception. In 2.x,
I regularly do the equivalent of

    f = open("filename", "r")
    ... read strings from f ...

Yes, I know this is byte I/O in reality, but for everything I do (Latin-1
on input and output, and for most practical purposes ASCII-only) encoding
simply isn't relevant to me.

If Python 3.x makes this substantially slower (working in a naive mode
where I ignore encoding issues), calling it "encoding costs" makes no
difference - in practical terms, I get no benefit and yet I pay the cost.
(You can say my approach is wrong, but so what? I'll just say that 2.x is
faster for me, and not migrate. Ultimately, this is about "marketing"
3.x...)

It would be helpful to limit this cost as much as possible - perhaps by
ensuring that the default encoding for open() is, in the majority of
cases, a highly-optimised one whose costs *don't* dominate in the way you
describe. (UTF-8 is presumably the usual default on Linux, so if that
codec is the bottleneck, it looks like some work is needed there.)

Hmm, I just checked, and on Windows it appears that
sys.getdefaultencoding() is UTF-8. That seems odd - I would have thought
the majority of Windows systems were NOT set to use UTF-8 by default...

Paul.
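A minimal sketch of how to inspect the two defaults in question, assuming
CPython 3.x (the file name below is hypothetical): sys.getdefaultencoding()
reports the interpreter's internal str encoding, which is always UTF-8 on
3.x, whereas text-mode open() falls back to locale.getpreferredencoding()
when no encoding argument is given - on Windows that is typically an ANSI
code page such as cp1252, not UTF-8.

    import locale
    import sys

    # Interpreter's internal str encoding: always 'utf-8' on 3.x,
    # regardless of the platform's locale settings.
    print(sys.getdefaultencoding())

    # What text-mode open() falls back to when encoding=None:
    # derived from the locale, e.g. 'cp1252' on a Western-European
    # Windows system, typically 'UTF-8' on Linux.
    print(locale.getpreferredencoding())

    # The encoding a particular file object actually ended up with:
    with open("filename", "w") as f:  # hypothetical file name
        print(f.encoding)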