I expect Martin checked in this change because of the unhappy hours he
spent determining that the previous two versions of this function wrote
beyond the memory they allocated. Since the most recent version still
didn't bother to assert that it wasn't writing out of bounds, I can't
blame Martin for checking in a version that does so assert; since I spent
hours on this too, and this function has a repeated history of bad memory
behavior, I viewed the version Martin replaced as unacceptable. However,
the slowdown on large strings isn't attractive, and the previous version
could easily enough have asserted its memory correctness.

> -----Original Message-----
> From: python-checkins-admin@python.org
> [mailto:python-checkins-admin@python.org] On Behalf Of M.-A. Lemburg
> Sent: Saturday, April 20, 2002 11:26 AM
> To: loewis@sourceforge.net
> Cc: python-checkins@python.org
> Subject: Re: [Python-checkins] python/dist/src/Objects
> unicodeobject.c,2.139,2.140
>
>
> loewis@sourceforge.net wrote:
>>
>> Update of /cvsroot/python/python/dist/src/Objects
>> In directory usw-pr-cvs1:/tmp/cvs-serv30961
>>
>> Modified Files:
>>      unicodeobject.c
>> Log Message:
>> Patch #495401: Count number of required bytes for encoding UTF-8
>> before allocating the target buffer.
>
> Martin, please back out this change again. We have discussed this
> quite a few times and I am against using your strategy since
> it introduces a performance hit which does not relate to the
> gained advantage of (temporarily) using less memory.
>
> Your timings also show this, so I wonder why you checked in this
> patch, e.g. from the patch log:
>
> """
> For the current
> CVS (unicodeobject.c 2.136: MAL's change to use a variable
> overalloc), I get
>
> 10 spaces             20.060
> 100 spaces             2.600
> 200 spaces             2.030
> 1000 spaces            0.930
> 10000 spaces           0.690
> 10 spaces, 3 bytes    23.520
> 100 spaces, 3 bytes    3.730
> 200 spaces, 3 bytes    2.470
> 1000 spaces, 3 bytes   0.980
> 10000 spaces, 3 bytes  0.690
> 30 bytes              24.800
> 300 bytes              5.220
> 600 bytes              3.830
> 3000 bytes             2.480
> 30000 bytes            2.230
>
> With unicode3.diff (that's the one you checked in), I get
>
> 10 spaces             19.940
> 100 spaces             3.260
> 200 spaces             2.340
> 1000 spaces            1.650
> 10000 spaces           1.450
> 10 spaces, 3 bytes    21.420
> 100 spaces, 3 bytes    3.410
> 200 spaces, 3 bytes    2.420
> 1000 spaces, 3 bytes   1.660
> 10000 spaces, 3 bytes  1.450
> 30 bytes              22.260
> 300 bytes              5.830
> 600 bytes              4.700
> 3000 bytes             3.740
> 30000 bytes            3.540
> """
>
> The only case where your patch is faster is for very short
> strings and then only by a few percent, whereas for all
> longer strings you get worse timings, e.g. 3.74 seconds
> compared to 2.48 seconds -- that's a 50% increase in
> run-time!
>
> Thanks,
> --
> Marc-Andre Lemburg
> CEO eGenix.com Software GmbH
> ______________________________________________________________________
> Company & Consulting:           http://www.egenix.com/
> Python Software:                http://www.egenix.com/files/python/
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins@python.org
> http://mail.python.org/mailman/listinfo/python-checkins