On 8/31/2017 2:40 PM, Manciu, Catalin Gabriel wrote:
> Hi everyone,
>
> While looking over the PyLong source code in Objects/longobject.c I came
> across the fact that the PyLong object doesn't include implementations for
> basic inplace operations such as addition or multiplication:
>
> [...]
>     long_long,                  /*nb_int*/
>     0,                          /*nb_reserved*/
>     long_float,                 /*nb_float*/
>     0,                          /* nb_inplace_add */
>     0,                          /* nb_inplace_subtract */
>     0,                          /* nb_inplace_multiply */
>     0,                          /* nb_inplace_remainder */
> [...]
>
> While I understand that the immutable nature of this type of object justifies
> this approach, I wanted to experiment and see how much performance an inplace
> add would bring.
> My inplace add will revert to calling the default long_add function when:
> - the refcount of the first operand indicates that it's being shared
> or
> - that operand is one of the preallocated 'small ints'
> which should mitigate the effects of not conforming to the PyLong immutability
> specification.
> It also allocates a new PyLong _only_ in case of a potential overflow.
>
> The workload I used to evaluate this is a simple script that does a lot of
> inplace adding:
>
> import time
> import sys
>
> def write_progress(prev_percentage, value, limit):
>     percentage = (100 * value) // limit
>     if percentage != prev_percentage:
>         sys.stdout.write("%d%%\r" % (percentage))
>         sys.stdout.flush()
>     return percentage
>
> progress = -1
> the_value = 0
> the_increment = ((1 << 30) - 1)
> crt_iter = 0
> total_iters = 10 ** 9
>
> start = time.time()
>
> while crt_iter < total_iters:
>     the_value += the_increment
>     crt_iter += 1
>
>     progress = write_progress(progress, crt_iter, total_iters)
> end = time.time()
>
> print ("\n%.3fs" % (end - start))
> print ("the_value: %d" % (the_value))
>
> Running the baseline version outputs:
> ./python inplace.py
> 100%
> 356.633s
> the_value: 1073741823000000000
>
> Running the modified version outputs:
> ./python inplace.py
> 100%
> 308.606s
> the_value: 1073741823000000000
>
> In summary, I got a +13.47% improvement for the modified version.
> The CPython revision I'm using is 7f066844a79ea201a28b9555baf4bceded90484f
> from the master branch and I'm running on a I7 6700K CPU with Turbo-Boost
> disabled (frequency is pinned at 4GHz).
>
> Do you think that such an optimization would be a good approach?

On my machine, the more realistic code, with an implicit C loop,

    the_value = sum(the_increment for i in range(total_iters))

gives the same value twice as fast as your explicit Python loop.
(I cut total_iters down to 10**7.)

You might check whether sum uses an in-place accumulator for ints.

--
Terry Jan Reedy
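
For anyone who wants to repeat the loop-versus-sum comparison, here is a minimal sketch using the standard timeit module. The function names explicit_loop and implicit_loop are made up for illustration, the progress reporting from the original script is left out, and total_iters is cut down much further (an assumption, purely so it finishes quickly). Timings will vary by machine and CPython build, so none are shown.

```python
import timeit

the_increment = (1 << 30) - 1
total_iters = 10 ** 6  # assumption: far smaller than the 10**9 in the original script


def explicit_loop(n=total_iters, inc=the_increment):
    """Accumulate with an explicit Python-level while loop and +=."""
    total = 0
    i = 0
    while i < n:
        total += inc
        i += 1
    return total


def implicit_loop(n=total_iters, inc=the_increment):
    """Accumulate with sum(), which drives the loop from C."""
    return sum(inc for _ in range(n))


if __name__ == "__main__":
    # Both variants must agree on the result before timing means anything.
    assert explicit_loop() == implicit_loop() == total_iters * the_increment

    for fn in (explicit_loop, implicit_loop):
        # Take the best of several runs to reduce scheduling noise.
        best = min(timeit.repeat(fn, number=1, repeat=5))
        print("%-14s %.4fs" % (fn.__name__, best))
```

Running the same script under both the baseline and the patched interpreter would show how much of the reported gain survives once the loop itself is not written in Python.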