[Gerald S. Williams]
> I didn't find any way to improve the actual overflow check,
> although if you entirely replace the "fast path" check with
> checks involving unsigned masking, you get some performance
> improvement. For a wide variety of input patterns, I get
> about an 18% speedup versus the core long multiply code,
> when modified as shown below:

Which platform?  Which compiler?  What was your test driver?  Was this
timing the mult code in isolation, or timing Python-level multiplies?
Claims of small speedups are notoriously platform- and test-dependent.  If
it's a mixed bag across platforms, the risk of introducing a new bug would
favor leaving things alone.

In the absence of a clear correctness proof, a Python simulation program
demonstrating correctness exhaustively in small bases would also be helpful.

> ...
> Shall I submit a patch?

Sure, but also submit your timing harness so that people can measure the
effects cross-platform and cross-compiler.
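
For concreteness, here is a rough sketch of the kind of exhaustive simulation meant above. It is not the code from the proposed patch (that code is elided in this message): `candidate_overflows` is a hypothetical stand-in that pairs a masking-based fast path with a divide-back slow path, and `wrap` and `find_mismatches` are illustrative helper names. The idea is to model a signed word of only a few bits, so that every pair of operands can be tried and compared against Python's exact arithmetic.

```python
def wrap(x, bits):
    """Reduce x to a signed two's-complement value of the given width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def candidate_overflows(a, b, bits):
    """Hypothetical check under test (NOT the code elided in the thread)."""
    h = (bits - 1) // 2
    high_mask = ((1 << bits) - 1) & ~((1 << h) - 1)
    # Fast path: no bit at or above position h is set in either operand,
    # so both are nonnegative and below 2**h; the product is then below
    # 2**(2*h) <= 2**(bits-1) and cannot overflow.
    if ((a | b) & high_mask) == 0:
        return False
    # Slow path: multiply with wraparound, then verify by dividing back.
    p = wrap(a * b, bits)
    return a != 0 and (p % a != 0 or p // a != b)


def find_mismatches(check, bits):
    """Return every (a, b) where check() disagrees with exact arithmetic."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    bad = []
    for a in range(lo, hi):
        for b in range(lo, hi):
            expected = not (lo <= a * b < hi)  # true overflow, computed exactly
            if check(a, b, bits) != expected:
                bad.append((a, b))
    return bad


if __name__ == "__main__":
    for bits in range(2, 9):
        print(bits, "bits:", len(find_mismatches(candidate_overflows, bits)),
              "mismatches")
```

To exercise a real candidate, replace the body of `candidate_overflows` with a transliteration of the C logic and rerun. An empty mismatch list at every small width is evidence rather than proof, since Python's unbounded ints and floor division do not reproduce every C-level detail.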