[Scott Gilbert] > The following is really ugly (until you put it in a macro), but > would work: > > #define HALF_BITS (sizeof(size_t) * 4U) > #define LO_MASK ((((size_t)1) << HALF_BITS)-1) > #define HI_MASK (LO_MASK<<HALF_BITS) > > if ( > /* Tim's fast path */ > ((src1 | src2) & HI_MASK) && ( > /* Ugly slower path */ > (((src1&LO_MASK) * (src2>>HALF_BITS)) & HI_MASK) | > (((src1>>HALF_BITS) * (src2&LO_MASK)) & HI_MASK) | > (src1>>HALF_BITS) * (src2>>HALF_BITS)) > ) { > goto overflow; > } I'll repeat that it's hard to do this correctly and quickly (c.f. earlier comments about the history of Python's subtly buggy attempts). This kind of approach is viewing src1 and src2 as 2-digit numbers in base X = 2**HALF_BITS: src1 = A*X + B src2 = C*X + D then viewing their product as highpart: A*C*X**2 + xxxxxxxx middlepart1: A*D*X + xxxxxxxx middlepart2: B*C*X + xxxxxxxx lowpart: B*D xxxxxxxx ---------------- xxxxxxxxxxxxxxxx <danger>< safe > Your second line checks whether the high half of middlepart2 is 0; your third line whether the high half of middlepart1 is 0; your last line whether highpart is 0. But they can all be 0 and *still* have overflow: the sum low half of middlepart1 + low half of middlepart2 + high half of lowpart can carry over into the <danger> zone. Supposing we were looking at 16-bit products, here's an example: 0x01ff * 0x00ff ------ 0000 00ff 0000 fe01 -------- 0001fd01 ^ oops Hint: before posting something like this, prototype it in Python, and do an exhaustive check of all possible pairs of multiplicands in an artificially small base. Very few buggy algorithms survive that, even for very small artificial bases.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4