On Tue, Aug 13, 2002 at 09:07:49AM +0200, Martin v. Loewis wrote:
> Oren Tirosh <oren-py-d@hishome.net> writes:
>
> > I think that this will produce the smallest number of
> > incompatibilities for existing code and maintain compatibility with
> > C header files on 32 bit platforms. In this case 0xff000000 will
> > always be interpreted as -16777216 and the 'i' parser will happily
> > convert it to either 0xFF000000 or 0xFFFFFFFFFF000000, depending on
> > the native platform word size - which is probably what the
> > programmer meant.
>
> This means you suggest that PEP 237 is not implemented, or at least
> frozen at the current stage.

Not at all! Removing the differences between ints and longs is good.
My reservations are about the hexadecimal representation.

    - Currently, the '%u', '%x', '%X' and '%o' string formatting
      operators and the hex() and oct() built-in functions behave
      differently for negative numbers: negative short ints are
      formatted as unsigned C long, while negative long ints are
      formatted with a minus sign.  This will be changed to use the
      long int semantics in all cases (but without the trailing 'L'
      that currently distinguishes the output of hex() and oct() for
      long ints).  Note that this means that '%u' becomes an alias for
      '%d'.  It will eventually be removed.

In Python up to 2.2 it's inconsistent between ints and longs:

>>> hex(-16711681)
'0xff00ffff'
>>> hex(-16711681L)
'-0xff0001L'            # ??!?!?

The hex representation of ints gives me useful information about their
bit structure. After all, it is not immediately apparent to most
mortals that the number above is a mask for bits 16-23.

The hex representation of longs is something I find quite misleading,
and I think it's also unprecedented. This wart has bothered me for a
long time, but I never had any use for it so I didn't mind too much.
Now it is proposed to extend this useless representation to ints, so
I do mind.

So we have two elements of the language that are inconsistent.
One of them is in widespread use and the other is... ahem... Which one
of them should be changed to conform to the other?

My proposal:

On 32 bit platforms:

>>> hex(-16711681)
'0xff00ffff'
>>> hex(-16711681L)
'0xff00ffff'

On 64 bit platforms:

>>> hex(-16711681)
'0xffffffffff00ffffLL'
>>> hex(-16711681L)
'0xffffffffff00ffffLL'

The 'LL' suffix means that this number is to be treated as a 64 bit
*signed* number. This is consistent with the way it is interpreted by
GCC and other unix compilers on both 32 and 64 bit platforms.

What to do about numbers from 2**31 to 2**32-1?

>>> hex(4278255615)
0xff00ffffU

The U suffix, also borrowed from C, makes it unambiguous on 32 and 64
bit platforms for both Python and C.

Representation of positive numbers:

  0x00000000 - 0x7fffffff              : unambiguous on all platforms
  0x80000000U - 0xffffffffU            : representation adds U suffix
  0x100000000LL - 0x7fffffffffffffffLL : representation adds LL suffix

Representation of negative numbers:

  0x80000000 - 0xffffffff (-2147483648 to -1):
      8 digits on 32 bit platforms
  0xffffffff80000000LL - 0xffffffffffffffffLL (same range):
      16 digits and LL suffix on 64 bit platforms
  other negative numbers:
      16 digits and LL suffix on all platforms.

This makes the hex representation of a number informative and
consistent between int and long on all platforms. It is also
consistent with the C compiler on the same platform. Yes, it will
produce a different text representation of some numbers on different
platforms, but this conveys important information about the bit
structure of the number, which really is different between platforms.
eval()ing it back to a number is still consistent.

When converting in the other direction (hex representation to number)
there is an ambiguous range from 0x80000000 to 0xffffffff. Should it be
treated as signed or unsigned? The current interpretation is signed.
PEP 237 proposes to change it to unsigned.
I propose to do neither - this range should be deprecated and some
explicit notation should be used instead.

There's no need to be in a hurry about deprecating it, though. The
overwhelming majority of Python code will run on 32 bit platforms for
some time yet. I propose that on 32 bit platforms this will produce a
silent warning. No code will break. Running the program with -Wall will
inform the programmer that the code may not work for some future
version of Python.

On 64 bit platforms this will be interpreted the same way as on a
32 bit platform (signed 32 bits) but produce a noisy warning.

If the code was written on a 64 bit platform and the programmer meant
the number to be treated as unsigned, an explicit U suffix can be added
to make it unambiguously unsigned.

If the code was written on a 32 bit platform and the programmer meant
the number to be treated as signed, it's possible to just live with the
warning (the code should still run correctly) or add 8 leading 'F's and
an 'LL' suffix to make it unambiguously signed. The modified code will
run without warning on both 32 and 64 bit platforms.

Notes:

The number 4000000000 would be represented in hex as 0xEE6B2800U
whether it's an int on a 64 bit platform or a long on either 32 or
64 bit platforms. The representation depends only on the numeric value,
not the type. This proposal therefore does not contradict the purpose
of PEP 237 because ints and longs are treated identically.

What's the hex representation of numbers outside the range of 64 bit
integers? Frankly, I don't care. I'll go with any proposed solution as
long as eval(hex(x)) == x.

On Microsoft platforms 64 bit literals use the suffix 'i64', not 'LL'.
Python may either use 'LL' exclusively or produce 'i64' on Microsoft
platforms and 'LL' on other platforms. In the latter case it should
accept either suffix on all platforms.
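
To make the ranges above concrete, here is a rough sketch of the proposed
formatting and parsing rules, written in present-day Python syntax. The
names proposed_hex, parse_proposed and the word_bits parameter are made up
for illustration only; nothing like them exists in Python. Values outside
the 64 bit range simply fall back to the built-in hex(), which is an
assumption on my part, since that case is left open above. The warnings
for the unsuffixed 0x80000000 - 0xffffffff range are not modelled.

```python
def proposed_hex(x, word_bits=32):
    """Format x using the ranges listed above (hypothetical helper).

    word_bits stands in for the native platform word size (32 or 64).
    """
    if word_bits not in (32, 64):
        raise ValueError("word_bits must be 32 or 64")
    if 0 <= x <= 0x7FFFFFFF:
        return "0x%x" % x                      # unambiguous everywhere
    if 0x80000000 <= x <= 0xFFFFFFFF:
        return "0x%xU" % x                     # explicit unsigned suffix
    if 0x100000000 <= x <= 0x7FFFFFFFFFFFFFFF:
        return "0x%xLL" % x                    # 64 bit signed suffix
    if -2**31 <= x < 0 and word_bits == 32:
        return "0x%08x" % (x & 0xFFFFFFFF)     # 8 digits, two's complement
    if -2**63 <= x < 0:
        return "0x%016xLL" % (x & 0xFFFFFFFFFFFFFFFF)  # 16 digits + LL
    # Outside 64 bits: left open in the post; assumption: use built-in hex().
    return hex(x)


def parse_proposed(text):
    """Inverse of proposed_hex (hypothetical helper, warnings omitted)."""
    if text.endswith("LL"):
        value = int(text[:-2], 16)
        # LL means 64 bit signed: the top bit set denotes a negative number.
        return value - 2**64 if value >= 2**63 else value
    if text.endswith("U"):
        return int(text[:-1], 16)              # explicitly unsigned
    value = int(text, 16)
    if 0x80000000 <= value <= 0xFFFFFFFF:
        return value - 2**32                   # ambiguous range: signed 32 bits
    return value


print(proposed_hex(-16711681, 32))   # 0xff00ffff
print(proposed_hex(-16711681, 64))   # 0xffffffffff00ffffLL
print(proposed_hex(4278255615))      # 0xff00ffffU
```

With these two helpers parse_proposed(proposed_hex(x, bits)) == x holds for
either word size, which is the eval(hex(x)) == x property asked for above.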

Yes, this proposal is more complicated and has special treatment for
different ranges, but that is because the issue is not trivial and
cannot be brushed aside using a one-size-doesn't-fit-anyone approach.
This reminds me a lot of unicode issues.

What about the L suffix? This proposal adopts the LL and U suffixes
from C and ensures that they are interpreted consistently in both
languages. But the L suffix is not consistent with C for the range
0x80000000L to 0xFFFFFFFFL. Should the L suffix be deprecated? Should
it produce a warning for the possibly ambiguous range?

        Oren