On Fri, 12 Nov 1999, Fred L. Drake, Jr. wrote:
> M.-A. Lemburg writes:
>  > The abbreviation BOM is quite common w/r to Unicode.

True.

>   Yes: "w/r to Unicode".  In sys, it's out of context and should
> receive a more descriptive name.  I think using BOM in unicodec is
> good.

I agree and believe that we can avoid putting it into sys altogether.

>  >      BOM_BE: '\376\377'
>  >        (corresponds to Unicode 0x0000FEFF in UTF-16
>  >         == ZERO WIDTH NO-BREAK SPACE)

Are you sure about that interpretation? I thought the BOM characters
(0xFEFF and 0xFFFE) were *reserved* in the UCS-2 space.

>   I'd also add BOM to be the same as sys.byte_order_mark.  Perhaps
> even instead of sys.byte_order_mark (just to localize the areas of
> code that are affected).

### unicodec.py ###
import struct

BOM = struct.pack('h', 0x0000FEFF)
BOM_BE = '\376\377'
...

If somebody needs the BOM, then they should go to unicodec.py (or some
other module). I do not believe we need to put that stuff into the sys
module. It is just too easy to create the value in Python.

Cheers,
-g

p.s. to be pedantic, the pack() format could be '@h'

--
Greg Stein, http://www.lyra.org/
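
For anyone trying the unicodec.py fragment on a current interpreter, here is a minimal sketch of the same idea, assuming Python 3. Two things differ from the message and are assumptions of this sketch, not part of the original proposal: the format code is 'H' (unsigned) rather than 'h', because 0xFEFF is outside the signed short range and Python 3's struct.pack() rejects it, and the constants are bytes literals. BOM_LE and the byte_order_of() helper are hypothetical names added for illustration.

```python
import struct
import sys

# U+FEFF packed in the machine's native byte order ('@' = native).
# 'H' (unsigned short) is an assumption of this sketch: 0xFEFF does not
# fit in a signed 'h', so struct.pack('h', 0xFEFF) fails on Python 3.
BOM = struct.pack('@H', 0xFEFF)

BOM_BE = b'\376\377'   # big-endian byte sequence: FE FF
BOM_LE = b'\377\376'   # little-endian byte sequence: FF FE (hypothetical name)


def byte_order_of(data):
    """Hypothetical helper: return 'big', 'little', or None based on a leading BOM."""
    if data[:2] == BOM_BE:
        return 'big'
    if data[:2] == BOM_LE:
        return 'little'
    return None


# The native BOM is whichever constant matches this machine's byte order.
assert BOM == (BOM_LE if sys.byteorder == 'little' else BOM_BE)
```

The point of the sketch is the one made in the message: the value takes one line to build, so a module-level constant outside sys is enough.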