Showing content from https://mail.python.org/pipermail/python-dev/1999-November/001230.html below:

[Python-Dev] Internationalization Toolkit

Greg Stein gstein@lyra.org
Fri, 12 Nov 1999 15:26:08 -0800 (PST)

On Fri, 12 Nov 1999, Fred L. Drake, Jr. wrote:
> M.-A. Lemburg writes:
>  > The abbreviation BOM is quite common w/r to Unicode.

True.

>   Yes: "w/r to Unicode".  In sys, it's out of context and should
> receive a more descriptive name.  I think using BOM in unicodec is
> good.

I agree and believe that we can avoid putting it into sys altogether.

>  >   BOM_BE: '\376\377' 
>  >     (corresponds to Unicode 0x0000FEFF in UTF-16 
>  >      == ZERO WIDTH NO-BREAK SPACE)

Are you sure about that interpretation? I thought the BOM characters
(0xFEFF and 0xFFFE) were *reserved* in the UCS-2 space.
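
One way to check the status of these two code points is to ask a Unicode character database. The sketch below assumes a present-day Python 3 with the standard unicodedata module, which did not exist at the time of this thread; the helper name describe() is made up for illustration.

```python
import struct
import unicodedata


def describe(codepoint):
    """Return (name, general category) for a code point; name is None if unnamed."""
    ch = chr(codepoint)
    return unicodedata.name(ch, None), unicodedata.category(ch)


# U+FEFF is an assigned character; U+FFFE has no name (category 'Cn').
feff_info = describe(0xFEFF)
fffe_info = describe(0xFFFE)

# Reading the big-endian bytes FE FF with the wrong (little-endian)
# byte order yields 0xFFFE, which is how a byte-swapped stream shows up.
swapped = struct.unpack('<H', b'\xfe\xff')[0]
```

Under these assumptions, 0xFEFF reports a character name while 0xFFFE does not, so only the latter behaves like a reserved value.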

>   I'd also add BOM to be the same as sys.byte_order_mark.  Perhaps
> even instead of sys.byte_order_mark (just to localize the areas of
> code that are affected).

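### unicodec.py ###
import struct

# 'H' (unsigned 16-bit): 0xFEFF does not fit in a signed 'h'.
BOM = struct.pack('H', 0x0000FEFF)   # native byte order
BOM_BE = b'\376\377'                 # big-endian byte sequence
...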

If somebody needs the BOM, then they should go to unicodec.py (or some
other module). I do not believe we need to put that stuff into the sys
module. It is just too easy to create the value in Python.
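
To make "easy to create" concrete, here is a minimal sketch written for present-day Python 3 rather than the interpreter discussed in this thread. The names BOM_CODEPOINT, BOM_NATIVE, BOM_LE, and detect_byte_order() are illustrative assumptions, not part of any proposed unicodec API.

```python
import struct
import sys

BOM_CODEPOINT = 0xFEFF

# '@' is native byte order, '>' big-endian, '<' little-endian.
# 'H' is an unsigned 16-bit integer, wide enough for 0xFEFF.
BOM_NATIVE = struct.pack('@H', BOM_CODEPOINT)
BOM_BE = struct.pack('>H', BOM_CODEPOINT)
BOM_LE = struct.pack('<H', BOM_CODEPOINT)


def detect_byte_order(data):
    """Return 'big', 'little', or None based on a leading 2-byte BOM."""
    if data[:2] == BOM_BE:
        return 'big'
    if data[:2] == BOM_LE:
        return 'little'
    return None
```

A caller could pass the first bytes of a 16-bit encoded buffer to detect_byte_order() and pick a decoder accordingly; nothing here needs support from the sys module beyond sys.byteorder for comparison.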

Cheers,
-g

p.s. to be pedantic, the pack() format could carry an explicit '@' prefix
(native byte order, which is also the default)

--
Greg Stein, http://www.lyra.org/
