RetroSearch Browse

Tue Mar 6 19:15:16 CET 2012 · https://mail.python.org/pipermail/python-dev/2012-March/117378.html

Victor Stinner <victor.stinner at gmail.com> wrote:
> > 'c' -> UCS1
> > 'u' -> UCS2
> > 'w' -> UCS4
> 
> A Unicode string is an array of code point. Another approach is to
> expose such string as an array of uint8/uint16/uint32 integers. I
> don't know if you expect to get a character / a substring when you
> read the buffer of a string object. Using Python 3.2, I get:
> 
> >>> memoryview(b"abc")[0]
> b'a'
> 
> ... but using Python 3.3 I get a number :-)

Yes, that's changed because officially (see struct module) the format
is unsigned bytes, which are integers in struct module syntax:

>>> unsigned_bytes = memoryview(b"abc")
>>> unsigned_bytes.format
'B'
>>> char_array = unsigned_bytes.cast('c')
>>> char_array.format
'c'
>>> char_array[0]
b'a'

Possibly the uint8/uint16/uint32 integer approach that you mention
would make more sense.

Stefan Krah

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://mail.python.org/pipermail/python-dev/2012-March/117378.html below:

[Python-Dev] PEP-393/PEP-3118: unicode format specifiers