Showing content from https://mail.python.org/pipermail/python-dev/2002-May/024193.html below:

[Python-Dev] getting the UCS-2 representation of a unicode object

John Machin sjmachin@lexicon.net
Mon, 20 May 2002 10:22:39 +1000
20/05/2002 12:35:19 AM, "Andreas Jung" <andreas@andreas-jung.com> wrote:

>Sounds reasonable..but since Py_ParseTuple() only applies to function
>arguments it can not be used to convert a unicode object to UCS-2. So what
>is the easiest way to get the UCS-2 representation? PyUnicode_AS_DATA()
>returns for u'computer' a char * with strlen()==1, however
>PyUnicode_GET_DATA_SIZE() on the same string returns 16 (looks fine for the
>two bytes encoding of UCS-2). Am I missing something?
>

Andreas,

If you don't care about surrogates or weird things like the Hong Kong extended character set that are outside the 2**16 range, pretend UCS-2 == UTF-16. Then on a narrow Python build, the unicode object is in effect already in UCS-2; no conversion is required.
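
Where that equivalence holds, and where it breaks, can be seen by comparing UTF-16 code units from Python itself. This is only a sketch: it uses `str.encode` from a modern Python 3 rather than the C API under discussion, and the helper name `utf16_units` is made up for illustration.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")  # explicit little-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

# Every character here is below 2**16, so each one is a single 16-bit unit
# and the UTF-16 data is also valid UCS-2.
bmp = utf16_units("computer")

# U+20000 lies above 2**16, so UTF-16 needs a surrogate pair for it;
# UCS-2 has no way to represent it at all.
astral = utf16_units("\U00020000")

print(len(bmp))                   # 8
print([hex(u) for u in astral])   # ['0xd840', '0xdc00']
```

As long as the text stays below 2**16, the code units are just the code points, which is why no conversion is needed in that case.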

You are indeed missing something about PyUnicode_AS_DATA -- the doc says it returns a char * pointer to the internal buffer. I can't imagine what relevance strlen(such_a_pointer) has. The buffer will contain "c\0o\0m\0 etc etc" when viewed as a series of bytes (on a little-endian box), so yes, strlen -> 1, but so what?
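
The byte layout described above can be reproduced without touching the C API. The sketch below assumes a little-endian layout by asking for "utf-16-le" explicitly, and emulates strlen() by looking for the first NUL byte; the variable names are illustrative only.

```python
data = "computer".encode("utf-16-le")  # two bytes per character, low byte first

# Emulate C strlen(): count the bytes that precede the first NUL.
c_strlen = data.index(b"\x00")

print(data[:6])    # b'c\x00o\x00m\x00'
print(c_strlen)    # 1
print(len(data))   # 16
```

The 16 matches what PyUnicode_GET_DATA_SIZE() reported for the 8-character string, and the 1 is simply strlen() stopping at the high byte of the first character. With a big-endian layout ("utf-16-be") the first byte would already be NUL.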

What is there about the PyUnicode_AS_UNICODE() function that you don't like?

Perhaps you might like to (a) say what you are trying to achieve and (b) move the discussion to c.l.py.

Regards,

John





