I was just confused that part of the documentation talks about UTF-16 vs. UCS-2, since Python uses UCS-2 (or UCS-4) as its internal representation. I also did not know that UCS-2 is a subset of UTF-16... I think my problems are now solved, at least from the Python side.

Andreas

----- Original Message -----
From: "John Machin" <sjmachin@lexicon.net>
To: "Andreas Jung" <andreas@andreas-jung.com>
Cc: <python-dev@python.org>
Sent: Sunday, May 19, 2002 20:22
Subject: Re: [Python-Dev] getting the UCS-2 representation of a unicode object

> 20/05/2002 12:35:19 AM, "Andreas Jung" <andreas@andreas-jung.com> wrote:
>
> >Sounds reasonable... but since PyArg_ParseTuple() only applies to function
> >arguments, it cannot be used to convert a unicode object to UCS-2. So what
> >is the easiest way to get the UCS-2 representation? For u'computer',
> >PyUnicode_AS_DATA() returns a char * with strlen() == 1, whereas
> >PyUnicode_GET_DATA_SIZE() on the same string returns 16 (which looks right
> >for the two-byte encoding of UCS-2). Am I missing something?
>
> Andreas,
>
> If you don't care about surrogates or oddities like the Hong Kong extended character set that lie outside the 2**16 range, pretend UCS-2 == UTF-16. Then, on a narrow Python build, the unicode object is in effect already in UCS-2; no conversion is required.
>
> You are indeed missing something about PyUnicode_AS_DATA: the doc says it returns a char * pointer to the internal buffer. I can't imagine what relevance strlen(such_a_pointer) has. Viewed as a series of bytes (on a little-endian box), the buffer contains "c\0o\0m\0" and so on, so yes, strlen() returns 1, but so what?
>
> What is it about the PyUnicode_AS_UNICODE() function that you don't like?
>
> Perhaps you might like to (a) say what you are trying to achieve, and (b) move the discussion to c.l.py.
>
> Regards,
>
> John
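On a current Python 3 the narrow/wide build distinction no longer exists (PEP 393, Python 3.3), so the same idea has to be expressed through an encoding step rather than the internal buffer. A minimal sketch of John's "pretend UCS-2 == UTF-16" advice might look like this. It assumes "UCS-2" means little-endian 16-bit code units (matching the little-endian box above) and that characters outside the 2**16 range should be rejected rather than turned into surrogate pairs; the helper name to_ucs2_le is hypothetical and not part of any Python API. It also shows why strlen() reports 1: the byte at index 1 is already zero.

```python
def to_ucs2_le(s: str) -> bytes:
    """Hypothetical helper: return UCS-2 (little-endian) bytes for s.

    UCS-2 can only hold code points up to U+FFFF, so anything outside
    the Basic Multilingual Plane is rejected instead of silently being
    encoded as a UTF-16 surrogate pair.
    """
    for i, ch in enumerate(s):
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(
                f"U+{cp:X} at index {i} is outside the BMP; "
                "not representable in UCS-2"
            )
    # For BMP-only text, UTF-16-LE and UCS-2-LE give identical bytes.
    # The codec itself raises UnicodeEncodeError for lone surrogates.
    return s.encode("utf-16-le")


data = to_ucs2_le("computer")
print(len(data))             # 16: 8 characters x 2 bytes each
print(data[:4])              # b'c\x00o\x00'
# A C strlen() stops at the first zero byte, which sits at index 1:
print(data.index(b"\x00"))   # 1
```

The length of the byte string, not the position of the first NUL, is the size of the UCS-2 data, which is the same distinction Andreas saw between strlen() and PyUnicode_GET_DATA_SIZE().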