RetroSearch Browse

Tue Oct 25 00:59:27 CEST 2005 · https://mail.python.org/pipermail/python-dev/2005-October/057604.html

Antoine Pitrou wrote:
>>There are many design alternatives:
> 
> Wouldn't it be simpler to use:
> - one-byte representation if every character <= 0xFF
> - two-byte representation if every character <= 0xFFFF
> - four-byte representation otherwise

As I said: there are many alternatives. This one has the
disadvantage of requiring a copy every time you pass the string
to a Win32 function (which expects UTF-16).

Whether or not this is a significant disadvantage, I don't know.

In any case, a multi-representations implementation has the
disadvantage of making the C API more difficult to use, in
particular for writing codecs. On encoding, it is difficult
to fetch the individual characters which you need for the
lookup table; on decoding, it is difficult to know in advance
what representation to use (unless you know there is an upper
bound on the decoded character ordinals).

Regards,
Martin

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://mail.python.org/pipermail/python-dev/2005-October/057604.html below:

[Python-Dev] Divorcing str and unicode (no more implicit conversions).