Wed Nov 24 11:27:30 CET 2010 · https://mail.python.org/pipermail/python-dev/2010-November/105965.html

On Wed, 24 Nov 2010 18:51:49 +0900
"Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> James Y Knight writes:
> 
>  > But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly
>  > superior [...] because it is an ASCII superset, and thus more
>  > easily compatible with other software. That also makes it most
>  > commonly used for internet communication.
> 
> Sure, UTF-8 is very nice as a protocol for communicating text.  So
> what?  If your application involves shoveling octets real fast, don't
> convert and shovel those octets.  If your application involves
> significant text processing, well, conversion can almost always be
> done as fast as you can do I/O so it doesn't cost wallclock time, and
> generally doesn't require a huge percentage of CPU time compared to
> the actual text processing.  It's just a specialization of
> serialization, that we do all the time for more complex data
> structures.
> 
> So wire protocols are not a killer argument for or against any
> particular internal representation of text.

Agreed. Decoding and encoding utf-8 is so fast that it should be
dwarfed by any actual processing done on the text.
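
A rough way to check this on a given machine is sketched below. It is only
an illustration: the sample text, the word-counting "processing" step and
the names codec_roundtrip/processing are assumptions made for this example,
not anything from the thread, and the timings will vary with the build and
with the input.

```python
import timeit

# Hypothetical sample: mixed ASCII and non-ASCII text, a few hundred KB long.
text = "plain ASCII words, naïve café, 日本語のテキスト " * 10_000
wire = text.encode("utf-8")  # the bytes as they might arrive off the wire

def codec_roundtrip():
    """Decode the wire bytes to str, then encode them back to UTF-8."""
    return wire.decode("utf-8").encode("utf-8")

def processing():
    """Stand-in for 'actual text processing': case-fold and count words."""
    counts = {}
    for word in text.casefold().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Time ten runs of each; compare the two figures rather than trusting
# either one in isolation.
codec_s = timeit.timeit(codec_roundtrip, number=10)
work_s = timeit.timeit(processing, number=10)
print(f"decode+encode: {codec_s:.4f}s   processing: {work_s:.4f}s")
```

Even a trivial word count like this one does per-word work that the codec
does not, so it is a reasonable, if crude, proxy for the comparison.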

Regards

Antoine.

[Python-Dev] len(chr(i)) = 2?