Showing content from http://mail.python.org/pipermail/python-dev/attachments/20140605/10aab2a6/attachment.html below:

<html>
 <head>
 <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
 </head>
 <body bgcolor="#FFFFFF" text="#330033">
 <div class="moz-cite-prefix">On 6/5/2014 3:10 AM, Paul Sokolovsky
 wrote: 
 </div>
 <blockquote cite="mid:20140605131039.4f5b74d6@x34f" type="cite">
 <pre wrap="">Hello,

On Wed, 04 Jun 2014 22:15:30 -0400
Terry Reedy <a class="moz-txt-link-rfc2396E" href="mailto:tjreedy@udel.edu">&lt;tjreedy@udel.edu&gt;</a> wrote:

</pre>

<blockquote type="cite">
 <pre wrap="">think you are again batting at a strawman. If you mean 'read from a
file', and all you want to do is read bytes from and write bytes to
external 'files', then there is obviously no need to transcode and
neither Python 2 or 3 make you do so.
</pre>
 </blockquote>
 <pre wrap="">
But most files, network protocols are text-based, and I (and many other
people) don't want to artificially use "binary data" type for them,
with all attached funny things, like "b" prefix. And then Python2
indeed doesn't transcode anything, and Python3 does, without being
asked, and for no good purpose, because in most cases, Input data will
be Output as-is (maybe in byte-boundary-split chunks).

So, it all goes in rounds - ignoring the forced-Unicode problem (after a
week of subscription to python-list, half of traffic there appear to be
dedicated to Unicode-related flames) on python-dev behalf is not
going to help (Python community).
</pre>
 </blockquote>
 
 If all your program does is read and write data (input data is
 output as-is), then using binary I/O doesn't require any "b"
 prefixes, because you aren't manipulating the data, and you incur no
 unnecessary transcoding. 
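 
 A minimal sketch of that pass-through case, assuming Python 3 and
 file-like objects opened in binary mode. The name copy_stream and the
 chunk size are illustrative only; the in-memory streams stand in for
 real files or sockets. Note that no "b" prefix appears anywhere,
 because the code never inspects the data. 

```python
import io

def copy_stream(src, dst, chunk_size=65536):
    """Copy bytes from src to dst unchanged; return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty read means end of input.
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# In-memory stand-ins for real binary files (hypothetical sample data).
original = "h\u00e9llo w\u00f6rld\n".encode("utf-8")
source = io.BytesIO(original)
sink = io.BytesIO()

# A tiny chunk size splits multi-byte characters across chunks,
# which is harmless here because nothing is decoded.
copied = copy_stream(source, sink, chunk_size=4)
```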
 
 If you actually wish to examine or manipulate the content as it
 flows by, then there are choices. 
 
 1) If you need to examine/manipulate only a small fraction of the
 text data within the file, you can pay the small price of a few "b"
 prefixes to get high performance, and explicitly transcode only the
 portions that need to be manipulated. 
 
 2) If you are examining the bulk of the data as it flows by, but not
 manipulating it, just examining/extracting, then a full transcoding
 may be useful for that purpose... but you can perhaps do it
 explicitly, so that you keep the binary form for I/O. Be careful at
 block boundaries in this case, however: a multi-byte character can
 be split across two blocks. 
 
 3) If you are actually manipulating the bulk of the data, then the
 double transcoding (once on input, and once on output) allows you to
 work in units of codepoints, rather than bytes, which generally
 makes the manipulation algorithms easier. 
 
 4) If you truly cannot afford the processor cost of the double
 transcoding, and need to do all your manipulations at the byte
 level, then you could avoid the need for "b" prefix by use of a
 preprocessor for those sections of code that are doing all and only
 bytes processing... and you'll have lots of arcane, error-prone code
 to write to manipulate the bytes rather than the codepoints. 
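 
 As a rough sketch of choice 2, assuming Python 3 and UTF-8 input:
 the bytes are forwarded untouched, while a separate incremental
 decoder produces text for examination and copes with characters
 split across block boundaries. The function name
 examine_and_forward is made up for illustration;
 codecs.getincrementaldecoder is the standard library call. 

```python
import codecs
import io

def examine_and_forward(src, dst, chunk_size=4096, encoding="utf-8"):
    """Forward raw bytes from src to dst and return the decoded text."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # I/O stays binary: the output is byte-for-byte the input.
        dst.write(chunk)
        # The decoder buffers any trailing partial character until
        # the next chunk completes it.
        pieces.append(decoder.decode(chunk))
    # Flush; this raises UnicodeDecodeError if the input ended
    # in the middle of a character.
    pieces.append(decoder.decode(bytes(), final=True))
    return "".join(pieces)

# Hypothetical sample data with multi-byte characters.
sample = "na\u00efve caf\u00e9 \u03bcPy".encode("utf-8")
src = io.BytesIO(sample)
dst = io.BytesIO()

# chunk_size=3 deliberately splits a two-byte character across reads.
text = examine_and_forward(src, dst, chunk_size=3)
```

 Decoding each chunk on its own with bytes.decode would fail at the
 first split character, which is the block-boundary problem mentioned
 in choice 2. 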
 
 On the other hand, if you can convince your data sources and sinks
 to deal in UTF-8, and implement a UTF-8 str in μPy, then you can
 both avoid transcoding, and make the arcane algorithms part of the
 implementation of μPy rather than of the application code, and
 support full Unicode. And it seems to me that the world is moving
 that way... towards UTF-8 as the standard interchange format.
 Encourage it. 
 
 Glenn 
 </body>
</html>
