Showing content from http://mail.python.org/pipermail/python-dev/attachments/20140113/f5fa256d/attachment.html below:
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#330033">
<div class="moz-cite-prefix">On 1/13/2014 9:25 PM, Nick Coghlan
wrote:<br>
</div>
<blockquote
cite="mid:CADiSq7e=kvhm5GCX-EJniezkuPw=dPBVG8jG3rSsGm2uK5TOLQ@mail.gmail.com"
type="cite">
<pre wrap="">since this observation makes it clear that there's <b class="moz-txt-star"><span class="moz-txt-tag">*</span>no<span class="moz-txt-tag">*</span></b> coherent way
to offer a pure binary interpolation API - the only general purpose
combination mechanism for segments of binary data that can avoid
making assumptions about the encodings of metacharacters is simple
concatenation.</pre>
</blockquote>
That's almost true, and I'm glad that you, Guido, and all of us can
understand that the currently defined Python 2 and Python 3 formatting
syntaxes contain an inherent ASCII assumption, just like many
internet protocols. The bitter fight is over :)<br>
<br>
However, your statement above isn't 100% accurate, so just for the
pedantry of it, I'll point out why. A mechanism could be defined
in which the format string contains only format specifications, and
any other text is treated as an error. The format string could
have an explicit or a defined encoding, so there would be no need to
make an assumption about its encoding. And since it would contain
no text other than format specifications, it would serve only
as a rule book for how to interpret the parameters, contributing no
text of its own to the result.<br>
<br>
This wouldn't solve the problem at hand, though, which is to provide
a nice migration path from Python 2 to Python 3 for code that uses
ASCII-based format strings that do contribute text as well as
include parameter data.<br>
<br>
Whether such a technique would be more useful than simple
concatenation (or complex concatenation such as join) remains to be
seen, and possibly discussed, if anyone is interested, but it
probably would belong on python-ideas, since it would not address an
immediate porting issue.<br>
<br>
Assuming an ASCII-in-bytes format string (but with no contributed
text to the result) one could write something like<br>
<br>
b"%{koi7}s%{00}v%{big5}d%{00}v%{ShiftJIS}s%{0000}v%b" / ( cyrillic,
len( blob ), japanese, blob )<br>
<br>
So the encodings to be applied to each of the input parameters could
be explicitly specified.<br>
<br>
Each %{00}v specifier supplies literal bytes to be interpolated into
the output, written in the format string as ASCII hex, two characters
per byte; the example therefore puts a null byte or two between items
in the output. Note that the number uses Chinese digits in the big5
encoding, though I don't know whether the Chinese use their own
digits or ASCII ones these days, or what base they use. (I guess it
was the Babylonians who used base 60, from which our timekeeping and
angular measures were derived.)<br>
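<br>
As a rough sketch of how such a specification might be interpreted:
nothing like this exists in Python, and binary_format, BinarySpec and
the exact grammar below are made up for illustration. It also makes
two simplifying assumptions: koi8-r stands in for koi7 (which, as far
as I know, Python ships no codec for), and %d emits ordinary ASCII
decimal digits passed through the named codec rather than any native
digits.<br>
<br>
```python
import re

# Hypothetical grammar: %{encoding}s  %{encoding}d  %{hex bytes}v  %b
_SPEC = re.compile(rb"%(?:\{([^}]*)\})?([sdvb])")


def binary_format(spec, *args):
    """Build bytes from args; the spec contributes no text of its own."""
    out = bytearray()
    values = iter(args)
    pos = 0
    for match in _SPEC.finditer(spec):
        if match.start() != pos:
            raise ValueError("only format specifiers are allowed in the spec")
        pos = match.end()
        param = (match.group(1) or b"").decode("ascii")
        kind = match.group(2)
        if kind == b"v":
            # Literal bytes written as hex, two characters per byte.
            out += bytes.fromhex(param)
        elif kind == b"b":
            # Raw bytes-like argument, copied through unchanged.
            out += next(values)
        elif kind == b"s":
            # str argument, encoded with the codec named in the specifier.
            out += next(values).encode(param)
        else:
            # kind == b"d": ASCII decimal digits, then the named encoding.
            out += str(int(next(values))).encode(param)
    if pos != len(spec):
        raise ValueError("only format specifiers are allowed in the spec")
    return bytes(out)


class BinarySpec(bytes):
    """Hypothetical bytes subclass so that spec / (args...) works."""

    def __truediv__(self, args):
        return binary_format(self, *args)


cyrillic, japanese, blob = "да", "日本", b"\x01\x02\x03"
spec = BinarySpec(b"%{koi8-r}s%{00}v%{big5}d%{00}v%{shift_jis}s%{0000}v%b")
result = spec / (cyrillic, len(blob), japanese, blob)
```
<br>
In this sketch a spec containing stray text, such as b"len=%b", raises
ValueError, which is the "any other text would be considered an error"
rule described earlier.<br>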
<br>
So there _could be_ a coherent way to offer an interpolation
mechanism that is pure binary, and allows selection of encoding of
str data, if and as needed. One specifier could even be an encoding
to apply to any format specifiers that don't include an encoding, so
in the typical case of dealing with a single language output, the
appropriate encoding could be set at the beginning of the format
specification and overridden by particular specifiers if need be.
But while there _could be_ such an interpolation mechanism, it isn't
compatible with Python 2, and the jury hasn't decided whether such a
thing is sufficiently more useful than concatenation to be worth
implementing. A different operator might be required, or the whole
thing could be a function instead of an operator, with either a
similar format specification or one more like the mini-language used
with format in Python 3.<br>
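<br>
The default-encoding idea could be sketched as a plain function
rather than an operator. Everything here is hypothetical: the name
binary_format_default, and the choice of %{encoding}e as the specifier
that sets the default used by any later %s that names no encoding of
its own.<br>
<br>
```python
import re

# Hypothetical grammar: %{encoding}e sets a default, %s / %{encoding}s, %b
_SPEC = re.compile(rb"%(?:\{([^}]*)\})?([esb])")


def binary_format_default(spec, *args):
    """Like a pure-binary format, with an overridable default encoding."""
    out = bytearray()
    values = iter(args)
    default = None
    pos = 0
    for match in _SPEC.finditer(spec):
        if match.start() != pos:
            raise ValueError("only format specifiers are allowed in the spec")
        pos = match.end()
        param = match.group(1)
        kind = match.group(2)
        if kind == b"e":
            # Set the default encoding for later specifiers; emits nothing.
            if not param:
                raise ValueError("%e needs an encoding name")
            default = param.decode("ascii")
        elif kind == b"b":
            # Raw bytes-like argument, copied through unchanged.
            out += next(values)
        else:
            # kind == b"s": an explicit encoding overrides the default.
            encoding = param.decode("ascii") if param else default
            if encoding is None:
                raise ValueError("no encoding given and no default set")
            out += next(values).encode(encoding)
    if pos != len(spec):
        raise ValueError("only format specifiers are allowed in the spec")
    return bytes(out)


# The first two strings use the utf-8 default; the third overrides it.
data = binary_format_default(b"%{utf-8}e%s%s%{latin-1}s", "é", "ü", "é")
```
<br>
Here data comes out as b"\xc3\xa9\xc3\xbc\xe9": two UTF-8 encoded
characters followed by one Latin-1 byte.<br>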
</body>
</html>