
Wed Apr 24 11:18:06 CEST 2013 · https://mail.python.org/pipermail/python-dev/2013-April/125482.html

On 4/24/2013 1:22 AM, M.-A. Lemburg wrote:
> On 23.04.2013 19:24, Guido van Rossum wrote:
>> On Tue, Apr 23, 2013 at 9:04 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 23.04.2013 17:47, Guido van Rossum wrote:
>>>> On Tue, Apr 23, 2013 at 8:22 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>> Just as a reminder: we have the general purpose
>>>>> encode()/decode() functions in the codecs module:
>>>>>
>>>>> import codecs
>>>>> r13 = codecs.encode('hello world', 'rot-13')
>>>>>
>>>>> These interface directly to the codec interfaces, without
>>>>> enforcing type restrictions. The codec defines the supported
>>>>> input and output types.
>>>> As an implementation mechanism I see nothing wrong with this. I hope
>>>> the codecs module lets you introspect the input and output types of a
>>>> codec given by name?
>>> At the moment there is no standard interface to access supported
>>> input and output types... but then: regular Python functions or
>>> methods also don't provide such functionality, so no surprise
>>> there ;-)
>> Not quite the same though. Each function has its own unique behavior.
>> But codecs support a standard interface, *except* that the input and
>> output types sometimes vary.
> The codec system itself
>
>>> It's mostly a matter of specifying the supported type
>>> combinations in the codec documentation.
>>>
>>> BTW: What would be a use case where you'd want to
>>> programmatically access such information before calling
>>> the codec?
>> As you know, in Python 3, most code working with bytes doesn't also
>> work with strings, and vice versa (except for a few cases where we've
>> gone out of our way to write polymorphic code -- but users rarely do
>> so, and any time you use a string or bytes literal you basically limit
>> yourself to that type).
>>
>> Suppose I write a command-line utility that reads a file, runs it
>> through a codec, and writes the result to another file. Suppose the
>> name of the codec is a command-line argument (as well as the
>> filenames). I need to know whether to open the files in text or binary
>> mode based on the name of the codec.
> Ok, so you need to know which codecs your tool can support and
> which of those need text input and which bytes input.
>
> I've been thinking about this some more: I think that type
> information alone is not flexible enough to cover such
> use cases.

Maybe MIME type and encoding would be sufficient type information, but 
probably not str vs. bytes.
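A quick illustration of why str vs. bytes alone is too coarse. This assumes Python 3.4 or later, where the `base64` and `zlib` aliases for the bytes-to-bytes codecs are available through `codecs.encode()`:

```python
import codecs

# Three standard codecs, all called through the same codecs.encode() interface.
encoded_text = codecs.encode("hello", "utf-8")   # str -> bytes
b64 = codecs.encode(b"hello", "base64")          # bytes -> bytes
packed = codecs.encode(b"hello", "zlib")         # bytes -> bytes

# base64 and zlib have identical (bytes -> bytes) type signatures,
# yet one is reversible ASCII armoring and the other is compression.
print(type(encoded_text).__name__, type(b64).__name__, type(packed).__name__)
print(b64)
```

Knowing only the input and output types, a tool could not tell these two codecs apart.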

> In your use case you'd want to only permit use of a certain
> set of codecs, not simply all of them, since some might
> not implement what you actually want to achieve with the tool,
> e.g. a user might have installed a codec set that adds
> support for reading and writing image data, but your
> intended use was to only support text data.

MIME type supports this sort of concept, with the two-level hierarchy of 
naming the type... text/xml text/plain image/jpeg
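To make the two-level idea concrete, here is a minimal sketch of how a text-only tool might restrict itself to the `text/...` branch of the hierarchy. The `ALLOWED` policy and the `mime_allowed()` helper are hypothetical, not part of any library:

```python
from fnmatch import fnmatch

# Hypothetical policy for a text-only tool: accept any "text/..." type.
ALLOWED = ("text/*",)

def mime_allowed(mime_type, patterns=ALLOWED):
    """Return True if a 'type/subtype' string matches any allowed pattern."""
    mime_type = mime_type.lower()
    major, _, minor = mime_type.partition("/")
    if not major or not minor:
        raise ValueError("expected 'type/subtype', got %r" % mime_type)
    return any(fnmatch(mime_type, p) for p in patterns)

for m in ["text/xml", "text/plain", "image/jpeg"]:
    print(m, mime_allowed(m))
```

The major type does the coarse filtering ("is this text at all?"), and the subtype is still available for finer checks.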

> So what we need is a way to allow the codecs to say e.g.
> "I work on text", "I support encoding bytes and text",
> "I encode to bytes", "I'm reversible", "I transform
> input data", "I support bytes and text, and will create
> same type output", "I work on image data", "I work on
> X509 certificates", "I work on XML data", etc.

Guess what I think you are re-inventing here....
Nope, guess again....
Yep, MIME types _plus_ encodings.
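The standard library already parses this pairing out of a Content-Type header. A small example using `email.message.Message`, where the MIME type and the `charset` parameter come back as separate values:

```python
from email.message import Message

# One header carries both halves: the MIME type and the character encoding.
msg = Message()
msg["Content-Type"] = "text/plain; charset=utf-8"

mime_type = msg.get_content_type()     # the type/subtype part
charset = msg.get_content_charset()    # the charset parameter, lowercased
print(mime_type, charset)

# The charset is what tells a reader how to turn the bytes back into str.
raw = b"caf\xc3\xa9"
print(raw.decode(charset))
```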

> In other words, we need a form of tagging system, with a
> set of standard tags that each codec can publish and
> which also allows non-standard tags (which can then at
> some point be made standard, if there's agreement on them).

Hmm.  Sounds just like the registry for, um, you guessed it: MIME types.
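For comparison, a minimal sketch of the tagging scheme as Marc-André describes it: a fixed set of standard tags, with non-standard ones still accepted and reported. Every name here (`STANDARD_TAGS`, `register_tags()`, `codec_tags()`, the tag strings, the `x509` codec) is hypothetical; nothing like this exists in the `codecs` module:

```python
# Hypothetical tag registry; not part of the codecs module.
STANDARD_TAGS = frozenset({"text", "text-input", "bytes-input",
                           "bytes-output", "reversible"})
_codec_tags = {}

def register_tags(codec_name, tags):
    """Record tags for a codec; return the subset that is non-standard."""
    tags = frozenset(tags)
    _codec_tags[codec_name.lower()] = tags
    return tags - STANDARD_TAGS

def codec_tags(codec_name):
    """Look up a codec's tags (empty if the codec published none)."""
    return _codec_tags.get(codec_name.lower(), frozenset())

register_tags("utf-8", {"text", "text-input", "bytes-output", "reversible"})
extra = register_tags("x509", {"bytes-input", "x509-certificate"})
print(sorted(extra))
```

The returned non-standard tags are the candidates that "can then at some point be made standard", much as new MIME types get registered.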

> Given a codec name you could then ask the codec registry for
> the codec tags and verify that the chosen codec handles
> text data, needs bytes or text encoding input and
> creates bytes as encoding output. If the registry returns
> codec tags that don't include the "I work on text" tag,
> the tool could then raise an error.

For just doing text encoding transformations, text/plain would work as
the MIME type, with the encodings of interest as the encodings.
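Put together with Guido's command-line example, a tool could decide its file modes from such metadata and refuse non-text codecs. A sketch, assuming a hypothetical `CODEC_INFO` table (MIME type, encoding input type, encoding output type) that no current API provides:

```python
# Hypothetical capability table: codec name -> (MIME type handled,
# input type for encoding, output type of encoding).
CODEC_INFO = {
    "utf-8":  ("text/plain", str, bytes),
    "rot-13": ("text/plain", str, str),
    "base64": ("application/octet-stream", bytes, bytes),
}

def open_modes(codec_name):
    """Pick (read, write) file modes for a text tool, or refuse the codec."""
    try:
        mime, in_type, out_type = CODEC_INFO[codec_name]
    except KeyError:
        raise LookupError("unknown codec: %s" % codec_name) from None
    if not mime.startswith("text/"):
        raise ValueError("%s does not work on text" % codec_name)
    read_mode = "r" if in_type is str else "rb"
    write_mode = "w" if out_type is str else "wb"
    return read_mode, write_mode

print(open_modes("utf-8"))
print(open_modes("rot-13"))
```

The MIME type answers "may this tool use the codec at all?", while the str/bytes information only answers "text or binary mode?".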

Seems like "str" always means "Unicode" but the MIME type can vary; 
"bytes" might mean encoded text, and the MIME type can also vary.
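A short example of that asymmetry: the same str yields different bytes under different encodings, and the bytes by themselves do not record which encoding produced them:

```python
text = "café"                        # str: a sequence of Unicode code points
as_utf8 = text.encode("utf-8")       # b'caf\xc3\xa9'
as_latin1 = text.encode("latin-1")   # b'caf\xe9'

# Decoding with the wrong encoding "succeeds" but yields mojibake.
print(as_utf8.decode("latin-1"))     # cafÃ©
```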

For non-textual transformations, "encoding" might mean Base 64, BinHex, 
or other such representations... but those can also be applied to text, 
so it might be a 3rd dimension, or it might just be a list of encodings 
rather than a single encoding.
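Treating the encodings as a list could look like the following sketch. The `apply_encodings()`/`undo_encodings()` helpers are hypothetical; they simply chain `codecs.encode()` and `codecs.decode()` and assume Python 3.4+ for the `base64` alias:

```python
import codecs

def apply_encodings(data, names):
    """Encode data through each named codec in order (hypothetical helper)."""
    for name in names:
        data = codecs.encode(data, name)
    return data

def undo_encodings(data, names):
    """Decode through the same codecs in reverse order."""
    for name in reversed(names):
        data = codecs.decode(data, name)
    return data

steps = ["utf-8", "base64"]          # text encoding, then Base64 armoring
wire = apply_encodings("naïve text", steps)
print(wire)                          # b'bmHDr3ZlIHRleHQ=\n'
print(undo_encodings(wire, steps))   # naïve text
```

Decoding has to run the list backwards, which is one reason a single "encoding" slot is not enough.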

Compression could be another dimension, or perhaps another encoding.
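Python already models compression as "just another encoding": the `zlib` codec (alias available in Python 3.4+) compresses and decompresses through the same interface used for text encodings:

```python
import codecs

payload = ("spam " * 100).encode("ascii")

# Compression through the generic codec interface, and back again.
squeezed = codecs.encode(payload, "zlib")
restored = codecs.decode(squeezed, "zlib")
print(len(payload), "->", len(squeezed))
```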

But really, then, a transformation needs to be a list of steps; a codec 
can sign up to perform one or more of the steps, a sequence of codecs 
would have to be found, capable of performing a subsequence of the 
steps, and then run in the appropriate order.
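Finding such a sequence is a path search over declared steps. A minimal sketch using breadth-first search; the `STEPS` table, its form labels, and `plan()` are all hypothetical, since codecs today declare no such capabilities:

```python
from collections import deque

# Hypothetical step declarations: (codec name, input form, output form).
STEPS = [
    ("utf-8",  "text",       "bytes"),
    ("zlib",   "bytes",      "compressed"),
    ("base64", "bytes",      "armored"),
    ("base64", "compressed", "armored"),
]

def plan(start, goal, steps=STEPS):
    """Breadth-first search for the shortest codec chain from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        form, chain = queue.popleft()
        if form == goal:
            return chain
        for codec, src, dst in steps:
            if src == form and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [codec]))
    return None  # no sequence of declared steps reaches the goal

print(plan("text", "armored"))      # ['utf-8', 'base64']
print(plan("text", "compressed"))   # ['utf-8', 'zlib']
```

Note that the shortest chain to "armored" skips compression; a real planner would need the requested steps (e.g. "compress, then armor") as part of the goal, not just the final form.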

This all sounds so general that probably the Python compiler could be
implemented as a codec :)  Or any compiler. Probably a web server could
be implemented as a codec too :)  Well, maybe not: codecs have limited
error handling and reporting abilities.
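For reference, the main error hook codecs offer is the `errors=` argument, extensible through `codecs.register_error()`. A handler can only substitute replacement text and say where to resume, as this sketch shows (the `record-and-replace` name and the `problems` list are made up for illustration):

```python
import codecs

problems = []

def record_and_replace(exc):
    """Note the failing span, substitute one '?' per character, continue."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    problems.append((exc.start, exc.object[exc.start:exc.end]))
    return ("?" * (exc.end - exc.start), exc.end)

codecs.register_error("record-and-replace", record_and_replace)

out = "naïve café".encode("ascii", errors="record-and-replace")
print(out)        # b'na?ve caf?'
print(problems)   # [(2, 'ï'), (9, 'é')]
```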


[Python-Dev] Why can't I encode/decode base64 without importing a module?