On 4/24/2013 1:22 AM, M.-A. Lemburg wrote:
> On 23.04.2013 19:24, Guido van Rossum wrote:
>> On Tue, Apr 23, 2013 at 9:04 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 23.04.2013 17:47, Guido van Rossum wrote:
>>>> On Tue, Apr 23, 2013 at 8:22 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>> Just as reminder: we have the general purpose
>>>>> encode()/decode() functions in the codecs module:
>>>>>
>>>>> import codecs
>>>>> r13 = codecs.encode('hello world', 'rot-13')
>>>>>
>>>>> These interface directly to the codec interfaces, without
>>>>> enforcing type restrictions. The codec defines the supported
>>>>> input and output types.
>>>> As an implementation mechanism I see nothing wrong with this. I hope
>>>> the codecs module lets you introspect the input and output types of a
>>>> codec given by name?
>>> At the moment there is no standard interface to access supported
>>> input and output types... but then: regular Python functions or
>>> methods also don't provide such functionality, so no surprise
>>> there ;-)
>> Not quite the same though. Each function has its own unique behavior.
>> But codecs support a standard interface, *except* that the input and
>> output types sometimes vary.
> The codec system itself
>
>>> It's mostly a matter of specifying the supported type
>>> combinations in the codec documentation.
>>>
>>> BTW: What would be a use case where you'd want to
>>> programmatically access such information before calling
>>> the codec ?
>> As you know, in Python 3, most code working with bytes doesn't also
>> work with strings, and vice versa (except for a few cases where we've
>> gone out of our way to write polymorphic code -- but users rarely do
>> so, and any time you use a string or bytes literal you basically limit
>> yourself to that type).
>>
>> Suppose I write a command-line utility that reads a file, runs it
>> through a codec, and writes the result to another file. Suppose the
>> name of the codec is a command-line argument (as well as the
>> filenames). I need to know whether to open the files in text or binary
>> mode based on the name of the codec.
> Ok, so you need to know which codecs your tool can support and
> which of those need text input and which bytes input.
>
> I've been thinking about this some more: I think that type
> information alone is not flexible enough to cover such
> use cases.

Maybe MIME type and encoding would be sufficient type information, but
probably not str vs. bytes.

> In your use case you'd want to only permit use of a certain
> set of codecs, not simply all of them, since some might
> not implement what you actually want to achieve with the tool,
> e.g. a user might have installed a codec set that adds
> support for reading and writing image data, but your
> intended use was to only support text data.

MIME type supports this sort of concept, with the two-level hierarchy of
naming the type...

text/xml
text/plain
image/jpeg

> So what we need is a way to allow the codecs to say e.g.
> "I work on text", "I support encoding bytes and text",
> "I encode to bytes", "I'm reversible", "I transform
> input data", "I support bytes and text, and will create
> same type output", "I work on image data", "I work on
> X509 certificates", "I work on XML data", etc.

Guess what I think you are re-inventing here.... Nope, guess again....
Yep, MIME types _plus_ encodings.

> In other words, we need a form of tagging system, with a
> set of standard tags that each codec can publish and
> which also allows non-standard tags (which can then at
> some point be made standard, if there's agreement on them).

Hmm. Sounds just like the registry for, um, you guessed it: MIME types.

> Given a codec name you could then ask the codec registry for
> the codec tags and verify that the chosen codec handles
> text data, needs bytes or text encoding input and
> creates bytes as encoding output. If the registry returns
> codec tags that don't include the "I work on text" tag,
> the tool could then raise an error.

For just doing text encoding transformations, text/plain would work as a
MIME type, and the encodings of interest for the encodings.

Seems like "str" always means "Unicode" but the MIME type can vary;
"bytes" might mean encoded text, and the MIME type can also vary.

For non-textual transformations, "encoding" might mean Base 64, BinHex,
or other such representations... but those can also be applied to text,
so it might be a 3rd dimension, or it might just be a list of encodings
rather than a single encoding. Compression could be another dimension,
or perhaps another encoding.

But really, then, a transformation needs to be a list of steps; a codec
can sign up to perform one or more of the steps, a sequence of codecs
would have to be found, capable of performing a subsequence of the
steps, and then run in the appropriate order.

This all sounds so general, that probably the Python compiler could be
implemented as a codec :) Or any compiler. Probably a web server could
be implemented as a codec too :) Well, maybe not, codecs have limited
error handling and reporting abilities.
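
To make the tagging idea in this thread concrete, here is a minimal sketch of how the command-line tool could pick file modes from MIME-style tags. It is only an illustration under stated assumptions: the CodecTags type, the CODEC_TAGS table, and the tags_for() and open_modes() helpers are hypothetical names invented here, and the codecs module publishes no such information. The MIME type chosen for base64 is likewise a guess. Only codecs.lookup() and codecs.encode() are real standard-library calls.

```python
import codecs
from typing import NamedTuple


class CodecTags(NamedTuple):
    """Hypothetical per-codec tags; nothing like this exists in the stdlib."""
    mime_type: str       # kind of data the codec works on, e.g. "text/plain"
    encode_input: type   # type accepted when encoding
    encode_output: type  # type produced when encoding


# Hand-written table (an assumption for this sketch), keyed by the
# canonical codec name reported by codecs.lookup().
CODEC_TAGS = {
    "rot-13": CodecTags("text/plain", str, str),
    "utf-8": CodecTags("text/plain", str, bytes),
    "base64": CodecTags("application/octet-stream", bytes, bytes),
}


def tags_for(name):
    """Look up the illustrative tags for a codec given by any of its names."""
    canonical = codecs.lookup(name).name  # LookupError if the codec is unknown
    try:
        return CODEC_TAGS[canonical]
    except KeyError:
        raise LookupError("no tags published for codec %r" % canonical) from None


def open_modes(name, require_mime="text/plain"):
    """Return (input_mode, output_mode) for open() when encoding with a codec.

    Raises ValueError when the codec is not tagged with the required MIME
    type, which is the "raise an error" case described in the thread.
    """
    tags = tags_for(name)
    if tags.mime_type != require_mime:
        raise ValueError(
            "codec %r works on %s, not %s" % (name, tags.mime_type, require_mime)
        )
    in_mode = "r" if tags.encode_input is str else "rb"
    out_mode = "w" if tags.encode_output is str else "wb"
    return in_mode, out_mode
```

With this table, open_modes("rot-13") gives text mode on both sides, open_modes("utf8") gives text input and binary output, and open_modes("base64") is rejected unless the caller asks for the binary MIME type. A single (MIME type, input type, output type) triple is the simplest shape that answers the text-or-binary question; it does not attempt the multi-step pipeline of transformations described above.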