M.-A. Lemburg wrote:
> On 2008-06-13 11:32, Walter Dörwald wrote:
>> M.-A. Lemburg wrote:
>>> On 2008-06-12 16:59, Walter Dörwald wrote:
>>>> M.-A. Lemburg wrote:
>>>>> .transform() and .untransform() use the codecs to apply same-type
>>>>> conversions. They do apply type checks to make sure that the
>>>>> codec does indeed return the same type.
>>>>>
>>>>> E.g. text.transform('xml-escape') or data.transform('base64').
>>>>
>>>> So what would a base64 codec do with the errors argument?
>>>
>>> It could use it to e.g. try to recover as much data as possible
>>> from broken input data.
>>>
>>> Currently (in Py2.x), it raises an exception if you pass in anything
>>> but "strict".
>>>
>>>>>> I think for transformations we don't need the full codec machinery:
>>>>> > ...
>>>>>
>>>>> No need to invent another wheel :-) The codecs already exist for
>>>>> Py2.x and can be used by the .encode()/.decode() methods in Py2.x
>>>>> (where no type checks occur).
>>>>
>>>> By using a new API we could get rid of old warts. For example: Why
>>>> does the stateless encoder/decoder return how many input
>>>> characters/bytes it has consumed? It must consume *all* bytes anyway!
>>>
>>> No, it doesn't and that's the point in having those return values :-)
>>>
>>> Even though the encoder/decoders are stateless, that doesn't mean
>>> they have to consume all input data. The caller is responsible to
>>> make sure that all input data was in fact consumed.
>>>
>>> You could for example have a decoder that stops decoding after
>>> having seen a block end indicator, e.g. a base64 line end or
>>> XML closing element.
>>
>> So how should the UTF-8 decoder know that it has to stop at a closing
>> XML element?
>
> The UTF-8 decoder doesn't support this, but you could write a codec
> that applies this kind of detection, e.g. to not try to decode
> partial UTF-8 byte sequences at the end of input, which would then
> result in error.
>
>>> Just because all codecs that ship with Python always try to decode
>>> the complete input doesn't mean that the feature isn't being used.
>>
>> I know of no other code that does. Do you have an example for this use?
>
> I already gave you a few examples.

Maybe I was unclear, I meant real world examples, not hypothetical ones.

>>> The interface was designed to allow for the above situations.
>>
>> Then could we at least have a new codec method that does:
>>
>> def statelessencode(self, input):
>>     (output, consumed) = self.encode(input)
>>     assert len(input) == consumed
>>     return output
>
> You mean as method to the Codec class ?

No, I meant as a method for the CodecInfo class.

> Sure, we could do that, but please use a different name,
> e.g. .encodeall() and .decodeall() - .encode() and .decode()
> are already stateless (and so would the new methods be), so
> "stateless" isn't all that meaningful in this context.

I like the names encodeall/decodeall!

> We could also add such a check to the PyCodec_Encode() and _Decode()
> functions. They currently do not apply the above check.
>
> In Python, those two functions are exposed as codecs.encode()
> and codecs.decode().

This change will probably have to wait for the 2.7 cycle.

Servus,
   Walter
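
For illustration, here is a rough sketch of what the encodeall()/decodeall() idea discussed above could look like. It is written as module-level helpers on top of codecs.lookup() rather than as CodecInfo methods, and it raises ValueError instead of using assert. Both choices, and the helper names themselves, are assumptions for this sketch. Nothing like this exists in the codecs module.

```python
import codecs


def encodeall(encoding, data, errors="strict"):
    """Hypothetical helper: encode and insist that all input was consumed."""
    info = codecs.lookup(encoding)
    # The stateless encoder returns (output, number_of_input_items_consumed).
    output, consumed = info.encode(data, errors)
    if consumed != len(data):
        raise ValueError("%s encoder consumed %d of %d input items"
                         % (info.name, consumed, len(data)))
    return output


def decodeall(encoding, data, errors="strict"):
    """Hypothetical helper: decode and insist that all input was consumed."""
    info = codecs.lookup(encoding)
    output, consumed = info.decode(data, errors)
    if consumed != len(data):
        raise ValueError("%s decoder consumed %d of %d input items"
                         % (info.name, consumed, len(data)))
    return output


# Round trip through the helpers.
encoded = encodeall("utf-8", "Dörwald")
decoded = decodeall("utf-8", encoded)

# A decoder that does not consume everything: with final=False the low-level
# UTF-8 decoder leaves an incomplete trailing byte sequence undecoded and
# reports how far it got.
partial_text, partial_consumed = codecs.utf_8_decode(b"abc\xc3", "strict", False)
```

In the last call the input is four bytes long but only three are reported as consumed, which is the kind of case where the caller has to look at the second return value.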