On 2008-06-13 11:32, Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> On 2008-06-12 16:59, Walter Dörwald wrote:
>>> M.-A. Lemburg wrote:
>>>> .transform() and .untransform() use the codecs to apply same-type
>>>> conversions. They do apply type checks to make sure that the
>>>> codec does indeed return the same type.
>>>>
>>>> E.g. text.transform('xml-escape') or data.transform('base64').
>>>
>>> So what would a base64 codec do with the errors argument?
>>
>> It could use it to e.g. try to recover as much data as possible
>> from broken input data.
>>
>> Currently (in Py2.x), it raises an exception if you pass in anything
>> but "strict".
>>
>>>>> I think for transformations we don't need the full codec machinery:
>>>>> ...
>>>>
>>>> No need to invent another wheel :-) The codecs already exist for
>>>> Py2.x and can be used by the .encode()/.decode() methods in Py2.x
>>>> (where no type checks occur).
>>>
>>> By using a new API we could get rid of old warts. For example: Why
>>> does the stateless encoder/decoder return how many input
>>> characters/bytes it has consumed? It must consume *all* bytes anyway!
>>
>> No, it doesn't and that's the point in having those return values :-)
>>
>> Even though the encoder/decoders are stateless, that doesn't mean
>> they have to consume all input data. The caller is responsible to
>> make sure that all input data was in fact consumed.
>>
>> You could for example have a decoder that stops decoding after
>> having seen a block end indicator, e.g. a base64 line end or
>> XML closing element.
>
> So how should the UTF-8 decoder know that it has to stop at a closing
> XML element?

The UTF-8 decoder doesn't support this, but you could write a codec
that applies this kind of detection, e.g. one that does not try to
decode partial UTF-8 byte sequences at the end of the input, which
would otherwise result in an error.

>> Just because all codecs that ship with Python always try to decode
>> the complete input doesn't mean that the feature isn't being used.
>
> I know of no other code that does. Do you have an example for this use.

I already gave you a few examples.

>> The interface was designed to allow for the above situations.
>
> Then could we at least have a new codec method that does:
>
> def statelesencode(self, input):
>     (output, consumed) = self.encode(input)
>     assert len(input) == consumed
>     return output

You mean as a method on the Codec class?

Sure, we could do that, but please use a different name,
e.g. .encodeall() and .decodeall(). The existing .encode() and
.decode() methods are already stateless (and so would the new
methods be), so "stateless" isn't all that meaningful in this context.

We could also add such a check to the PyCodec_Encode() and
PyCodec_Decode() functions. They currently do not apply the above check.

In Python, those two functions are exposed as codecs.encode()
and codecs.decode().

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 13 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania            23 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
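To make the point about partial consumption concrete, here is a minimal sketch in Python 3. It assumes two hypothetical names, decode_first_line() and decodeall(), neither of which exists in the codecs module. The first is a stateless decoder that stops at a base64 line end and reports how much input it consumed; the second is one possible shape for the proposed "consume everything or fail" check, written as a plain function rather than a Codec method.

```python
import base64
import codecs


def decode_first_line(data, errors="strict"):
    """Hypothetical stateless decoder: decode one base64 line, then stop.

    Returns (output, consumed) like a codec decode function. consumed can
    be smaller than len(data). The errors argument is accepted only to
    match the codec signature and is ignored in this sketch.
    """
    end = data.find(b"\n")
    consumed = len(data) if end == -1 else end + 1
    return base64.b64decode(data[:consumed]), consumed


def decodeall(decode, data, errors="strict"):
    """Hypothetical helper: fail unless the decoder consumed all input."""
    output, consumed = decode(data, errors)
    if consumed != len(data):
        raise ValueError(
            "decoder consumed %d of %d input items" % (consumed, len(data))
        )
    return output


# The stock UTF-8 decoder consumes everything, so the check passes.
text = decodeall(codecs.getdecoder("utf-8"), b"abc")

# The line-oriented decoder stops after the first line; the caller can use
# the consumed count to continue with the remaining input.
first, used = decode_first_line(b"aGVsbG8=\nd29ybGQ=\n")
```

With the two-line input above, passing decode_first_line to decodeall() raises ValueError, which is the behaviour the proposed .decodeall() would add on top of the existing (output, consumed) interface.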