Hi Walter,

I don't have time to comment on this this week; I'll respond
next week.

Overall, I don't like the idea of adding extra APIs that break
the existing codec API. I believe that we can extend stream
codecs to also work in a feed mode without breaking the API.

Walter Dörwald wrote:
> OK, here are my current thoughts on the codec problem:
>
> The optimal solution (ignoring backwards compatibility)
> would look like this: codecs.lookup() would return the
> following stuff (this could be done by replacing the
> 4-entry tuple with a real object):
>
> * decode: The stateless decoding function
> * encode: The stateless encoding function
> * chunkdecoder: The stateful chunk decoder
> * chunkencoder: The stateful chunk encoder
> * streamreader: The stateful stream decoder
> * streamwriter: The stateful stream encoder
>
> The functions and classes look like this:
>
>
> Stateless decoder:
> decode(input, errors='strict'):
>     Function that decodes the (str) input object and returns
>     a (unicode) output object. The decoder must decode the
>     complete input without any remaining undecoded bytes.
>
> Stateless encoder:
> encode(input, errors='strict'):
>     Function that encodes the complete (unicode) input object and
>     returns a (str) output object.
>
> Stateful chunk decoder:
> chunkdecoder(errors='strict'):
>     A factory function that returns a stateful decoder with the
>     following method:
>
>     decode(input, final=False):
>         Decodes a chunk of input and returns the decoded unicode
>         object. This method can be called multiple times and
>         the state of the decoder will be kept between calls.
>         This includes trailing incomplete byte sequences
>         that will be retained until the next call to decode().
>         When the argument final is true, this is the last call
>         to decode() and trailing incomplete byte sequences will
>         not be retained, but a UnicodeError will be raised.
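The chunk decoder semantics described above (keep trailing incomplete byte sequences until final is true, then raise) correspond to what later Python releases expose as codecs.getincrementaldecoder(). The following is a minimal Python 3 sketch, assuming UTF-8 input; in Python 3 the proposal's str/unicode become bytes/str, and decode_in_chunks is a hypothetical helper, not part of the proposal.

```python
import codecs


def decode_in_chunks(chunks, encoding="utf-8"):
    """Decode a sequence of byte chunks with one stateful decoder.

    Hypothetical helper: every chunk except the last is decoded with
    final=False, so trailing incomplete byte sequences stay buffered in
    the decoder; the last chunk uses final=True, so leftovers raise.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    chunks = list(chunks)
    pieces = [decoder.decode(chunk) for chunk in chunks[:-1]]
    last = chunks[-1] if chunks else b""
    pieces.append(decoder.decode(last, final=True))
    return pieces


data = "Dörwald €".encode("utf-8")
# Split inside the two-byte 'ö' and inside the three-byte '€'.
pieces = decode_in_chunks([data[:2], data[2:10], data[10:]])
print(pieces)  # ['D', 'örwald ', '€']

try:
    decode_in_chunks([b"abc\xe2\x82"])  # incomplete '€' on the final call
except UnicodeDecodeError as exc:  # UnicodeDecodeError subclasses UnicodeError
    print("rejected:", exc.reason)
```

Each split multi-byte character comes out whole in a later piece, and nothing is silently dropped at the end.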
>
> Stateful chunk encoder:
> chunkencoder(errors='strict'):
>     A factory function that returns a stateful encoder
>     with the following method:
>
>     encode(input, final=False):
>         Encodes a chunk of input and returns the encoded
>         str object. When final is true this is the last
>         call to encode().
>
> Stateful stream decoder:
> streamreader(stream, errors='strict'):
>     A factory function that returns a stateful decoder
>     for reading from the byte stream stream, with the
>     following methods:
>
>     read(size=-1, chars=-1, final=False):
>         Read unicode characters from the stream. When data
>         is read from the stream it should be done in chunks of
>         size bytes. If size == -1 all the remaining data
>         from the stream is read. chars specifies the number
>         of characters to read from the stream. read() may return
>         fewer than chars characters if there's not enough data
>         available in the byte stream. If chars == -1 as many
>         characters are read as are available in the stream.
>         Transient errors are ignored and trailing incomplete
>         byte sequences are retained when final is false. Otherwise
>         a UnicodeError is raised in the case of incomplete byte
>         sequences.
>     readline(size=-1):
>         ...
>     next():
>         ...
>     __iter__():
>         ...
>
> Stateful stream encoder:
> streamwriter(stream, errors='strict'):
>     A factory function that returns a stateful encoder
>     for writing unicode data to the byte stream stream,
>     with the following methods:
>
>     write(data, final=False):
>         Encodes the unicode object data and writes it
>         to the stream. If final is true this is the last
>         call to write().
>     writelines(data):
>         ...
>
>
> I know that this is quite a departure from the current API, and
> I'm not sure if we can get all of the functionality without
> sacrificing backwards compatibility.
>
> I don't particularly care about the "bytes consumed" return value
> from the stateless codec. The codec should always have returned only
> the encoded/decoded object, but I guess fixing this would break too
> much code.
> And users who are only interested in the stateless
> functionality will probably use unicode.encode/str.decode anyway.
>
> For the stateful API it would be possible to combine the chunk and
> stream decoder/encoder into one class with the following methods
> (for the decoder):
>
> __init__(stream, errors='strict'):
>     Like the current StreamReader constructor, but stream may be
>     None, if only the chunk API is used.
> decode(input, final=False):
>     Like the current StreamReader (i.e. it returns a (unicode, int)
>     tuple). This does not keep the remaining bytes in a buffer.
>     This is the job of the caller.
> feed(input, final=False):
>     Decodes input and returns a decoded unicode object. This method
>     calls decode() internally and manages the byte buffer.
> read(size=-1, chars=-1, final=False):
> readline(size=-1):
> next():
> __iter__():
>     See above.
>
> As before, implementers of decoders only need to implement decode().
>
> To be able to support the final argument the decoding functions
> in _codecsmodule.c could get an additional argument. With this
> they could be used for the stateless codecs too and we can reduce
> the number of functions again.
>
> Unfortunately adding the final argument breaks all of the current
> codecs, but dropping the final argument requires one of two
> changes:
> 1) When the input stream is exhausted, the bytes read are parsed
>    as if final=True. That's the way the CJK codecs currently
>    handle it, but unfortunately this doesn't work with the feed
>    decoder.
> 2) Simply ignore any remaining undecoded bytes at the end of the
>    stream.
>
> If we really have to drop the final argument, I'd prefer 2).
>
> I've uploaded a second version of the patch. It implements
> the final argument, adds the feed() method to StreamReader and
> again merges the duplicate decoding functions in the codecs
> module.
> Note that the patch isn't really finished (the final
> argument isn't completely supported in the encoders and the
> CJK and escape codecs are unchanged), but it should be sufficient
> as a base for discussion.
>
> Bye,
>    Walter Dörwald
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/mal%40egenix.com

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 12 2004)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
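The combined decoder in Walter's quoted proposal, where codec implementers write only a stateless decode() returning a (unicode, bytes consumed) tuple while feed() manages the leftover bytes, can be sketched in Python 3 as follows. The class names FeedDecoder and UTF8FeedDecoder and the bytebuffer attribute are hypothetical; the UTF-8 subclass relies on codecs.utf_8_decode(data, errors, final), which in current CPython returns exactly such a tuple. read() is simplified and ignores the proposed chars argument.

```python
import codecs
import io


class FeedDecoder:
    """Hypothetical sketch of a combined chunk/stream decoder.

    Codec implementers override only decode(), which is stateless and
    returns (decoded_text, bytes_consumed). feed() and read() keep the
    undecoded tail in self.bytebuffer between calls.
    """

    def __init__(self, stream=None, errors="strict"):
        self.stream = stream  # may be None if only feed() is used
        self.errors = errors
        self.bytebuffer = b""

    def decode(self, input, final=False):
        raise NotImplementedError

    def feed(self, input, final=False):
        data = self.bytebuffer + input
        text, consumed = self.decode(data, final)
        # Retain bytes the codec could not decode yet (e.g. half of 'ö').
        self.bytebuffer = data[consumed:]
        return text

    def read(self, size=-1, final=False):
        # Simplified: reads size bytes (all if -1) and ignores chars.
        return self.feed(self.stream.read(size), final)


class UTF8FeedDecoder(FeedDecoder):
    def decode(self, input, final=False):
        # codecs.utf_8_decode(data, errors, final) -> (str, consumed)
        return codecs.utf_8_decode(input, self.errors, final)


decoder = UTF8FeedDecoder()
print(decoder.feed(b"D\xc3"))  # prints: D  (b'\xc3' stays buffered)
print(decoder.feed(b"\xb6rwald \xe2\x82"))  # prints: örwald
print(decoder.feed(b"\xac", final=True))  # prints: €

reader = UTF8FeedDecoder(io.BytesIO("Dörwald €".encode("utf-8")))
print(reader.read(2))  # prints: D
print(reader.read(final=True))  # prints: örwald €
```

Keeping the buffer out of decode() means each codec supplies one stateless function, and the same feed() logic serves both the chunk API and the stream API. The standard library's codecs.BufferedIncrementalDecoder follows a similar buffer-and-consumed-count pattern.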