Fredrik Lundh wrote:
>
> M.-A. Lemburg <mal@lemburg.com> wrote:
> > >     def flush(self):
> > >         # flush the decoding buffers.  this should usually
> > >         # return None, unless the fact that the input
> > >         # stream has ended means that the state can be
> > >         # interpreted in a meaningful way.  however, if the
> > >         # state indicates that the last character was not
> > >         # finished, this method should raise a UnicodeError
> > >         # exception.
> >
> > Could you explain the reason for having a .flush() method
> > and what it should return.
>
> in most cases, it should either return None, or
> raise a UnicodeError exception:
>
>     >>> u = unicode("å i åa ä e ö", "iso-latin-1")
>     >>> # yes, that's a valid Swedish sentence ;-)
>     >>> s = u.encode("utf-8")
>     >>> d = decoder("utf-8")
>     >>> d.decode(s[:-1])
>     "å i åa ä e "
>     >>> d.flush()
>     UnicodeError: last character not complete
>
> on the other hand, there are situations where it
> might actually return a string.  consider a "HTML
> entity decoder" which uses the following pattern
> to match a character entity: "&\w+;?" (note that
> the trailing semicolon is optional).
>
>     >>> u = unicode("å i åa ä e ö", "iso-latin-1")
>     >>> s = u.encode("html-entities")
>     >>> d = decoder("html-entities")
>     >>> d.decode(s[:-1])
>     "å i åa ä e "
>     >>> d.flush()
>     "ö"

Ah, ok. So the .flush() method checks for proper
string endings and then either returns the remaining
input or raises an error.

> > Perhaps I'm missing something, but how would you define
> > stream codecs using this interface ?
>
> input: read chunks of data, decode, and
> keep extra data in a local buffer.
>
> output: encode data into suitable chunks,
> and write to the output stream (that's why
> there's a buffersize argument to encode --
> if someone writes a 10mb unicode string to
> an encoded stream, python shouldn't allocate
> an extra 10-30 megabytes just to be able to
> encode the darn thing...)

So the stream codecs would be wrappers around the
string codecs.
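
To make the "wrapper" idea concrete, here is a rough sketch of the
input side only. It is an illustration under assumptions, not the
interface proposed in this thread: the names ChunkDecoder and
read_decoded are made up, and instead of the decoder(...) factory
from the quoted example it leans on the incremental decoders in
the present-day codecs module. decode() accepts partial input and
keeps unfinished bytes in a local buffer; flush() returns None or
raises a UnicodeError when the last character is incomplete.

```python
import codecs
import io


class ChunkDecoder:
    """Hypothetical wrapper: decode() takes partial input, flush() checks the ending."""

    def __init__(self, encoding):
        # The incremental decoder buffers any unfinished byte sequence itself.
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def decode(self, data):
        # Decode as much as possible; incomplete trailing bytes stay buffered.
        return self._decoder.decode(data, final=False)

    def flush(self):
        # Signal end of input. Raises UnicodeDecodeError (a subclass of
        # UnicodeError) if the last character was not finished.
        rest = self._decoder.decode(b"", final=True)
        return rest or None


def read_decoded(stream, encoding, chunk_size=4096):
    """Yield decoded text from a binary stream, one chunk at a time."""
    decoder = ChunkDecoder(encoding)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    rest = decoder.flush()
    if rest:
        yield rest


if __name__ == "__main__":
    data = "å i åa ä e ö".encode("utf-8")
    # Even with one-byte chunks, split characters are reassembled.
    print("".join(read_decoded(io.BytesIO(data), "utf-8", chunk_size=1)))
```

The output side would work the same way in reverse: encode a
bounded slice of the unicode string at a time and write each
piece to the stream, so no second full-size buffer is needed.
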
Have you read my latest version of the Codec interface?
Wouldn't that be a reasonable approach?

Note that I have integrated your ideas into the new API --
it's basically only missing the .flush() methods, which I
can add now that I know what you meant.

> > > Implementing stream codecs is left as an exercise (see the zlib
> > > material in the eff-bot guide for a decoder example).
>
> everybody should have a copy of the eff-bot guide ;-)

Sure, but the format, the format... make it printed and add
a CD and you would probably have a good selling book
there ;-)

> (but alright, I plan to post a complete utf-8 implementation
> in a not too distant future).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    43 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/