Guido van Rossum wrote:
> 
> > Like a path of search functions ? Not a bad idea... I will still
> > want the internal dict for caching purposes though. I'm not sure
> > how often these encodings will be used, but even a few hundred
> > function calls will slow down the Unicode implementation quite a bit.
> 
> Of course.  (It's like sys.modules caching the results of an import).

I've fixed the "path of search functions" approach in the latest
version of the spec.

> [...]
> >     def flush(self):
> > 
> >         """ Flushes the codec buffers used for keeping state.
> > 
> >             Return values are not defined. Implementations are free to
> >             return None, raise an exception (in case there is pending
> >             data in the buffers which could not be decoded) or
> >             return any remaining data from the state buffers used.
> > 
> >         """
> 
> I don't know where this came from, but a flush() should work like
> flush() on a file.

It came from Fredrik's proposal.

> It doesn't return a value, it just sends any
> remaining data to the underlying stream (for output). For input it
> shouldn't be supported at all.
> 
> The idea is that flush() should do the same to the encoder state that
> close() followed by a reopen() would do. Well, more or less. But if
> the process were to be killed right after a flush(), the data written
> to disk should be a complete encoding, and not have a lingering shift
> state.

Ok. I've modified the API as follows:

StreamWriter:

    def flush(self):

        """ Flushes and resets the codec buffers used for keeping state.

            Calling this method should ensure that the data on the
            output is put into a clean state that allows appending
            of new fresh data without having to rescan the whole
            stream to recover state.

        """
        pass

StreamReader:

    def read(self, chunksize=0):

        """ Decodes data from the stream self.stream and returns a
            tuple (Unicode object, bytes consumed).

            chunksize indicates the approximate maximum number of
            bytes to read from the stream for decoding purposes. The
            decoder can modify this setting as appropriate. The
            default value 0 indicates to read and decode as much as
            possible. The chunksize is intended to prevent having to
            decode huge files in one step.

            The method should use a greedy read strategy, meaning that
            it should read as much data as is allowed within the
            definition of the encoding and the given chunksize, e.g.
            if optional encoding endings or state markers are
            available on the stream, these should be read too.

        """
        ... the base class should provide a default implementation
            of this method using self.decode ...

    def reset(self):

        """ Resets the codec buffers used for keeping state.

            Note that no stream repositioning should take place.
            This method is primarily intended to recover from
            decoding errors.

        """
        pass

The .reset() method replaces the .flush() method on StreamReaders.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 42 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
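
A rough sketch of how such a base class default implementation of
.read() might look (purely illustrative: the .bytebuffer attribute,
the constructor and the fallback buffering strategy are assumptions
on my part; self.decode is assumed to take (data, errors) and return
a (Unicode object, bytes consumed) tuple, as for the stateless codec
interface, and to be supplied by the concrete codec subclass):

    class StreamReader:

        def __init__(self, stream, errors='strict'):
            self.stream = stream
            self.errors = errors
            # bytes already read from the stream but not yet decoded
            self.bytebuffer = ''

        def read(self, chunksize=0):

            """ Default implementation using self.decode. """
            if chunksize == 0:
                # read and decode as much as possible
                data = self.bytebuffer + self.stream.read()
            else:
                data = self.bytebuffer + self.stream.read(chunksize)
            decoded, consumed = self.decode(data, self.errors)
            # keep any trailing bytes the decoder could not consume
            # (e.g. a partial multi-byte sequence) for the next call
            self.bytebuffer = data[consumed:]
            return (decoded, consumed)

        def reset(self):

            """ Drop buffered state; no stream repositioning. """
            self.bytebuffer = ''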