Here is a new proposal for the codec interface:

class Codec:

    def encode(self,u,slice=None):

        """ Return the Unicode object u encoded as a Python string.

            If slice is given (as slice object), only the sliced part
            of the Unicode object is encoded.

            The method may not store state in the Codec instance. Use
            StreamCodec for codecs which have to keep state in order to
            make encoding/decoding efficient.

        """
        ...

    def decode(self,s,slice=None):

        """ Return an equivalent Unicode object for the encoded Python
            string s.

            If slice is given (as slice object), only the sliced part
            of the Python string is decoded and returned as a Unicode
            object. Note that this can cause the decoding algorithm to
            fail due to truncations in the encoding.

            The method may not store state in the Codec instance. Use
            StreamCodec for codecs which have to keep state in order to
            make encoding/decoding efficient.

        """
        ...

class StreamCodec(Codec):

    def __init__(self,stream=None,errors='strict'):

        """ Creates a StreamCodec instance.

            stream must be a file-like object open for reading and/or
            writing binary data depending on the intended codec
            action, or None.

            The StreamCodec may implement different error handling
            schemes by providing the errors argument. These
            parameters are known (they need not all be supported by
            StreamCodec subclasses):

             'strict'             - raise a UnicodeError (or a subclass)
             'ignore'             - ignore the character and continue
                                    with the next
             (a single character) - replace erroneous characters with
                                    the given character (may also be a
                                    Unicode character)

        """
        self.stream = stream

    def write(self,u,slice=None):

        """ Writes the Unicode object's contents encoded to self.stream.

            stream must be a file-like object open for writing binary
            data.

            If slice is given (as slice object), only the sliced part
            of the Unicode object is written.

        """
        # ... the base class should provide a default implementation
        #     of this method using self.encode ...

    def read(self,length=None):

        """ Reads an encoded string from the stream and returns
            an equivalent Unicode object.

            If length is given, only length Unicode characters are
            returned (the StreamCodec instance reads as many raw bytes
            as needed to fulfill this requirement). Otherwise, all
            available data is read and decoded.

        """
        # ... the base class should provide a default implementation
        #     of this method using self.decode ...

The unicodec.register() API does not require codecs to subclass these
base classes; only the given methods must be present. This allows
writing Codecs as extension types. All Codecs must provide the
.encode()/.decode() methods. Codecs having the .read() and/or .write()
methods are considered to be StreamCodecs.

The Unicode implementation will by itself only use the stateless
.encode() and .decode() methods. All other conversions have to be done
by explicitly instantiating the appropriate [Stream]Codec.

--

Feel free to beat on this one ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    45 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
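
As a rough illustration of the interface proposed above, here is a minimal sketch of a codec for a single-byte encoding. It assumes a modern Python in which the "Unicode object" is str and the "Python string" is bytes. The names Latin1Codec and Latin1StreamCodec are hypothetical and not part of the proposal, and the errors argument is stored but not acted on here. The read() shortcut (one byte per character) only holds for single-byte encodings; a multi-byte codec would need to buffer partial sequences.

```python
import io


class Latin1Codec:
    """Hypothetical stateless codec; Latin-1 is chosen only for illustration."""

    def encode(self, u, slice=None):
        # Encode only the sliced part when a slice object is given.
        if slice is not None:
            u = u[slice]
        return u.encode("latin-1")

    def decode(self, s, slice=None):
        # Decode only the sliced part of the encoded data when requested.
        if slice is not None:
            s = s[slice]
        return s.decode("latin-1")


class Latin1StreamCodec(Latin1Codec):
    """Hypothetical stream codec built on the stateless methods above."""

    def __init__(self, stream=None, errors="strict"):
        self.stream = stream
        self.errors = errors  # assumption: kept for reference, unused in this sketch

    def write(self, u, slice=None):
        # Default behaviour suggested in the proposal: delegate to self.encode.
        self.stream.write(self.encode(u, slice))

    def read(self, length=None):
        # Latin-1 maps one byte to one character, so `length` characters
        # correspond to exactly `length` bytes.
        if length is None:
            data = self.stream.read()
        else:
            data = self.stream.read(length)
        return self.decode(data)


buf = io.BytesIO()
Latin1StreamCodec(buf).write("héllo wörld", slice(0, 5))
buf.seek(0)
reader = Latin1StreamCodec(buf)
print(reader.read(3))  # first three characters: hél
print(reader.read())   # remaining characters: lo
```

Because the proposal only requires the methods to be present, neither class here derives from the Codec or StreamCodec base classes.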