M.-A. Lemburg writes:
 > The problem is that the encoding names are not Python identifiers,
 > e.g. iso-8859-1 is allowed as identifier. This and
 > the fact that applications may want to ship their own codecs (which
 > do not get installed under the system wide encodings package)
 > make the registry necessary.

This isn't a substantial problem.  Try this on for size (probably not
too different from what everyone is already thinking, but let's make
it clear).  This could be in encodings/__init__.py; I've tried to be
really clear on the names.  (No testing, only partially complete.)

------------------------------------------------------------------------
import string
import sys

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO


class EncodingError(Exception):
    def __init__(self, encoding, error):
        self.encoding = encoding
        self.strerror = "%s %s" % (error, `encoding`)
        self.error = error
        Exception.__init__(self, encoding, error)


# maps lower-cased encoding names to Codec instances
_registry = {}

def registerEncoding(encoding, encode=None, decode=None,
                     make_stream_encoder=None, make_stream_decoder=None):
    encoding = encoding.lower()
    if _registry.has_key(encoding):
        info = _registry[encoding]
    else:
        info = _registry[encoding] = Codec(encoding)
    info._update(encode, decode,
                 make_stream_encoder, make_stream_decoder)


def getCodec(encoding):
    encoding = encoding.lower()
    if _registry.has_key(encoding):
        return _registry[encoding]

    # load the module
    modname = "encodings." + encoding.replace("-", "_")
    try:
        __import__(modname)
    except ImportError:
        raise EncodingError(encoding, "unknown encoding")

    # if the module registered, use the codec as-is:
    if _registry.has_key(encoding):
        return _registry[encoding]

    # nothing registered, use well-known names
    module = sys.modules[modname]
    codec = _registry[encoding] = Codec(encoding)
    encode = getattr(module, "encode", None)
    decode = getattr(module, "decode", None)
    make_stream_encoder = getattr(module, "make_stream_encoder", None)
    make_stream_decoder = getattr(module, "make_stream_decoder", None)
    codec._update(encode, decode,
                  make_stream_encoder, make_stream_decoder)
    return codec


class Codec:
    __encode = None
    __decode = None
    __stream_encoder_factory = None
    __stream_decoder_factory = None

    def __init__(self, name):
        self.name = name

    def encode(self, u):
        if self.__stream_encoder_factory:
            sio = StringIO()
            encoder = self.__stream_encoder_factory(sio)
            encoder.write(u)
            encoder.flush()
            return sio.getvalue()
        elif self.__encode:
            # fall back to the plain encode function, if one was given
            return self.__encode(u)
        else:
            raise EncodingError(self.name, "no encoder available for")

    # similar for decode()...

    def make_stream_encoder(self, target):
        if self.__stream_encoder_factory:
            return self.__stream_encoder_factory(target)
        elif self.__encode:
            # DefaultStreamEncoder is not defined in this sketch
            return DefaultStreamEncoder(target, self.__encode)
        else:
            raise EncodingError(self.name, "no encoder available for")

    # similar for make_stream_decoder()...

    def _update(self, encode, decode,
                make_stream_encoder, make_stream_decoder):
        # keep any previously registered value when an argument is None
        self.__encode = encode or self.__encode
        self.__decode = decode or self.__decode
        self.__stream_encoder_factory = (
            make_stream_encoder or self.__stream_encoder_factory)
        self.__stream_decoder_factory = (
            make_stream_decoder or self.__stream_decoder_factory)
------------------------------------------------------------------------

 > I don't see a problem with the registry though -- the encodings
 > package can take care of the registration process without any

No problem at all; we just need to make sure the right magic is
there for the "normal" case.

 > PS: we could probably even take the whole codec idea one step
 > further and also allow other input/output formats to be registered,

File formats are different from text encodings, so let's keep them
separate.  Yes, a registry can be a good approach whenever the
various things being registered are sufficiently similar semantically,
but the behavior of the registry/lookup can be very different for each
type of thing.  Let's not over-generalize.


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives