Tim Peters wrote:
>
> [M.-A. Lemburg]
> > The problem is: which part would raise the exception -- the
> > encoder or the decoder ?
>
> Since I don't yet use any of this stuff for real, I have no idea: seems
> mostly a question of pragmatics, and I don't have any feel for how cp875
> users would view it.

If there are any... that code page dates back to 1996 and is based in
the EBCDIC world.

> > Here are some more options:
> >
> > * sort the items before creating the encoding table from the
> >   decoding one (makes the mapping stable)
>
> If users don't care that round-trip can fail silently, fine.
>
> > * map keys which have multiple mappings in the encoding table
> >   to None -- this causes their usage to raise an exception
> >   (undefined mapping)
>
> If users don't care that they'll get an exception when they try something
> that can't be round-tripped, fine. Or would this depend on the value of the
> "errors" argument too? Then it's easier to impose.

The errors argument tells the codecs what to do in case a
mapping fails (from codecs.py):

    The .encode()/.decode() methods may implement different error
    handling schemes by providing the errors argument. These
    string values are defined:

     'strict'  - raise a ValueError error (or a subclass)
     'ignore'  - ignore the character and continue with the next
     'replace' - replace with a suitable replacement character;
                 Python will use the official U+FFFD REPLACEMENT
                 CHARACTER for the builtin Unicode codecs.

'strict' is the default for all operations that deal with
auto-conversion. 'ignore' and 'replace' allow silently ignoring the
problem.

> There's a theme here <wink>: I have no idea how important roundtrip is in
> Unicode Practice, or even that it's a constant across apps and encodings. If
> I write a codec to map all ASCII consonants to u"k" and vowels to u"a", I
> wouldn't care that I can't get "love" back from u"kaka" <wink>.

Round-tripping is obviously very important if you use Unicode
as basis for working on text.
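
To make the two options and the role of the errors argument concrete, here is a minimal sketch. It is only an illustration under stated assumptions: DECODING_MAP is a made-up three-entry table (not real cp875 data), and invert_stable(), invert_ambiguous_to_none() and encode() are hypothetical helpers, not the functions the codecs actually use. The '?' replacement byte is likewise an assumption of the sketch.

```python
# Hypothetical decoding table (byte -> Unicode code point), NOT real cp875 data.
# Bytes 0x41 and 0x43 both decode to U+0391, so the mapping cannot round-trip.
DECODING_MAP = {0x41: 0x0391, 0x42: 0x0392, 0x43: 0x0391}


def invert_stable(decoding_map):
    """Option 1: iterate in sorted order so the result is deterministic.

    On a collision the highest byte value wins; the round-trip for the
    other colliding bytes still fails silently.
    """
    encoding_map = {}
    for byte, codepoint in sorted(decoding_map.items()):
        encoding_map[codepoint] = byte
    return encoding_map


def invert_ambiguous_to_none(decoding_map):
    """Option 2: code points with more than one source byte map to None."""
    encoding_map = {}
    for byte, codepoint in sorted(decoding_map.items()):
        if codepoint in encoding_map:
            encoding_map[codepoint] = None  # ambiguous: treat as undefined
        else:
            encoding_map[codepoint] = byte
    return encoding_map


def encode(text, encoding_map, errors="strict"):
    """Toy encoder (an illustration, not the codecs implementation)."""
    out = bytearray()
    for ch in text:
        byte = encoding_map.get(ord(ch))
        if byte is not None:
            out.append(byte)
        elif errors == "strict":
            raise ValueError("undefined mapping for %r" % ch)
        elif errors == "ignore":
            continue  # drop the character and go on with the next one
        elif errors == "replace":
            out.append(ord("?"))  # assumed replacement byte for this sketch
        else:
            raise ValueError("unknown error handling scheme %r" % errors)
    return bytes(out)
```

With this table, invert_stable() silently picks byte 0x43 for U+0391, while invert_ambiguous_to_none() leaves U+0391 undefined. In the second case it is the encoder that complains: under 'strict' encode() raises ValueError for that character, and under 'ignore' or 'replace' the problem is passed over silently, which is the trade-off discussed above.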
I don't know about the reasoning behind making cp875 fail the
round-trip -- Unicode certainly provides means to make mappings
round-trip safe (e.g. by reverting to the private Unicode char.
point areas).

--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/