Tim Peters wrote:
>
> [M.-A. Lemburg]
> > ...
> > The "right" thing to do here, is to simply remove cp875
> > from the test for round-tripping.
>
> I'm relieved you think so, since that's what I already did <wink>.
>
> > It is not the only encoding which fails this test, but it's not
> > our fault: the codecs were all generated from the original codec
> > maps at the Unicode.org site.
> >
> > If their mappings are broken, we can't do much about it... other
> > than to ignore the error or remove the codec altogether.
>
> On general principle I don't like either of those -- "in the face of
> ambiguity, refuse the temptation to guess".  It's at least surprising to see
>
> >>> unicode("?", "cp875").encode("cp875")
> '\xfd'
> >>>
>
> now, yes?  Would it be better if an ambiguous encoding raised an exception in
> "strict" mode?  That is, a third choice is to alert users when they're
> relying on a broken part of a mapping.

The problem is: which part would raise the exception -- the
encoder or the decoder?

Here are some more options:

* sort the items before creating the encoding table from
  the decoding one (makes the mapping stable)

* map keys which have multiple mappings in the encoding
  table to None -- this causes their usage to raise an
  exception (undefined mapping)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
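
For illustration only, the two options listed above could be sketched
roughly as follows. This is not the code used to generate the codecs. The
helper names and the three-entry toy map are hypothetical (it is not the
real cp875 table), and the sketch is written for current Python 3 rather
than the interpreter used in the session quoted above. It assumes the
encoding table is a plain dict passed to codecs.charmap_encode().

```python
import codecs

# Toy decoding map (byte -> code point). Hypothetical data, NOT the real
# cp875 table: two different bytes decode to the same character U+001A.
TOY_DECODING_MAP = {
    0x41: 0x0041,
    0x3F: 0x001A,
    0xFD: 0x001A,
}


def invert_sorted(decoding_map):
    """Option 1: invert in sorted byte order so the result is stable.

    For a character with several source bytes, the highest byte wins,
    independent of dict iteration order.
    """
    encoding_map = {}
    for byte, char in sorted(decoding_map.items()):
        encoding_map[char] = byte
    return encoding_map


def invert_ambiguous_to_none(decoding_map):
    """Option 2: map characters with more than one source byte to None."""
    encoding_map = {}
    for byte, char in decoding_map.items():
        if char in encoding_map:
            # Seen before: the reverse mapping is ambiguous.
            encoding_map[char] = None
        else:
            encoding_map[char] = byte
    return encoding_map


stable = invert_sorted(TOY_DECODING_MAP)
strict = invert_ambiguous_to_none(TOY_DECODING_MAP)

# Option 1 always encodes U+001A the same way.
encoded, _ = codecs.charmap_encode("A\x1a", "strict", stable)

# Option 2 makes the charmap encoder treat U+001A as undefined.
try:
    codecs.charmap_encode("\x1a", "strict", strict)
    raised = False
except UnicodeEncodeError:
    raised = True
```

With the second variant, unambiguous characters such as "A" still encode
normally; only the characters whose reverse mapping is ambiguous raise in
"strict" mode, so in this sketch it is the encoder that raises.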