Tim Peters wrote:
>
> [MAL]
> > Round-tripping is obviously very important if you use Unicode
> > as basis for working on text.
>
> Since I use 7-bit ASCII exclusively, I've been using
>
>     encode = decode = lambda x: x
>
> I haven't proved that's round-trippable, but haven't bumped into an exception
> yet.

For character map codecs the complete range(256) of possible input
characters should pass the round-trip test, that is, encoded text ->
Unicode -> encoded text should result in the identity mapping for all
c in map(chr, range(256)).

> > I don't know about the reasoning behind making cp875 fail the
> > round-trip -- Unicode certainly provides means to make mappings
> > round-trip safe (e.g. by reverting to the private Unicode
> > char. point areas).
>
> Then I ignorantly but confidently (indeed, with the cheery confidence only
> the truly ignorant can truly enjoy!) vote for your approach that maps the
> non-round-trippable cp875 code points to None. Better safe than sorry, by
> default. Else 6 of the 7 ambiguous chars will be silent surprises by
> default.

I will check in a patch which moves the building logic for encoding
maps to codecs.py. This will simplify the task of choosing the "right"
solution. Currently I'm in favour of:

def make_encoding_map(decoding_map):

    """ Creates an encoding map from a decoding map.

        If a target mapping in the decoding map occurs multiple
        times, then that target is mapped to None (undefined mapping),
        causing an exception when encountered by the charmap codec
        during translation.

        One example where this happens is cp875.py which decodes
        multiple characters to U+001A.

    """
    m = {}
    for k, v in decoding_map.items():
        # 'v not in m' behaves like the old m.has_key(v) and also
        # works on Python versions where has_key() no longer exists.
        if v not in m:
            m[v] = k
        else:
            # Target seen before: mark it as an undefined mapping.
            m[v] = None
    return m

Perhaps we should also have a codecs.finalize_decoding_map() API in
codecs.py which checks the decoding map and postprocesses it in case
it finds a problem?!
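A minimal sketch of the round-trip test described above, written for modern Python 3 (the message predates it, so `bytes([c])` and the `in` check are assumptions about today's runtime). The decoding map here is a hypothetical toy, not the real cp875 table: identity for all 256 byte values, except that 0xFF also decodes to U+001A. The sketch inverts it with make_encoding_map, then pushes every byte through `codecs.charmap_decode` and `codecs.charmap_encode`. The two bytes that share U+001A are reported as failures, because the encoding map sends U+001A to None, which the charmap codec treats as an undefined mapping and reports with UnicodeEncodeError. The helper `round_trip_failures` is a hypothetical name used only for illustration.

```python
import codecs


def make_encoding_map(decoding_map):
    """Invert a decoding map; targets that occur more than once map to None."""
    m = {}
    for k, v in decoding_map.items():
        if v not in m:
            m[v] = k
        else:
            m[v] = None  # ambiguous target -> undefined mapping
    return m


# Hypothetical toy decoding map (byte value -> Unicode ordinal): identity for
# all 256 bytes, except that 0xFF also decodes to U+001A, loosely mimicking
# the cp875 situation discussed above. This is NOT the real cp875 table.
decoding_map = {i: i for i in range(256)}
decoding_map[0xFF] = 0x1A

encoding_map = make_encoding_map(decoding_map)


def round_trip_failures(decoding_map, encoding_map):
    """Return the byte values that do not survive bytes -> Unicode -> bytes."""
    failures = []
    for c in range(256):
        data = bytes([c])
        text, _ = codecs.charmap_decode(data, "strict", decoding_map)
        try:
            back, _ = codecs.charmap_encode(text, "strict", encoding_map)
        except UnicodeEncodeError:
            # A None entry in the encoding map is an undefined mapping.
            failures.append(c)
            continue
        if back != data:
            failures.append(c)
    return failures


print(round_trip_failures(decoding_map, encoding_map))  # [26, 255]
```

With the None entry the ambiguity is loud: both colliding bytes raise an exception. Had the inversion simply kept the last key seen, one of the two bytes would silently encode back to the other one, with no error at all.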
--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/