[/F]
> reverse sorting makes sense to me.  but the cp-files appear to be
> machine generated, so patching that python file won't help.

Agreed.

> a truly future-proof solution would be to specify exactly how to
> resolve every many-to-one mapping, for every font having that
> problem.  but sorting them is clearly better than relying on
> implementation-dependent behaviour...

The attached program suggests the problem is rare; of those encoding files
that have a Python decoding_map dict, only these triggered a meaningful
ambiguity complaint:

*** cp1006.py maps 0xfe8e back to 0xb1, 0xb2
*** cp875.py maps 0x1a back to 0x3f, 0xdc, 0xe1, 0xec, 0xed, 0xfc, 0xfd

Then since test_unicode only checks for roundtrip across range(0x80), cp875
is the only one that *can* fail (the ambiguities in cp1006 are for points >
0x7f, so aren't tested here).

Hmm!  Now I see that in a part of test_unicode that wasn't reached, cp875
and cp1006 are excluded, with this comment:

    ### These fail the round-trip:
    #'cp1006', 'cp875', 'iso8859_8',

So the practical hack for now is to exclude cp875 from the earlier
range(128) roundtrip test too.

> (is Jython using exactly the same hashing and dictionary algorithms as
> CPython?  or does it work by accident also under Jython?)

Sorry, no idea.  Attempting to browse the Jython source on SourceForge
caused this cute behavior:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/Lib/

Python Exception Occurred

Traceback (innermost last):
  File "/usr/lib/cgi-bin/viewcvs.cgi", line 2286, in ?
    main()
  File "/usr/lib/cgi-bin/viewcvs.cgi", line 2253, in main
    view_directory(request)
  File "/usr/lib/cgi-bin/viewcvs.cgi", line 1043, in view_directory
    fileinfo, alltags = get_logs(full_name, rcs_files, view_tag)
  File "/usr/lib/cgi-bin/viewcvs.cgi", line 987, in get_logs
    raise 'error during rlog: '+hex(status)
error during rlog: 0x100

let's-rewrite-it-in-php<wink>-ly y'rs  - tim


ENCODING_DIR = "../Lib/encodings"

import os
import imp

def d(w):
    if type(w) is type(6):
        return hex(w)
    else:
        return repr(w)

encfiles = [name for name in os.listdir(ENCODING_DIR)
            if name.endswith(".py") and name[0] != "_"]

for fname in encfiles:
    path = os.path.join(ENCODING_DIR, fname)
    f = open(path)
    module = imp.load_source(fname[:-3], path, f)
    f.close()
    decode = getattr(module, "decoding_map", None)
    if decode is None:
        print fname, "doesn't have decoding_map."
        continue
    vtok = {}
    for k, v in decode.items():
        if v in vtok:
            vtok[v].append(k)
        else:
            vtok[v] = [k]
    ambiguous = [(v, ks) for v, ks in vtok.items() if len(ks) > 1]
    if ambiguous:
        for v, ks in ambiguous:
            ks.sort()
            print "***", fname, "maps", d(v), "back to", \
                  ", ".join(map(d, ks))
    else:
        print fname, "is free of ambiguous reverse maps."
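
For what it's worth, the "sort them instead of relying on dict order" idea can be sketched in a few lines. This is only an illustration, not what the codec generator actually does: the helper names (group_by_codepoint, find_ambiguous, build_encoding_map) and the `prefer` parameter are made up here, the toy map is not the real cp875 table (it just reuses the colliding byte values reported above plus one invented identity entry), and it is written for a current Python rather than the one the attached program targets.

```python
def group_by_codepoint(decoding_map):
    """Map each decoded code point to the sorted list of bytes that produce it."""
    groups = {}
    for byte, codepoint in decoding_map.items():
        groups.setdefault(codepoint, []).append(byte)
    return {cp: sorted(bs) for cp, bs in groups.items()}


def find_ambiguous(decoding_map):
    """Return only the code points that more than one byte decodes to."""
    groups = group_by_codepoint(decoding_map)
    return {cp: bs for cp, bs in groups.items() if len(bs) > 1}


def build_encoding_map(decoding_map, prefer=min):
    """Invert decoding_map, choosing among colliding bytes with an explicit rule.

    `prefer` is a hypothetical knob: min picks the lowest byte, max the
    highest.  Either way the result no longer depends on dict ordering.
    """
    groups = group_by_codepoint(decoding_map)
    return {cp: prefer(bs) for cp, bs in groups.items()}


# Toy data (assumption): one invented identity entry plus the byte values
# the report above lists as all decoding to 0x1a.
toy = {0x00: 0x00}
for b in (0x3f, 0xdc, 0xe1, 0xec, 0xed, 0xfc, 0xfd):
    toy[b] = 0x1a

if __name__ == "__main__":
    for cp, bs in find_ambiguous(toy).items():
        print("maps", hex(cp), "back to", ", ".join(map(hex, bs)))
    print("encoding 0x1a as", hex(build_encoding_map(toy)[0x1a]))
```

Whether min or max (the "reverse sorting" suggested above) is the right rule for a given code page is a separate question; the point of the sketch is only that the choice is written down rather than left to the dict implementation.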