Walter Dörwald <walter@livinglogic.de> writes:

> Output is as follows:
> 1790000 chars, 2.330% unenc
> ignore: 0.022 (factor=1.000)
> xmlcharrefreplace: 0.044 (factor=1.962)
> xml2: 0.267 (factor=12.003)
> xml3: 0.723 (factor=32.506)
> workaround: 5.151 (factor=231.702)
>
> i.e. a 1.7MB string with 2.3% unencodable characters was
> encoded.

Those numbers are impressive. Can you please add

    def xml4(exc):
        if isinstance(exc, UnicodeEncodeError):
            if exc.end-exc.start == 1:
                return u"&#"+str(ord(exc.object[exc.start]))+u";"
            else:
                r = []
                for c in exc.object[exc.start:exc.end]:
                    r.extend([u"&#", str(ord(c)), u";"])
                return u"".join(r)
        else:
            raise TypeError("don't know how to handle %r" % exc)

and report how that performs (assuming I made no error)?

> Using a callback instead of the inline implementation is a factor of
> 12 slower than ignore.

For the purpose of comparing C and Python, this isn't relevant, is
it? Only the C version of xmlcharrefreplace and a Python version
should be compared.

> It can't really be fixed for codecs implemented in Python. For codecs
> that use the C functions we could add the functionality that e.g.
> PyUnicodeEncodeError_SetReason(exc) sets exc.reason and exc.args[3],
> but AFAICT it can't be done easily for Python where attribute assignment
> directly goes to the instance dict.

You could add methods to the class (set_reason etc.), which error
handler authors would have to use. Again, these methods could be added
through Python code, so no C code would be necessary to implement
them. You could even implement a setattr method in Python - although
you'd have to search for it from C while initializing the class.

Regards,
Martin
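
For anyone who wants to try a callback like xml4 on a current Python 3, here is a minimal sketch. It assumes the `codecs.register_error` API as found in today's standard library, where the handler returns a (replacement, resume position) pair rather than the bare string returned by the code in the message. The handler name "xml4" and the sample string are only illustrative.

```python
import codecs


def xml4(exc):
    """Replace each unencodable character with an XML numeric character reference."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % exc)
    # exc.object[exc.start:exc.end] is the run of characters that failed to encode.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(c) for c in bad)
    # Assumption: current Python 3 API, which expects (replacement, resume_position).
    return replacement, exc.end


# Register the callback under an illustrative name so encode() can look it up.
codecs.register_error("xml4", xml4)

sample = "caf\u00e9 \u20ac"
encoded = sample.encode("ascii", "xml4")
builtin = sample.encode("ascii", "xmlcharrefreplace")
```

Both calls should yield the same bytes, so timing one against the other on a large string gives the C versus Python comparison asked for above.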