[Skip Montanaro]
> The unicode() builtin accepts an optional third argument, errors, which
> defaults to "strict".  According to the docs if errors is set to "ignore",
> decoding errors are silently ignored.  I seem to still get the occasional
> UnicodeError exception, however.  I'm still trying to track down an actual
> example (it doesn't happen often, and I hadn't wrapped unicode() in a
> try/except statement, so all I saw was the error raised, not the input
> string value).

Play with this:

"""
def generrors(encoding, errors, maxlen, maxtries):
    from random import choice, randint
    bytes = [chr(i) for i in range(256)]
    paste = ''.join
    for dummy in xrange(maxtries):
        n = randint(1, maxlen)
        raw = paste([choice(bytes) for dummy in range(n)])
        try:
            u = unicode(raw, encoding, errors)
        except UnicodeError, detail:
            print 'fail w/ errors', errors, '- raw data', repr(raw)
            print '    UnicodeError', str(detail)

errors = ('strict', 'replace', 'ignore')
generrors('mac-turkish', errors[2], 10, 1000)
"""

Plug in your favorite encoding and let it do the work of finding examples.
It generates plenty of errors with 'strict', but so far I haven't seen it
generate one with 'replace' or 'ignore'.
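
For anyone repeating the experiment on Python 3, where the unicode() builtin
no longer exists, a rough equivalent might look like the sketch below.  It is
not part of the original message: the name find_decode_failures and the seed
parameter are made up for illustration, bytes.decode() stands in for
unicode(), and failures are collected in a list instead of printed.  The
'ascii' codec is used in the demo only because any byte above 0x7F is
guaranteed to fail under 'strict'.

```python
import random


def find_decode_failures(encoding, errors, maxlen, maxtries, seed=0):
    """Decode random byte strings and return (raw, message) for each failure.

    Hypothetical helper: same idea as generrors() above, but it returns
    the failing inputs so they can be inspected afterwards.
    """
    rng = random.Random(seed)  # fixed seed so a failing input can be reproduced
    failures = []
    for _ in range(maxtries):
        n = rng.randint(1, maxlen)
        raw = bytes(rng.randrange(256) for _ in range(n))
        try:
            raw.decode(encoding, errors)
        except UnicodeError as detail:
            failures.append((raw, str(detail)))
    return failures


if __name__ == "__main__":
    # Report how many random inputs fail under each error-handling mode.
    for mode in ("strict", "replace", "ignore"):
        found = find_decode_failures("ascii", mode, 10, 1000)
        print(mode, "-", len(found), "failures")
```

Swap in another codec name to probe it the same way; keeping the raw bytes
around is what makes a rare failure possible to track down later.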