Jeroen Ruigrok van der Werven <asmodai <at> in-nomine.org> writes:
>
> This got posted on the Unicode list, does it seem interesting for Python
> itself, the UTF-8 to UTF-16 transcoding might be?
>
> http://bjoern.hoehrmann.de/utf-8/decoder/dfa/

If you have some time on your hands, you could try benchmarking it against
Python 3.1's (py3k) decoder. There are two cases to consider:
- mostly non-ASCII input, such as the "utf-8 demo" file mentioned in the page
  above
- mostly ASCII input, such as will happen very often (think HTML, XML, log
  files, etc.)

The py3k utf-8 decoder is optimized for the latter.

Regards

Antoine.
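
For anyone who wants to try the two cases listed above, here is a rough sketch of the Python side of such a benchmark. It only times the built-in decoder through `bytes.decode`; the DFA decoder from the linked page is C code and would have to be wrapped separately (an extension module, say) before the numbers could be compared. The sample strings and the helper names `make_samples` and `bench_decode` are made up for illustration, and synthetic repeated text is only a stand-in for real files such as the "utf-8 demo" file.

```python
import timeit


def make_samples(size=100000):
    """Build two UTF-8 byte strings of at least `size` bytes each.

    The text is synthetic (an assumption for this sketch); real HTML or
    log files would be a better test of the mostly-ASCII case.
    """
    units = {
        "mostly ASCII": "<p>log line 42: status=OK</p>\n",
        "mostly non-ASCII": "Καλημέρα κόσμε, こんにちは世界, Привет мир\n",
    }
    samples = {}
    for label, unit in units.items():
        encoded = unit.encode("utf-8")
        # Repeat whole units so no multi-byte sequence is ever cut in half.
        samples[label] = encoded * (size // len(encoded) + 1)
    return samples


def bench_decode(data, repeat=5, number=20):
    """Return the best observed time, in seconds, for one decode of `data`."""
    timer = timeit.Timer(lambda: data.decode("utf-8"))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for label, data in make_samples().items():
        seconds = bench_decode(data)
        print("{}: {} bytes, {:.1f} MB/s".format(
            label, len(data), len(data) / seconds / 1e6))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; the absolute figures will of course depend on the machine and the Python build.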