Jeroen Ruigrok van der Werven <asmodai <at> in-nomine.org> writes:
>
> So on medium and large datasets the decoder of Bjoern is very interesting,
> but the tiny case (just Bjoern's name) is quite a tad bit slower. The other
> cases seems more typical of what the average use in Python would be.

Keep in mind what the datasets are:

« The large buffer is a April 2009 Hindi Wikipedia article XML dump, the
medium buffer Markus Kuhn's UTF-8-demo.txt, and the tiny buffer my name »

It would be interesting to test with mostly ASCII data to see what that gives.

Now the good thing is that, even with wildly non-ASCII data, our current
decoder is very efficient.

Regards

Antoine.
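
A rough way to try the mostly-ASCII case against the built-in decoder might look like the sketch below. It is only an illustration: the sample strings, the buffer size, and the helper names (make_samples, time_decode) are made up here, it does not use the datasets quoted above, and it does not include Bjoern's decoder at all.

```python
import timeit


def make_samples(size=100_000):
    """Build hypothetical UTF-8 buffers of roughly `size` bytes each."""
    ascii_unit = "The quick brown fox jumps over the lazy dog. "
    # Mostly ASCII: one accented character after a long ASCII run.
    mostly_ascii_unit = ascii_unit * 20 + "\u00e9"
    # Devanagari text: each letter takes three bytes in UTF-8.
    hindi_unit = "\u0928\u092e\u0938\u094d\u0924\u0947 "
    units = {
        "ascii": ascii_unit,
        "mostly_ascii": mostly_ascii_unit,
        "hindi": hindi_unit,
    }
    samples = {}
    for name, unit in units.items():
        encoded = unit.encode("utf-8")
        # Repeat whole units so the buffer never ends mid-character.
        samples[name] = encoded * max(1, size // len(encoded))
    return samples


def time_decode(buf, number=200):
    """Return the best observed time, in seconds, for one decode of buf."""
    timings = timeit.repeat(lambda: buf.decode("utf-8"), number=number, repeat=3)
    return min(timings) / number


if __name__ == "__main__":
    for name, buf in make_samples().items():
        seconds = time_decode(buf)
        print(f"{name:>12}: {len(buf)} bytes, {seconds * 1e6:.1f} us per decode")
```

The numbers will depend on the interpreter build and the machine, so they are only meaningful when compared against each other in the same run.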