Showing content from https://mail.python.org/pipermail/python-dev/2009-April/089077.html below:

[Python-Dev] UTF-8 Decoder

Antoine Pitrou solipsis at pitrou.net
Mon Apr 27 20:48:38 CEST 2009
Jeroen Ruigrok van der Werven <asmodai <at> in-nomine.org> writes:
> 
> So on medium and large datasets the decoder of Bjoern is very interesting,
> but the tiny case (just Bjoern's name) is quite a bit slower. The other
> cases seem more typical of what the average use in Python would be.

Keep in mind what the datasets are:

« The large buffer is an April 2009 Hindi Wikipedia article XML dump, the medium
buffer Markus Kuhn's UTF-8-demo.txt, and the tiny buffer my name »

It would be interesting to test with mostly ASCII data and see how the results
change. The good news is that, even with heavily non-ASCII data, our current
decoder is very efficient.
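
A rough sketch of such a comparison, assuming a modern Python 3 interpreter. It
only times the interpreter's built-in UTF-8 decoder (not Bjoern's decoder, and
not the decoder as it stood in 2009) on two synthetic buffers. The helper names
make_samples, time_decode and compare, and the sample strings, are hypothetical
and chosen purely for illustration.

```python
import timeit


def make_samples(size=100_000):
    """Build two UTF-8 buffers of at most `size` bytes each (hypothetical data)."""
    # Mostly ASCII: English text with one two-byte character (e-acute).
    ascii_unit = "plain ASCII text, with the occasional caf\u00e9. ".encode("utf-8")
    # Mostly non-ASCII: a Hindi greeting in Devanagari (three bytes per letter).
    hindi_unit = "\u0928\u092e\u0938\u094d\u0924\u0947 ".encode("utf-8")
    # Repeat whole units so no multi-byte sequence is ever cut in half.
    return {
        "mostly_ascii": ascii_unit * (size // len(ascii_unit)),
        "mostly_non_ascii": hindi_unit * (size // len(hindi_unit)),
    }


def time_decode(data, number=50, repeat=3):
    """Return the best observed time, in seconds, for one data.decode('utf-8')."""
    timings = timeit.repeat(lambda: data.decode("utf-8"), number=number, repeat=repeat)
    return min(timings) / number


def compare(size=100_000):
    """Map each sample name to its per-call decode time in seconds."""
    return {name: time_decode(buf) for name, buf in make_samples(size).items()}


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds * 1e6:.1f} microseconds per decode")
```

The buffers have similar byte lengths but very different character counts, so
the numbers are best read as time per byte of input rather than per character.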

Regards

Antoine.

