Showing content from https://mail.python.org/pipermail/python-dev/2008-April/078814.html below:

[Python-Dev] Encoding detection in the standard library?

Mike Klaas mike.klaas at gmail.com
Tue Apr 22 22:46:26 CEST 2008
On 22-Apr-08, at 3:31 AM, M.-A. Lemburg wrote:
>>>
>> I don't think that should be part of the standard library. People
>> will mistake what it tells them for certain.
>
> +1
>
> I also think that it's better to educate people to add (correct)
> encoding information to their text data, rather than give them a
> guess mechanism...

That is a fallacious alternative: the programmers who need encoding
detection are not the same people who are omitting encoding information.

I only have a small opinion on whether charset detection should appear  
in the stdlib, but I am somewhat perplexed by the arguments in this  
thread.  I don't see how inclusion in the stdlib would make people  
more inclined to think that the algorithm is always correct.  As for
the need for this functionality:

Martin wrote:
> Can you please explain why that is? Web programs should not normally
> have the need to detect the encoding; instead, it should be specified
> always - unless you are talking about browsers specifically, which
> need to support web pages that specify the encoding incorrectly.

Any program that needs to examine the contents of documents/feeds/whatever
on the web needs to deal with incorrectly-specified encodings (which,
sadly, are rather common).  The set of programs that need this
functionality is probably the same set that needs BeautifulSoup--I think
that set is larger than just browsers <grin>
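
To make "deal with incorrectly-specified encodings" concrete, here is a
minimal sketch of the simplest coping strategy: try the declared encoding
first and fall back to a few likely candidates.  The function name, the
fallback list, and the latin-1 last resort are all assumptions chosen for
illustration; this is not a real detector (it does no statistical
guessing), and it cannot catch a wrong declaration that happens to decode
without error.

```python
def decode_with_fallback(data, declared=None, fallbacks=("utf-8", "cp1252")):
    """Decode bytes, trusting the declared encoding only if it works.

    Returns (text, encoding_used). Hypothetical helper for illustration.
    """
    candidates = ([declared] if declared else []) + list(fallbacks)
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this encoding.
            # LookupError: the declared encoding name is unknown to Python.
            continue
    # latin-1 maps every byte to a character, so this never raises,
    # but the result may be mojibake.
    return data.decode("latin-1"), "latin-1"


# A feed that claims ASCII but actually contains UTF-8 bytes.
text, used = decode_with_fallback("caf\u00e9".encode("utf-8"), declared="ascii")
print(used, text)
```

The caller gets back the encoding that was actually used, so it can log
how often the declared value was wrong.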

-Mike