On 22-Apr-08, at 3:31 AM, M.-A. Lemburg wrote:

>>>
>> I don't think that should be part of the standard library. People
>> will mistake what it tells them for certain.
>
> +1
>
> I also think that it's better to educate people to add (correct)
> encoding information to their text data, rather than give them a
> guess mechanism...

That is a fallacious alternative: the programmers that need encoding
detection are not the same people who are omitting encoding
information.

I only have a small opinion on whether charset detection should appear
in the stdlib, but I am somewhat perplexed by the arguments in this
thread. I don't see how inclusion in the stdlib would make people more
inclined to think that the algorithm is always correct.

In terms of the need for this functionality:

Martin wrote:
> Can you please explain why that is? Web programs should not normally
> have the need to detect the encoding; instead, it should be specified
> always - unless you are talking about browsers specifically, which
> need to support web pages that specify the encoding incorrectly.

Any program that needs to examine the contents of
documents/feeds/whatever on the web needs to deal with
incorrectly-specified encodings (which, sadly, is rather common). The
set of programs that need this functionality is probably the same set
that needs BeautifulSoup--I think that set is larger than just
browsers <grin>

-Mike