[CCing python-dev again]

On 2008-04-22 12:38, Greg Wilson wrote:
>>> I don't think that should be part of the standard library. People
>>> will mistake what it tells them for certain.
>>> [etc]
>
> These are all good arguments, but the fact remains that we can't control
> our inputs (e.g., we're archiving mail messages sent to lists managed by
> DrProject), and some of those inputs *don't* tell us how they're encoded.
> Under those circumstances, what would you recommend?

I haven't done much research into this, but in general, I think it's
better to:

 * first try to look at other characteristics of a text message, e.g.
   language, origin, topic, etc.,

 * then narrow down the number of encodings which could apply,

 * rank them to try to avoid ambiguities and

 * then try to see what percentage of the text you can decode using
   each of the encodings in reverse ranking order (ie. more specialized
   encodings should be tested first, latin-1 last).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 22 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
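
The last two steps in the message above (rank the candidate encodings, then see what percentage of the text decodes under each, most specialized first and latin-1 last) can be sketched roughly as follows. This is only an illustration, not code from the thread: the function names `decodable_fraction` and `guess_encoding`, the `threshold` parameter, and the choice to measure "percentage decodable" by counting U+FFFD replacement characters are all assumptions made for the example. Narrowing the candidate list from language, origin or topic is left to the caller.

```python
def decodable_fraction(data: bytes, encoding: str) -> float:
    """Approximate share of the text that decodes cleanly with `encoding`.

    Assumption for this sketch: undecodable input is replaced by U+FFFD
    and we count those replacements. Input that legitimately contains
    U+FFFD would be scored too low.
    """
    text = data.decode(encoding, errors="replace")
    if not text:
        return 1.0  # nothing to decode, so nothing failed
    return 1.0 - text.count("\ufffd") / len(text)


def guess_encoding(data: bytes, candidates, threshold: float = 1.0):
    """Try `candidates` in the given order and return (encoding, score).

    `candidates` should already be narrowed down and ranked by the
    caller, most specialized first and latin-1 last, because latin-1
    accepts every byte sequence. The first candidate scoring at least
    `threshold` wins; otherwise the best-scoring one is returned.
    Returns None if `candidates` is empty.
    """
    best = None
    for encoding in candidates:
        score = decodable_fraction(data, encoding)
        if score >= threshold:
            return encoding, score
        if best is None or score > best[1]:
            best = (encoding, score)
    return best


if __name__ == "__main__":
    raw = "Grüße".encode("utf-8")
    print(guess_encoding(raw, ["ascii", "utf-8", "latin-1"]))
```

Note that a result from such a function is still only a guess: as the quoted objection says, several encodings can decode the same bytes without error, so the ranking and the prior narrowing carry most of the weight.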