On 17.03.2016 15:55, Guido van Rossum wrote:
> On Thu, Mar 17, 2016 at 5:04 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>> Should we recommend that everyone use tokenize.detect_encoding()?
>>
>> Likely. However the interface of tokenize.detect_encoding() is not very
>> simple.
>
> I just found that out yesterday. You have to give it a readline()
> function, which is cumbersome if all you have is a (byte) string and
> you don't want to split it on lines just yet. And the readline()
> function raises SyntaxError when the encoding isn't right. I wish
> there were a lower-level helper that just took a line and told you
> what the encoding in it was, if any. Then the rest of the logic can be
> handled by the caller (including the logic of trying up to two lines).

I've uploaded the code I posted yesterday, modified to address
some of the issues it had to github:

https://github.com/malemburg/python-snippets/blob/master/detect_source_encoding.py

I'm pretty sure the two-lines read can be optimized away and
put straight into the regular expression used for matching.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Mar 18 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________
2016-03-07: Released eGenix pyOpenSSL 0.13.14 ... http://egenix.com/go89
2016-02-19: Released eGenix PyRun 2.1.2 ...       http://egenix.com/go88

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/
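
For illustration only, here is a minimal sketch of the kind of lower-level helper described in the quoted text: one function that looks at a single line and reports the encoding named in it, if any, with the "try up to two lines" logic left to a separate caller. It is not the code from the GitHub link above. The names encoding_in_line() and detect_source_encoding() are hypothetical, the regular expressions follow the coding-cookie pattern given in PEP 263, and the sketch assumes the input is a byte string. It ignores a UTF-8 BOM and does not check that the named codec actually exists.

```python
import re

# Coding-cookie pattern in the form given by PEP 263, applied to bytes.
COOKIE_RE = re.compile(rb'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)')
# A line that is empty or holds only a comment; only after such a first
# line may the second line carry the cookie.
BLANK_RE = re.compile(rb'^[ \t\f]*(?:[#\r\n]|$)')


def encoding_in_line(line):
    """Return the encoding named in a single source line, or None.

    Hypothetical helper: it never raises for a missing or odd cookie,
    it only reports what the line says.
    """
    match = COOKIE_RE.match(line)
    if match is None:
        return None
    return match.group(1).decode('ascii')


def detect_source_encoding(source, default='utf-8'):
    """Check at most the first two lines of a byte string for a cookie."""
    for line in source.split(b'\n', 2)[:2]:
        found = encoding_in_line(line)
        if found is not None:
            return found
        if not BLANK_RE.match(line):
            # The first line contains code, so line two does not count.
            break
    return default
```

When a readline()-based interface is acceptable, a byte string can also be handed to the existing function without splitting it by hand, as tokenize.detect_encoding(io.BytesIO(data).readline); that call still raises SyntaxError for an unknown or conflicting encoding.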