Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00322.html below:

'ascii' codec instead of appropriate one.

Status: New
Owner: ----

New issue 98 by nikolay.panov: Encoding issue: 'ascii' codec instead of appropriate one.
http://code.google.com/p/html5lib/issues/detail?id=98
This issue is related to the following sentence in the docs: "If no
encoding can be found and the chardet library is available, an attempt will
be made to sniff the encoding from the byte pattern."
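
The fallback order that sentence describes can be sketched roughly as follows. This is an illustration only, not html5lib's actual code: `pick_encoding`, its parameters, and the `windows-1252` default are assumptions made for the example; `chardet.detect` is the only third-party call used, and it is skipped when chardet is not installed.

```python
def pick_encoding(raw_bytes, declared=None, default="windows-1252"):
    """Choose an encoding: declared first, then chardet's guess, then a default.

    Hypothetical helper for illustration; not part of html5lib.
    """
    if declared:
        return declared
    try:
        import chardet  # optional dependency; fall back if it is missing
    except ImportError:
        return default
    guess = chardet.detect(raw_bytes)  # dict with 'encoding' and 'confidence'
    return guess.get("encoding") or default
```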

    * What steps will reproduce the problem?

>>> html=fetch_url('http://www.ixbt.com/news/soft/index.shtml?11/72/39')
>>> p = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("beautifulsoup"))
>>> soup = p.parse(html)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/niksite/lib/site-python/html5lib/html5parser.py", line 177, in parse
    self._parse(stream, innerHTML=False, encoding=encoding)
  File "/home/niksite/lib/site-python/html5lib/html5parser.py", line 93, in _parse
    self.mainLoop()
  File "/home/niksite/lib/site-python/html5lib/html5parser.py", line 149, in mainLoop
    self.phase.processStartTag(token["name"], token["data"])
  File "/home/niksite/lib/site-python/html5lib/html5parser.py", line 314, in processStartTag
    self.startTagHandler[name](name, attributes)
  File "/home/niksite/lib/site-python/html5lib/html5parser.py", line 605, in startTagMeta
    data = inputstream.EncodingBytes(attributes["content"])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 12-18: ordinal not in range(128)
>>> chardet.detect(html)
{'confidence': 0.94890270449856784, 'encoding': 'windows-1251'}


    * What is the expected output? What do you see instead?

As we can see, chardet successfully detects the 'windows-1251' encoding of
the HTML document provided.
Why does html5lib try to use the 'ascii' codec?
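
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the value of the meta tag's `content` attribute is already a text string containing non-ASCII characters, and converting it to bytes without naming a codec falls back to 'ascii'. The sketch below reproduces that kind of failure with the standard library only and shows an explicit decode using the encoding chardet reported. The attribute value, the sample bytes, and the helper names are all hypothetical.

```python
# Hypothetical stand-in for a <meta content="..."> attribute value that
# contains Cyrillic text; the real value from the page is not shown above.
meta_content = "text/html; \u043d\u043e\u0432\u043e\u0441\u0442\u0438"


def encode_as_ascii(text):
    """Encode text to bytes with the 'ascii' codec, as an implicit conversion would."""
    return text.encode("ascii")


def try_encode(text):
    """Return the UnicodeEncodeError raised by the ASCII encode, or None on success."""
    try:
        encode_as_ascii(text)
    except UnicodeEncodeError as exc:
        return exc
    return None


# Non-ASCII characters make the ASCII encode fail.
error = try_encode(meta_content)

# Illustrative bytes, as they might arrive from a windows-1251 page.
raw_page = "\u043d\u043e\u0432\u043e\u0441\u0442\u0438".encode("windows-1251")
# Decode explicitly with the encoding chardet reported in the session above.
decoded = raw_page.decode("windows-1251")
```

The traceback also shows `parse` forwarding an `encoding` argument to `_parse`, so naming the encoding in the `parse` call may be another route. Whether either approach avoids the error in this version of html5lib is not established by the report.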

