Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00586.html below:

Issue 202 in html5lib: Unicode file breaks InputStream

Status: Accepted
Owner: geoffers
Labels: Type-Defect Python

New issue 202 by geoffers: Unicode file breaks InputStream
http://code.google.com/p/html5lib/issues/detail?id=202

What steps will reproduce the problem?

import html5lib, StringIO
html5lib.parse(StringIO.StringIO(u"a"))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "html5lib/html5parser.py", line 54, in parse
    return p.parse(doc, encoding=encoding)
  File "html5lib/html5parser.py", line 247, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "html5lib/html5parser.py", line 110, in _parse
    parser=self, **kwargs)
  File "html5lib/tokenizer.py", line 42, in __init__
    self.stream = HTMLInputStream(stream, encoding, parseMeta, useChardet)
  File "html5lib/inputstream.py", line 162, in __init__
    self.charEncoding = self.detectEncoding(parseMeta, chardet)
  File "html5lib/inputstream.py", line 217, in detectEncoding
    encoding = self.detectBOM()
  File "html5lib/inputstream.py", line 282, in detectBOM
    assert isinstance(string, str)
AssertionError

In short, we don't handle the case where we are passed a file-like object that returns Unicode strings rather than byte strings.
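
One possible way to guard against this, sketched below for illustration only, is to check what the stream actually returns before sniffing for a BOM. This is not html5lib's code or the fix that was applied: the names returns_text and detect_bom are hypothetical, and the sketch is written for Python 3 (io.StringIO, str/bytes), whereas the report above is from Python 2. It also assumes the stream is seekable.

```python
import codecs
import io


def returns_text(stream):
    """Hypothetical helper: True if the file-like object yields text, not bytes."""
    # A zero-length read consumes nothing but still reveals the return type.
    return isinstance(stream.read(0), str)


def detect_bom(stream):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    if returns_text(stream):
        # Already-decoded text has no byte-order mark to sniff, so skip
        # the byte-level check instead of asserting on the type.
        return None
    head = stream.read(4)
    stream.seek(0)  # assumes a seekable stream
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM begins
    # with the same two bytes as the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if head.startswith(bom):
            return name
    return None
```

With this shape, detect_bom(io.StringIO(u"a")) returns None rather than raising, and a byte stream still goes through the usual BOM check.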

