Status: Accepted
Owner: geoffers
Labels: Type-Defect Python

New issue 202 by geoffers: Unicode file breaks InputStream
http://code.google.com/p/html5lib/issues/detail?id=202

What steps will reproduce the problem?
import html5lib, StringIO
html5lib.parse(StringIO.StringIO(u"a"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "html5lib/html5parser.py", line 54, in parse
    return p.parse(doc, encoding=encoding)
  File "html5lib/html5parser.py", line 247, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "html5lib/html5parser.py", line 110, in _parse
    parser=self, **kwargs)
  File "html5lib/tokenizer.py", line 42, in __init__
    self.stream = HTMLInputStream(stream, encoding, parseMeta, useChardet)
  File "html5lib/inputstream.py", line 162, in __init__
    self.charEncoding = self.detectEncoding(parseMeta, chardet)
  File "html5lib/inputstream.py", line 217, in detectEncoding
    encoding = self.detectBOM()
  File "html5lib/inputstream.py", line 282, in detectBOM
    assert isinstance(string, str)
AssertionError

In short, we don't handle the case where we get given a file-like object that returns Unicode strings.
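
For illustration only, here is a minimal sketch of one way a reader could tell the two kinds of file-like object apart before any BOM sniffing happens. It is not the fix that was applied to html5lib; the helper name read_text and its fallback_encoding parameter are hypothetical. It uses the io module rather than the StringIO module from the report so that it runs on both Python 2.6+ and Python 3.

```python
import codecs
import io


def read_text(stream, fallback_encoding="utf-8"):
    """Read a file-like object and return (text, encoding).

    Hypothetical helper, not part of html5lib. The encoding is None when
    the stream already yields Unicode, because there is nothing to detect.
    """
    data = stream.read()
    if isinstance(data, bytes):
        # Byte stream: only here does looking for a BOM make sense.
        if data.startswith(codecs.BOM_UTF8):
            return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
        # Assumption: no further detection, just decode with the fallback.
        return data.decode(fallback_encoding, "replace"), fallback_encoding
    # Unicode stream: skip encoding detection entirely.
    return data, None


if __name__ == "__main__":
    print(read_text(io.StringIO(u"a")))
    print(read_text(io.BytesIO(b"\xef\xbb\xbfa")))
```

The point of the sketch is the branch on the type of what read() returns: a Unicode result bypasses the byte-oriented detection step instead of tripping an assertion in it.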