Issue 73: problem with reading from stdin http://code.google.com/p/html5lib/issues/detail?id=73
New issue report by arthurdejong:

I have tried using pisa, which uses html5lib, to convert an HTML document to PDF. The HTML content is read from stdin and passed to html5lib. The problem is that html5lib tries to call seek() in inputstream.py on such streams and fails. The failed seek appears to be silently ignored.

On Solaris 8, Python 2.4.4, html5lib-0.11.1-py2.4.egg, the following shell snippet demonstrates the problem. This works:

    pisa - - < /tmp/test.html > /tmp/test2.pdf

because stdin is a file and seeking is valid, but this fails:

    cat /tmp/test.html | pisa - - > /tmp/test3.pdf

html5lib seems to first try to detect the encoding, then seek back to the start and parse again. Explicitly setting the encoding worked for me but won't always be an option.

Also (for the benefit of users who happen to run into this in combination with pisa), pisa 3.0.22 won't work together with html5lib 0.11.1 (it does with 0.10) and fails with the following error:

    AttributeError: 'module' object has no attribute 'isValidEncoding'

An ugly workaround for this is to put the following at the start of the pisa script:

    import html5lib.inputstream
    html5lib.inputstream.isValidEncoding = html5lib.inputstream.codecName

Thanks.

Issue attributes:
    Status: New
    Owner: ----
    Labels: Type-Defect Priority-Medium
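
For readers hitting the same pipe problem, one possible caller-side workaround is to buffer the input in memory before handing it to the parser, so that a later seek back to the start succeeds. The sketch below is only an illustration and is not part of html5lib or pisa: the helper name make_seekable is hypothetical, it assumes a binary stream, and it uses the Python 3 io module rather than the Python 2.4 setup described in the report.

```python
import io


def make_seekable(stream):
    """Return a seekable binary stream with the same remaining content.

    Hypothetical helper (not an html5lib or pisa API). Regular files are
    returned unchanged; pipes and other non-seekable streams are read
    fully into an in-memory buffer.
    """
    try:
        if stream.seekable():
            # Probe with a no-op seek to confirm that seeking really works.
            stream.seek(stream.tell())
            return stream
    except (AttributeError, OSError, ValueError):
        # Fall through and buffer the content instead.
        pass
    # Reading everything trades memory for the ability to rewind.
    return io.BytesIO(stream.read())
```

In a script, this might be called as make_seekable(sys.stdin.buffer) before passing the result on. The whole document is held in memory, which is usually acceptable for HTML input but is a trade-off to keep in mind for very large files.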