Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00197.html below:

problem with reading from stdin

Issue 73: problem with reading from stdin
http://code.google.com/p/html5lib/issues/detail?id=73
New issue report by arthurdejong:
I have tried using pisa, which uses html5lib, to convert an HTML document to
PDF. The HTML content is read from stdin and passed to html5lib.

The problem is that html5lib tries to call seek() in inputstream.py on such
streams, which fails. The failed seek seems to be silently ignored.

On Solaris 8 with Python 2.4.4 and html5lib-0.11.1-py2.4.egg, the following
shell commands demonstrate the problem. This works:

  pisa - - < /tmp/test.html > /tmp/test2.pdf

because stdin is a regular file and seek() is valid on it, but this fails:

  cat /tmp/test.html | pisa - - > /tmp/test3.pdf

html5lib seems to first try to detect the encoding, then seek back to the
start and parse again. Explicitly setting the encoding worked for me, but
that won't always be an option.
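
One possible caller-side workaround, sketched here in modern Python rather
than the Python 2.4 used above, is to copy a non-seekable stdin into an
in-memory buffer before handing it to the parser. The helper name
make_seekable is hypothetical and not part of html5lib or pisa, and the
sketch assumes a binary stream small enough to hold in memory:

```python
import io


def make_seekable(stream):
    """Return a seekable binary stream with the same remaining bytes as `stream`.

    Assumes `stream` is a binary file-like object (read() returns bytes).
    """
    seekable = getattr(stream, "seekable", None)
    if callable(seekable) and seekable():
        # Regular files (e.g. `prog < file`) already support seek().
        return stream
    # Pipes (e.g. `cat file | prog`) do not, so copy the data into memory.
    return io.BytesIO(stream.read())


# Hypothetical usage in a caller such as a converter script:
#     source = make_seekable(sys.stdin.buffer)
#     document = html5lib.parse(source)
```

This trades memory for seekability, so it is only a stopgap for documents
of moderate size.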

Also (for the benefit of users who happen to run into this in combination
with pisa): pisa 3.0.22 doesn't work with html5lib 0.11.1 (it does with
0.10) and fails with the following error:

AttributeError: 'module' object has no attribute 'isValidEncoding'

An ugly workaround for this is to put the following at the start of the
pisa script:

# Alias the name pisa 3.0.22 expects to the function html5lib 0.11.1 provides.
import html5lib.inputstream
html5lib.inputstream.isValidEncoding = html5lib.inputstream.codecName
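
A slightly more defensive variant of the same idea only adds the alias when
the name is really missing, so it does nothing on versions that still
provide it. This is a sketch only: alias_missing_attr is a hypothetical
helper, and a stand-in module object is used below in place of
html5lib.inputstream so the example runs without html5lib installed.

```python
import types


def alias_missing_attr(module, missing, existing):
    """Bind module.<missing> to module.<existing> only if <missing> is absent.

    Returns True if the alias was added, False if the name already existed.
    """
    if hasattr(module, missing):
        return False
    setattr(module, missing, getattr(module, existing))
    return True


# Stand-in for html5lib.inputstream (assumption: only codecName is present).
fake = types.ModuleType("fake_inputstream")
fake.codecName = lambda name: name.lower()

alias_missing_attr(fake, "isValidEncoding", "codecName")
```

With the real library, the call would presumably be
alias_missing_attr(html5lib.inputstream, "isValidEncoding", "codecName"),
placed before pisa first uses the name.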

Thanks.


Issue attributes:
        Status: New
        Owner: ----
        Labels: Type-Defect Priority-Medium


