RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00134.html below:

two patches for HTMLInputStream (Python)

I have uploaded a patch to address the performance problem that excors
and graham described in http://code.google.com/p/html5lib/issues/detail?id=35.
The data structure (list) used to keep unread characters has O(n) cost
for insertion to/deletion from the head of the list, so frequent
unget() (which happens when parsing a long sequence of number
entities) can result in poor performance.  Changing the data structure
to deque, a queue object with O(1) insertion/deletion cost at the
head, improves the performance in general and makes char()/unget()
scales better with long input size.  deque is a python library object
in python 2.4 and after.

The running time of excors's code is still not linear to the input
size, but the additional overhead is mostly from appendChild() in
simpletree and is not related to HTMLInputStream.

I also started another issue (http://code.google.com/p/html5lib/issues/
detail?id=53) with a patch for an alternative implementation of
position() and lineLengths.  Follow the general idea in the comment, I
keep a buffer of read characters and compute the col, line,
lineLengths when the corresponding functions are called.  Even though
the overhead is large (a buffer of input size), but it should be
correct.   Removing position tracking from each call to char()/unget()/
charsUntil() also improve the performance.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"html5lib-discuss" group.
 To post to this group, send email to html5lib-discuss@googlegroups.com
 To unsubscribe from this group, send email to [EMAIL PROTECTED]
 For more options, visit this group at 
http://groups.google.com/group/html5lib-discuss?hl=en-GB
-~----------~----~----~----~------~----~------~--~---

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4