A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://www.mail-archive.com/html5lib-discuss@googlegroups.com/msg00352.html below:

Issue 104 in html5lib: Optimisations to inputstream.py

Status: New
Owner: ----

New issue 104 by david.m.cooke: Optimisations to inputstream.py
http://code.google.com/p/html5lib/issues/detail?id=104
As promised in my post to comp.lang.python
(http://article.gmane.org/gmane.comp.python.general/629210), here is a  
patch to improve
inputstream.py

The .position property of EncodingBytes is used a lot. Every
self.position +=1 calls getPosition() and setPosition(). Another
getPosition() call is done in the self.currentByte property. Most of
these can be optimised away by using methods that move the position
and return the current byte.

In HTMLInputStream, the current line number and column are updated
every time a new character is read with .char(). The current position
is *only* used in error reporting, so I reworked it to only calculate
the position when .position() is called, by keeping track of the
number of lines in previous read chunks, and computing the number of
lines to the current offset in the current chunk.

These changes give me about a 20% speedup on the tokenisation.

(The patch also has a spelling correction of 'certian' to 'certain' for the  
encoding certainty)

Attachments:
        inputstream-opt  16.0 KB

--
You received this message because you are listed in the owner
or CC fields of this issue, or because you starred this issue.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"html5lib-discuss" group.
 To post to this group, send email to html5lib-discuss@googlegroups.com
 To unsubscribe from this group, send email to 
html5lib-discuss+unsubscr...@googlegroups.com
 For more options, visit this group at 
http://groups.google.com/group/html5lib-discuss?hl=en-GB
-~----------~----~----~----~------~----~------~--~---


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4