Status: New
Owner: ----

New issue 104 by david.m.cooke: Optimisations to inputstream.py
http://code.google.com/p/html5lib/issues/detail?id=104
As promised in my post to comp.lang.python (http://article.gmane.org/gmane.comp.python.general/629210), here is a patch to improve inputstream.py.

The .position property of EncodingBytes is used a lot. Every self.position += 1 calls getPosition() and setPosition(). Another getPosition() call is made in the self.currentByte property. Most of these can be optimised away by using methods that move the position and return the current byte.

In HTMLInputStream, the current line number and column are updated every time a new character is read with .char(). The current position is *only* used in error reporting, so I reworked it to calculate the position only when .position() is called, by keeping track of the number of lines in previously read chunks and computing the number of lines up to the current offset in the current chunk.

These changes give me about a 20% speedup on the tokenisation.

(The patch also has a spelling correction of 'certian' to 'certain' for the encoding certainty.)

Attachments:
    inputstream-opt  16.0 KB
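
For readers without the attachment, here is a minimal sketch of the lazy position idea described above. It is not the code from the patch: the class name LazyPositionStream and the attributes prev_lines and prev_col are hypothetical, and the real HTMLInputStream does considerably more (decoding, character queues, and so on). The point is only that char() does no line or column bookkeeping, and position() counts newlines on demand.

```python
class LazyPositionStream:
    """Hypothetical stand-in for HTMLInputStream; not the html5lib class."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.chunk = ""       # chunk currently being read
        self.offset = 0       # index of the next unread character in self.chunk
        self.prev_lines = 0   # newlines seen in all fully consumed chunks
        self.prev_col = 0     # characters since the last newline in those chunks

    def _position(self, offset):
        # Count newlines only in the part of the current chunk before `offset`.
        newlines = self.chunk.count("\n", 0, offset)
        last_newline = self.chunk.rfind("\n", 0, offset)
        if last_newline == -1:
            # Still on the line that started in an earlier chunk.
            column = self.prev_col + offset
        else:
            column = offset - (last_newline + 1)
        return self.prev_lines + newlines, column

    def _read_chunk(self):
        # Fold the finished chunk into the running totals, then advance.
        self.prev_lines, self.prev_col = self._position(len(self.chunk))
        self.chunk = next(self._chunks, "")
        self.offset = 0
        return bool(self.chunk)

    def char(self):
        # Hot path: no line or column updates here.
        if self.offset >= len(self.chunk) and not self._read_chunk():
            return None  # end of input
        c = self.chunk[self.offset]
        self.offset += 1
        return c

    def position(self):
        # Cold path: called only for error reporting. Line is 1-based,
        # column is the number of characters consumed on that line.
        line, column = self._position(self.offset)
        return line + 1, column
```

The cost of counting newlines moves from every call to char() to the comparatively rare calls to position(), which is the trade-off the patch relies on.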