Are you opposed to changing tokenize? If so, why (apart from
compatibility)? ISTM that it would be a good thing if it reported
everything except horizontal whitespace.

On 11/30/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote:
> >I've got a small tweak to tokenize.py that I'd like to run by folks here.
> >
> >I'm working on a refactoring tool for Python 2.x-to-3.x conversion,
> >and my approach is to build a full parse tree with annotations that
> >show where the whitespace and comments go. I use the tokenize module
> >to scan the input. This is nearly perfect (I can render code from the
> >parse tree and it will be an exact match of the input) except for
> >continuation lines -- while tokenize gives me pseudo-tokens for
> >comments and "ignored" newlines, it doesn't give me the backslashes at
> >all (while it does give me the newline following the backslash).
>
> The following routine will render a token stream, and it automatically
> restores the missing \'s. I don't know if it'll work with your patch, but
> perhaps you could use it instead of changing tokenize. For the
> documentation and examples, see:
>
> http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text
>
>
> def detokenize(tokens, indent=0):
>     """Convert `tokens` iterable back to a string."""
>     out = []; add = out.append
>     lr, lc, last = 0, 0, ''
>     baseindent = None
>     for tok, val, (sr, sc), (er, ec), line in flatten_stmt(tokens):
>         # Insert trailing line continuation and blanks for skipped lines
>         lr = lr or sr  # first line of input is first line of output
>         if sr > lr:
>             if last:
>                 if len(last) > lc:
>                     add(last[lc:])
>                 lr += 1
>             if sr > lr:
>                 add(' '*indent + '\\\n'*(sr-lr))  # blank continuation lines
>             lc = 0
>
>         # Re-indent first token on line
>         if lc == 0:
>             if tok == INDENT:
>                 continue  # we want to dedent first actual token
>             else:
>                 curindent = len(line[:sc].expandtabs())
>                 if baseindent is None and tok not in WHITESPACE:
>                     baseindent = curindent
>                 elif baseindent is not None and curindent >= baseindent:
>                     add(' ' * (curindent-baseindent))
>                 if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
>                     add(' ' * indent)
>
>         # Not at start of line, handle intraline whitespace by retaining it
>         elif sc > lc:
>             add(line[lc:sc])
>
>         if val:
>             add(val)
>
>         lr, lc, last = er, ec, line
>
>     return ''.join(out)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
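
For reference, a minimal sketch of the behaviour under discussion, assuming
the stdlib tokenize module (exact token details vary a bit across Python
versions): no token's string ever contains the continuation backslash, so the
only trace of it is the row gap between one token's end position and the next
token's start position.

# Sketch only: stdlib tokenize, illustrating the missing-backslash behaviour.
import io
import tokenize

src = "x = 1 + \\\n    2\n"

for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    # Each token reports its type, text, and (row, col) start/end positions;
    # none of the token strings below include the backslash itself.
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

Running this prints NAME 'x', OP '=', NUMBER '1', OP '+' on row 1, then
NUMBER '2' starting on row 2 -- the continuation has to be inferred from the
jump in row numbers, which is what Phillip's detokenize() does when it
re-inserts the \'s.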