
Showing content from https://mail.python.org/pipermail/python-dev/2009-October/093262.html below:

[Python-Dev] tokenize string literal problem

C or L Smith smiles at worksmail.net
Sat Oct 24 08:04:25 CEST 2009
BACKGROUND
    I'm trying to modify doctest's DocTestParser so that it will parse docstring code snippets directly out of a *.py file. (Although doctest can find these with another method that works from the *.pyc, that method misses certain decorated functions; it also automatically loads everything from the module containing the code, whereas we would like to insist that the snippets import the modules they need.)
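A minimal Python 3 sketch of that goal, swapping in the standard `ast` module for tokenize (an assumption, not the approach used in this message): the *.py source is parsed but never imported, decorators do not hide a function's `FunctionDef` node, and each docstring is handed to `doctest.DocTestParser.get_examples`. The helper name `examples_from_source` and the sample `SOURCE` are hypothetical.

```python
import ast
import doctest

def examples_from_source(source, filename="<source>"):
    """Return (name, doctest.Example) pairs found in the docstrings of `source`.

    The source is only parsed, never executed, so nothing from the module
    is imported or placed in the examples' globals.
    """
    parser = doctest.DocTestParser()
    tree = ast.parse(source, filename)
    found = []
    for node in ast.walk(tree):
        # Only these node types can carry a docstring; decorated functions
        # are still ordinary FunctionDef/AsyncFunctionDef nodes in the tree.
        if not isinstance(node, (ast.Module, ast.ClassDef,
                                 ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        doc = ast.get_docstring(node, clean=True)
        if not doc:
            continue
        name = getattr(node, "name", filename)
        for example in parser.get_examples(doc, name):
            found.append((name, example))
    return found


# Hypothetical module text containing a decorated function.
SOURCE = '''
import functools

@functools.lru_cache(maxsize=None)
def square(n):
    """
    >>> square(3)
    9
    """
    return n * n
'''

for name, ex in examples_from_source(SOURCE):
    print(name, repr(ex.source), repr(ex.want))
```

Because nothing is executed, each example would later have to be run with its own imports, which fits the wish to make snippets import what they need.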

PROBLEM
    I need to find code snippets that are located in docstrings. Docstrings, being string literals, should be extractable with tokenize. But tokenize is giving the wrong results (or I am doing something wrong) for this (pathological) case:

foo.py:
+----
def bar():
    """
    A quoted triple quote is not a closing
    of this docstring:
    >>> print '"""'
    """
    """ # <-- this is the closing quote
    pass
+----

Here is how I tokenize the file:

###
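import re, tokenize
# Leading indentation, an optional r/u prefix, then an opening triple quote.
DOCSTRING_START_RE = re.compile(r'\s+[ru]*("""|' + "''')")

o = open('foo.py', 'r')
for ti in tokenize.generate_tokens(o.next):
    typ = ti[0]
    text = ti[-1]    # the physical line(s) holding the token, not the token itself
    if typ == tokenize.STRING:
        if DOCSTRING_START_RE.match(text):
            print "DOCSTRING:", repr(text)
o.close()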
###
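For comparison, a Python 3 sketch of the same scan. Each token is a tuple `(type, string, start, end, line)`, so `ti[-1]` above is the physical line (or lines) on which the token sits, indentation included, while `ti[1]` (`tok.string`) is the literal itself. The helper name `triple_quoted_strings` and the sample source are illustrative assumptions.

```python
import io
import tokenize

def triple_quoted_strings(source):
    """Return (start_row, literal_text) for each triple-quoted STRING token."""
    results = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.STRING:
            continue
        # Drop any string prefix (r, u, b and their capitals) before
        # looking at the opening quote.
        body = tok.string.lstrip("rRuUbB")
        if body.startswith(('"""', "'''")):
            # tok.string is the literal; tok.line (the same field as ti[-1])
            # would be the surrounding physical line(s).
            results.append((tok.start[0], tok.string))
    return results


SOURCE = '''def bar():
    """
    >>> print(1 + 1)
    2
    """
    pass
'''

for row, text in triple_quoted_strings(SOURCE):
    print(row, repr(text))
```

Checking the prefix on `tok.string` rather than matching a regex against the line avoids depending on the indentation that `ti[-1]` carries.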

which outputs:

DOCSTRING: '    """\n    A quoted triple quote is not a closing\n    of this docstring:\n    >>> print \'"""\'\n'
DOCSTRING: '    """\n    """ # <-- this is the closing quote\n'

There should be only one string token, I believe. The PythonWin editor parses (and colorizes) this correctly, but either tokenize or I am making an error.
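One way to test the single-string expectation is to ask the compiler itself. Python's lexical rules end a triple-quoted string at the first unescaped matching triple quote; quote characters around it are ordinary text and do not protect it. A Python 3 sketch under that rule: `ORIGINAL` is the foo.py text above, `FIXED` is a hypothetical variant that backslash-escapes the inner quotes, and `compiles` and `count_string_tokens` are illustrative helpers.

```python
import io
import tokenize

# foo.py from the message; the Python 2 print is only text inside the docstring.
ORIGINAL = r'''def bar():
    """
    A quoted triple quote is not a closing
    of this docstring:
    >>> print '"""'
    """
    """ # <-- this is the closing quote
    pass
'''

# Hypothetical variant: the inner triple quotes are backslash-escaped, so the
# first unescaped """ is the intended closing one.
FIXED = r'''def bar():
    """
    A quoted triple quote is not a closing
    of this docstring:
    >>> print '\"\"\"'
    \"\"\"
    """ # <-- this is the closing quote
    pass
'''

def compiles(source):
    """True if `source` is syntactically valid Python 3."""
    try:
        compile(source, "foo.py", "exec")
    except SyntaxError:
        return False
    return True

def count_string_tokens(source):
    """Number of STRING tokens tokenize reports for `source`."""
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok.type == tokenize.STRING)

print(compiles(ORIGINAL), compiles(FIXED), count_string_tokens(FIXED))
```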

Thanks for any help,
Chris
More information about the Python-Dev mailing list
