Thu Oct 16 23:11:40 CEST 2008 · https://mail.python.org/pipermail/python-dev/2008-October/083034.html

> Raymond Hettinger wrote:
>> * It will assist pypy style projects and other python implementations
>> when they have to build equivalents to CPython.
>>
>> * Will eliminate confusion about what functions were exactly intended to
>> do.
>>
>> * Will confer benefits similar to test driven development where the
>> documentation and  pure python version are developed first and doctests
>> gotten to pass, then the C version is created to match.
>
> I haven't seen anyone comment about this assertion of "equivalence".
> Doesn't it strike you as difficult to maintain *two* versions of every
> function, and ensure they match *exactly*?

Glad you brought this up.  My idea is to present rough equivalence
in unoptimized python that is simple and clear.  The goal is to provide
better documentation where code is more precise than English prose.
That being said, some subset of the existing tests should be runnable
against the rough equivalent, and the Python code should incorporate doctests.
Running both sets of tests should suffice to maintain the rough equivalence.
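
A minimal sketch of that idea, not taken from the actual test suite: one
shared list of cases is run against both a built-in and a rough equivalent.
The choice of itertools.chain, the stand-in chain() function, SHARED_CASES,
and run_shared_cases() are all assumptions made up for illustration.

```python
import itertools

def chain(*iterables):
    """Rough pure Python stand-in for itertools.chain (illustrative only)."""
    for iterable in iterables:
        for element in iterable:
            yield element

# Hypothetical shared cases: each entry is a tuple of positional arguments.
SHARED_CASES = [
    (),
    ('abc',),
    ('abc', 'def'),
    ([1, 2], (), [3]),
]

def run_shared_cases(candidate, reference, cases):
    """Return the cases on which candidate and reference disagree."""
    failures = []
    for args in cases:
        if list(candidate(*args)) != list(reference(*args)):
            failures.append(args)
    return failures

failures = run_shared_cases(chain, itertools.chain, SHARED_CASES)
print(failures)
```

An empty list means the rough equivalent agrees with the built-in on every
shared case; it says nothing about inputs outside that list.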

The notion of exact equivalence should be left to PyPy folks who can attest
that the code can get convoluted when you try to simulate exactly when
error checking is performed, reproduce read-only behavior for attributes, and
make the stack traces look the same when there are errors.  In contrast, my
goal is an approximation that is executable but highly readable and expository.
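
To make the "when error checking is performed" point concrete, here is a
small sketch, assuming CPython and using itertools.repeat purely as an
example.  The generator-based repeat() and the helper when_does_it_fail()
are hypothetical names invented for this illustration.

```python
import itertools

def repeat(obj, times=None):
    """Rough pure Python stand-in for itertools.repeat (illustrative only)."""
    if times is None:
        while True:
            yield obj
    else:
        for _ in range(times):
            yield obj

def when_does_it_fail(factory):
    """Report when a non-integer 'times' argument is rejected."""
    try:
        it = factory('x', 'not a number')
    except TypeError:
        return 'at call'
    try:
        next(it)
    except TypeError:
        return 'on first next()'
    return 'never'

c_timing = when_does_it_fail(itertools.repeat)   # argument checked up front
py_timing = when_does_it_fail(repeat)            # generator body runs lazily
print(c_timing, '/', py_timing)
```

Both versions reject the bad argument, but at different moments.  Matching
that timing exactly would mean wrapping the generator in extra checking
code, which is the kind of clutter a readable approximation tries to avoid.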

My thought is to do this only with tools where it really does enhance the
documentation.  The exercise is worthwhile in and of itself.  For example,
I'm working on a pure python version of str.split() and quickly determined
that the docs are *still* in error even after many revisions over the years
(the whitespace version does not, in fact, start by stripping whitespace
from both ends).  Here's what I have so far:

def split(s, sep=None, maxsplit=-1):
    """split(S, [sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

    >>> from collections import namedtuple
    >>> from itertools import product
    >>> s = ' 11   2  333  4  '
    >>> split(s, None)
    ['11', '2', '333', '4']
    >>> Err = namedtuple('Err', 'str maxsplit result target')
    >>> n = 8
    >>> for chars in product('ab ', repeat=n):
    ...     s = ''.join(chars)
    ...     for maxsplit in range(-2, len(s)+2):
    ...         result, target = split(s, None, maxsplit), s.split(None, maxsplit)
    ...         assert result == target, Err(repr(s), maxsplit, result, target)

    """
    result = []
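    spmode = True       # True while scanning a run of whitespace
    start = 0           # index where the current word (or the remainder) begins
    if maxsplit != 0:
        for i, c in enumerate(s):
            if spmode:
                if not c.isspace():
                    start = i       # first character of a new word
                    spmode = False
            elif c.isspace():
                result.append(s[start:i])   # a word just ended
                start = i
                spmode = True
                if len(result) == maxsplit:
                    break
    # Only leading whitespace is dropped from the remainder; trailing is kept.
    rest = s[start:].lstrip()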
    return (result + [rest]) if rest else result
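
The documentation discrepancy mentioned before the listing can be seen with
the built-in alone.  This is a small sketch; split_strip_first() is a
hypothetical model of the "strip both ends first" reading of the docs, not
anything in the standard library.

```python
def split_strip_first(s, maxsplit=-1):
    """Hypothetical model: strip both ends, then split on whitespace."""
    return s.strip().split(None, maxsplit)

s = '  a  b  '

actual = s.split(None, 1)               # what str.split really does
modelled = split_strip_first(s, 1)      # what "strip first" would predict

print(actual)       # ['a', 'b  ']
print(modelled)     # ['a', 'b']
```

Without a maxsplit limit the two agree, which is probably why the wording
survived so long; the difference only shows up when the split stops early
and the remainder keeps its trailing whitespace.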

Once I have the cleanest possible, self-explanatory code that passes tests, I'll improve the variable names and make a more sensible 
docstring with readable examples.  Surprisingly, it hasn't been a trivial exercise to come up with an equivalent that corresponds 
more closely to the way we think instead of corresponding to the C code -- I want to show *what* it does more than *how* it does it.

Raymond

[Python-Dev] Documentation idea