<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#330033">
<div class="moz-cite-prefix">On 6/4/2014 5:08 PM, Glenn Linderman
wrote:<br>
</div>
<blockquote cite="mid:538FB501.2040601@g.nevcal.com" type="cite">
<div class="moz-cite-prefix">On 6/4/2014 5:03 PM, Greg Ewing
wrote:<br>
</div>
<blockquote cite="mid:538FB3C5.6010104@canterbury.ac.nz"
type="cite">Serhiy Storchaka wrote: <br>
<blockquote type="cite">html.HTMLParser, json.JSONDecoder,
re.compile, tokenize.tokenize don't use iterators. They use
indices, str.find and/or regular expressions. A common use case
is to quickly find a substring starting from the current position
using str.find or re.search, process the found token, advance the
position, and repeat. <br>
</blockquote>
<br>
For that kind of thing, you don't need an actual character <br>
index, just some way of referring to a place in a string. <br>
</blockquote>
<br>
I think you meant codepoint index, rather than character index.<br>
<br>
<blockquote cite="mid:538FB3C5.6010104@canterbury.ac.nz"
type="cite"> <br>
Instead of an integer, str.find() etc. could return a <br>
StringPosition, which would be an opaque reference to a <br>
particular point in a particular string. You would be <br>
able to pass StringPositions to indexing and slicing <br>
operations to get fast indexing into the string that <br>
they were derived from. <br>
<br>
StringPositions could support the following operations: <br>
<br>
  StringPosition + int --> StringPosition <br>
  StringPosition - int --> StringPosition <br>
  StringPosition - StringPosition --> int <br>
<br>
These would be computed by counting characters forwards <br>
or backwards in the string, which would be slower than <br>
int arithmetic but still faster than counting from the <br>
beginning of the string every time. <br>
<br>
In other contexts, StringPositions would coerce to ints <br>
(maybe being an int subclass?) allowing them to be used <br>
in any existing algorithm that slices strings using ints. <br>
<br>
</blockquote>
This starts to diverge from Python codepoint indexing via
integers. Calculating or caching the mapping from codepoint index
to byte offset as part of the str implementation stays compatible
with Python. Introducing StringPosition makes it a Python-like
language instead.
Or so it seems to me.</blockquote>
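The index-based scanning pattern Serhiy describes (find a token at the current position, process it, advance the position, repeat) can be sketched with `re.Pattern.match(string, pos)`, which anchors the match at `pos` without slicing the string. This is only an illustration: the token grammar and the `tokenize` helper here are made up for the example and are not taken from html.parser, json or tokenize.

```python
import re

# Illustrative token grammar (an assumption, not from any stdlib module):
# integers, identifiers, or any other single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>\S))")


def tokenize(text):
    """Return (kind, value) pairs by repeatedly matching at an integer position."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Pattern.match(string, pos) anchors the match at pos; no substring is copied.
        m = TOKEN_RE.match(text, pos)
        if m is None:  # only trailing whitespace remains
            break
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()  # advance the integer position past the token
    return tokens


print(tokenize("x1 = 42 + y"))
```

Every step here is plain int arithmetic on `pos`, which is exactly what a variable-width string representation would make expensive if each `match` had to translate a codepoint index by scanning from the start.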
<br>
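The alternative Glenn mentions, keeping a codepoint-index to byte-offset mapping inside the str implementation, might look roughly like the following. It is a hypothetical sketch, not how CPython stores strings: the class `CheckpointedUtf8`, its `stride` parameter, and the UTF-8 backing store are all assumptions made for illustration. Indexing still takes plain ints, so the same index works on any string.

```python
class CheckpointedUtf8:
    """Hypothetical UTF-8 storage plus a cached codepoint -> byte offset table."""

    def __init__(self, s, stride=16):
        self.data = s.encode("utf-8")  # assumes s contains no lone surrogates
        self.length = len(s)
        self.stride = stride
        # checkpoints[k] holds the byte offset of codepoint k * stride.
        self.checkpoints = []
        offset = 0
        for i, ch in enumerate(s):
            if i % stride == 0:
                self.checkpoints.append(offset)
            offset += len(ch.encode("utf-8"))

    @staticmethod
    def _seq_len(lead):
        # Length of a UTF-8 sequence, read from its lead byte.
        if lead < 0x80:
            return 1
        if lead < 0xE0:
            return 2
        if lead < 0xF0:
            return 3
        return 4

    def _byte_offset(self, index):
        # Jump to the nearest checkpoint at or before index, then walk forward.
        k = index // self.stride
        offset = self.checkpoints[k]
        for _ in range(index - k * self.stride):
            offset += self._seq_len(self.data[offset])
        return offset

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("string index out of range")
        start = self._byte_offset(index)
        return self.data[start:start + self._seq_len(self.data[start])].decode("utf-8")


a = CheckpointedUtf8("héllo", stride=2)
b = CheckpointedUtf8("wörld", stride=2)
# The same int index is valid for both strings.
print([a[i] + b[i] for i in range(len(a))])  # ['hw', 'éö', 'lr', 'll', 'od']
```

The design trade-off is the `stride`: a smaller stride costs more memory for the table but bounds each lookup to a shorter forward walk.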
Another thought: as you point out, a StringPosition only works
(quickly, at least) for the string it was derived from, so
algorithms that walk two strings at a time cannot use the same
StringPosition for both... yep, this is quite divergent from
CPython and Python.<br>
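Greg's proposed StringPosition operations, together with the single-string limitation, can be sketched in Python. This is a hypothetical illustration, not a real CPython API: the classes `StringPosition` and `Utf8Text`, the UTF-8 byte storage, and the `find` and `char_at` methods are assumptions made for the sketch. A position is an int subclass (its value is the codepoint index, so it coerces to int as Greg suggests) that also remembers its text and the matching byte offset, so arithmetic walks codepoints from there instead of from the beginning.

```python
def _is_lead(byte):
    # UTF-8 continuation bytes look like 0b10xxxxxx; any other byte starts a codepoint.
    return byte & 0xC0 != 0x80


class StringPosition(int):
    """Hypothetical position: codepoint index plus owning text and byte offset."""

    def __new__(cls, owner, index, byte_offset):
        obj = super().__new__(cls, index)
        obj.owner = owner
        obj.byte_offset = byte_offset
        return obj

    def __add__(self, n):
        if isinstance(n, StringPosition):
            return int(self) + int(n)  # no position meaning; coerce to plain int
        if n < 0:
            return self - (-n)
        # StringPosition + int --> StringPosition, stepping forward n codepoints.
        data, offset = self.owner.data, self.byte_offset
        for _ in range(n):
            if offset >= len(data):
                raise IndexError("position past end of string")
            offset += 1
            while offset < len(data) and not _is_lead(data[offset]):
                offset += 1
        return StringPosition(self.owner, int(self) + n, offset)

    def __sub__(self, other):
        # StringPosition - StringPosition --> int, only within the same text.
        if isinstance(other, StringPosition):
            if other.owner is not self.owner:
                raise ValueError("positions come from different strings")
            return int(self) - int(other)
        if other < 0:
            return self + (-other)
        # StringPosition - int --> StringPosition, stepping backward.
        data, offset = self.owner.data, self.byte_offset
        for _ in range(other):
            if offset == 0:
                raise IndexError("position before start of string")
            offset -= 1
            while offset > 0 and not _is_lead(data[offset]):
                offset -= 1
        return StringPosition(self.owner, int(self) - other, offset)


class Utf8Text:
    """Hypothetical UTF-8-backed text whose find() returns StringPositions."""

    def __init__(self, s):
        self.data = s.encode("utf-8")  # assumes s contains no lone surrogates

    def start(self):
        return StringPosition(self, 0, 0)

    def find(self, sub, from_pos=None):
        # Like str.find, but returns a StringPosition (or -1 when not found).
        # from_pos is assumed to come from this same text.
        if from_pos is None:
            from_pos = self.start()
        found = self.data.find(sub.encode("utf-8"), from_pos.byte_offset)
        if found == -1:
            return -1
        # Count codepoints from from_pos to the match, not from the beginning.
        skipped = sum(1 for b in self.data[from_pos.byte_offset:found] if _is_lead(b))
        return StringPosition(self, int(from_pos) + skipped, found)

    def char_at(self, pos):
        if isinstance(pos, StringPosition) and pos.owner is self:
            offset = pos.byte_offset  # fast path: position derived from this text
        else:
            offset = (self.start() + int(pos)).byte_offset  # slow path: walk from start
        if offset >= len(self.data):
            raise IndexError("string index out of range")
        end = offset + 1
        while end < len(self.data) and not _is_lead(self.data[end]):
            end += 1
        return self.data[offset:end].decode("utf-8")


s = "naïve café, naïve tea"
text = Utf8Text(s)
p = text.find("café")
q = text.find("naïve", p)      # search forward from p
print(int(p), int(q), q - p)   # 6 12 6
print(text.char_at(p + 3))     # é  (fast path)
print(s[p])                    # c  (coerces to int for an ordinary str)
other = Utf8Text("über alles")
print(other.char_at(p))        # l  (slow path: p belongs to another text)
```

`other.char_at(p)` still works because the position coerces to an int, but it has to walk `other` from the start, which is the cost of walking two strings with one position described above.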
</body>
</html>