On 9/1/2011 2:15 AM, Stephen J. Turnbull wrote:
> Glenn Linderman writes:
>
>  > How many different iterators into the same text would be concurrently
>  > needed by an application?  And why?
>
> A WYSIWYG editor for structured text (TeX, HTML) might want two (at
> least), one for the "source" window and one for the "rendered" window.
> One might want to save the state of the iterators (if that's possible)
> and cache it as one moves the "window" forward to make short backward
> motion fast, giving you two (or four, etc) more.

Sure.  But those are probably all the same type of iterator, probably
(since they are WYSIWYG) dealing with multi-codepoint characters
(Guido's recent definition of grapheme, which seems to subsume both
grapheme clusters and composed characters).  Hence all of them would be
using/requiring the same sort of representation, index, analysis, or
some combination of those.

>  > Seems like if it is dealing with text at the level of grapheme
>  > clusters, it needs that type of iterator.  Of course, if it does
>  > I/O it needs codec access, but that is by nature sequential from
>  > the starting point to the end point.
>
> `save-region' ?  `save-text-remove-markup' ?

Yes, save-region sounds like exactly what I was speaking of.
save-text-remove-markup, I would infer, needs to process the text to
remove the markup characters... since you used TeX and HTML as examples,
markup is text, not binary (which would be a different problem).  Since
the TeX and HTML markup is mostly ASCII, markup removal (or more likely,
text extraction) could be performed via either a grapheme iterator, or a
codepoint iterator, or even a code unit iterator.
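To illustrate that last point, here is a minimal sketch, not anyone's proposed API. The names graphemes() and strip_tags() are hypothetical helpers made up for this example, and the "grapheme" iterator is only an approximation (a base code point plus any following combining marks), not full grapheme cluster segmentation. It assumes HTML-like markup delimited by ASCII "<" and ">", and UTF-8 as the code unit encoding.

```python
import unicodedata


def graphemes(text):
    """Hypothetical, simplified grapheme iterator.

    Groups a base code point with the combining marks that follow it.
    This is an approximation for illustration only.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            # A new base character starts a new cluster.
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


def strip_tags(units, lt, gt):
    """Drop everything from lt through gt; works on any iterable of units."""
    out = []
    in_tag = False
    for u in units:
        if u == lt:
            in_tag = True
        elif u == gt and in_tag:
            in_tag = False
        elif not in_tag:
            out.append(u)
    return out


# "cafe" + COMBINING ACUTE ACCENT, and "naive" with a precomposed i-diaeresis.
source = "<b>cafe\u0301</b> <i>na\u00efve</i>"

# Same extraction at three levels of iteration.
by_grapheme = "".join(strip_tags(graphemes(source), "<", ">"))
by_codepoint = "".join(strip_tags(source, "<", ">"))
by_code_unit = bytes(
    strip_tags(source.encode("utf-8"), ord("<"), ord(">"))
).decode("utf-8")

print(by_grapheme == by_codepoint == by_code_unit)
```

The code unit version works here because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never be mistaken for the ASCII delimiters.  The grapheme version would behave differently if a combining mark directly followed a delimiter, which this sketch does not try to handle.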