Paul Moore (in private mail):
>
> You have methods for finding the start and end of various
> <indextypes>, but you don't have a method for finding the length of
> an <indextype>. In the case of words (which is the one I understand
> :-), the length of a word is not the same as the difference between
> the starts of consecutive words - the intervening whitespace should
> be excluded (at least for some applications). I would suggest
>
>     length_<indextype>(u, index) -> integer
>         Returns the length in Unicode objects of the <indextype>
>         found at u[index] or -1 in case u[index] is not in an
>         element of this type (for example, in the whitespace
>         between words). [XXX Should this be the number of Unicode
>         objects between index and the end of the element, or should
>         it be the length from start to end even if you are in the
>         middle?]
>
> or maybe better
>
>     nextend_<indextype>(u, index) -> integer
>         Returns the Unicode object index for the end of the next
>         <indextype> found after u[index] or -1 in case no next
>         element of this type exists.
>
> [But that runs into issues when you are in a word - If index is not
> the first Unicode object, nextend is the end of *this* element,
> whereas next is the start of the *next* element. I think I'm
> starting to show my ignorance...]
>
> Even though I suspect my suggested methods are too simplistic, I'd
> suggest at least a comment in the PEP on how to work out the length
> of the element you're in (or why it's hard, and you'd never want to
> do it :-)...

The two suggested APIs probe into the Unicode object. I think it
would be more useful to return the slice (as a slice object) which
represents the <indextype> element found at the given index in u,
e.g.
    <indextype>_slice(u, index) -> slice object or None
        Returns the slice pointing to the <indextype> element found
        in u at the given index or None in case no such element can
        be found at that position.

Hmm, I wonder whether slice objects can be "applied" to sequences
somehow...

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From: "Moore, Paul" <Paul.Moore@atosorigin.com>
To: "'mal@lemburg.com'" <mal@lemburg.com>
Subject: PEP 262: Unicode Indexing Helper Module
Date: Fri, 13 Jul 2001 13:26:52 +0100

Excuse me for commenting on an area which I know virtually nothing
about, but one point struck me when I saw this PEP.

You have methods for finding the start and end of various
<indextypes>, but you don't have a method for finding the length of
an <indextype>. In the case of words (which is the one I understand
:-), the length of a word is not the same as the difference between
the starts of consecutive words - the intervening whitespace should
be excluded (at least for some applications). I would suggest

    length_<indextype>(u, index) -> integer
        Returns the length in Unicode objects of the <indextype>
        found at u[index] or -1 in case u[index] is not in an
        element of this type (for example, in the whitespace
        between words). [XXX Should this be the number of Unicode
        objects between index and the end of the element, or should
        it be the length from start to end even if you are in the
        middle?]

or maybe better

    nextend_<indextype>(u, index) -> integer
        Returns the Unicode object index for the end of the next
        <indextype> found after u[index] or -1 in case no next
        element of this type exists.

[But that runs into issues when you are in a word - If index is not
the first Unicode object, nextend is the end of *this* element,
whereas next is the start of the *next* element. I think I'm
starting to show my ignorance...]

Even though I suspect my suggested methods are too simplistic, I'd
suggest at least a comment in the PEP on how to work out the length
of the element you're in (or why it's hard, and you'd never want to
do it :-)...

Paul.
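
On the closing question in the reply: in present-day Python a slice
object can be used directly as a subscript, so u[s] yields the element
itself and s.stop - s.start gives its length. The following is only a
minimal sketch, not the PEP's API. word_slice is a hypothetical
stand-in for <indextype>_slice with words as the index type, and it
assumes a word is simply a maximal run of non-whitespace characters
(real Unicode word breaking is more involved).

```python
def word_slice(u, index):
    """Hypothetical helper: slice covering the word at u[index], or None.

    Assumption: a "word" is a maximal run of non-whitespace characters.
    """
    # No element at this position: out of range or inside whitespace.
    if not 0 <= index < len(u) or u[index].isspace():
        return None

    # Walk left to the first character of the word.
    start = index
    while start > 0 and not u[start - 1].isspace():
        start -= 1

    # Walk right to one past the last character of the word.
    end = index + 1
    while end < len(u) and not u[end].isspace():
        end += 1

    return slice(start, end)


text = "Unicode indexing helper"
s = word_slice(text, 10)        # index 10 lies inside "indexing"
word = text[s]                  # applying the slice object to the sequence
length = s.stop - s.start       # length from start to end of the element
gap = word_slice(text, 7)       # index 7 is the space after "Unicode"
```

With a slice in hand, both quantities asked about in the quoted mail
fall out without extra methods: the element's end is s.stop, and its
length is independent of where in the word the index happened to land.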