04.06.14 18:38, Paul Sokolovsky написав(ла): >> Any non-trivial text parsing uses indices or regular expressions (and >> regular expressions themself use indices internally). > > I keep hearing this stuff, and unfortunately so far don't have enough > time to collect all that stuff and provide detailed response. So, > here's spur of the moment response - hopefully we're in the same > context so it is easy to understand. > > So, gentlemen, you keep mixing up character-by-character random access > to string and taking substrings of a string. > > Character-by-character random access imply that you would need to scan > thru (possibly almost) all chars in a string. That's O(N) (N-length of > string). With varlength encoding (taking O(N) to index arbitrary char), > there's thus concern that this would be O(N^2) op. > > But show me real-world case for that. Common usecase is scanning string > left-to-right, that should be done using iterator and thus O(N). > Right-to-left scanning would be order(s) of magnitude less frequent, as > and also handled by iterator. html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. Common use case is quickly find substring starting from current position using str.find or re.search, process found token, advance position and repeat.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4