On Thu, Jun 5, 2014 at 6:50 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> 8) (Content specific variable size caches) Index each codepoint that is a
> different byte size than the previous codepoint, allowing indexing to be
> used in the intervals. Worst case size is like 2, best case size is a single
> entry for the end, when all code points are represented by the same number
> of bytes.

Conceptually interesting, and I'd love to know how well that'd perform
in real-world usage. Would do very nicely on blocks of text that are
all from the same range of codepoints, but if you intersperse high and
low codepoints it'll be like 2 but with significantly more complicated
lookups (imagine a "name=value\nname=value\n" stream where the names
and values are all in the same language - you'll have a lot of
transitions).

ChrisA
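
For anyone who wants to experiment with option 8, here is a rough sketch of what such an index might look like over UTF-8 data. It is not taken from any actual proposal or implementation: the names build_run_index, byte_offset and char_at are made up for illustration, and it records the start of each run rather than the end (so a uniform string costs one entry either way). Lookup is a binary search over run starts followed by arithmetic inside the run.

```python
from bisect import bisect_right


def utf8_width(ch):
    """Number of bytes UTF-8 uses for a single code point."""
    cp = ord(ch)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def build_run_index(text):
    """Hypothetical index: one entry per run of equal-width code points.

    Returns three parallel lists: the code point index where each run
    starts, the byte offset where it starts, and the byte width
    shared by every code point in the run.
    """
    starts, offsets, widths = [], [], []
    byte_pos = 0
    prev_width = None
    for i, ch in enumerate(text):
        width = utf8_width(ch)
        if width != prev_width:
            # Width changed, so a new run begins at this code point.
            starts.append(i)
            offsets.append(byte_pos)
            widths.append(width)
            prev_width = width
        byte_pos += width
    return starts, offsets, widths


def byte_offset(index, i):
    """Byte offset of code point i (assumes a non-empty, in-range index)."""
    starts, offsets, widths = index
    run = bisect_right(starts, i) - 1  # last run starting at or before i
    return offsets[run] + (i - starts[run]) * widths[run]


def char_at(data, index, i):
    """Decode code point i from the UTF-8 bytes using the run index."""
    starts, offsets, widths = index
    run = bisect_right(starts, i) - 1
    off = offsets[run] + (i - starts[run]) * widths[run]
    return data[off:off + widths[run]].decode("utf-8")
```

With an all-ASCII "name=value\n" stream the index stays at a single entry no matter how long the text is. With Cyrillic names and values, each line splits into four runs (name, "=", value, "\n"), so the index grows linearly with the number of lines, which is the transition-heavy case described above.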