[Guido]
> Would it make sense to change the Unicode object to use pymalloc, and
> to change the UTF-8 codec to count the bytes if the shortest possible
> output would fit in a pymalloc block?

These are independent questions, and I don't know how to answer either
unless you give me a test program that prints the value of the function
you're trying to minimize <0.7 wink>.

The Unicode object currently uses quite an elaborate free list, caching
both PyUnicodeObject structs (which currently use pymalloc), and their
str members (which currently do not).

Whether the str member uses pymalloc really doesn't have anything to do
with what the UTF8 encoder function does (it returns plain strings, and
those already use pymalloc today -- and it's not entirely clear whether
they should either!).

Counting the bytes in the UTF8 encoder could work well, independent of
that: if the result is known to fit in a pymalloc block, just do it; as
soon as it's known that it won't, overallocate with assurance that the
system realloc will give back everything that isn't used. In the latter
case I believe the code could be made much simpler, by doing a
factor-of-4 overallocation from the start (it currently tries 2, then 3,
then 4, with a bunch of embedded-in-the-loops tests to prevent
overwrites; I'm not sure why it bothers with this staggered scheme,
since it's going to touch exactly as much memory as it actually needs
regardless, and give all the rest back untouched).

> (I guess this means that the length of the Unicode string should be
> less than SMALL_REQUEST_THRESHOLD - currently 256.)

For a start, yes. I'd stick a "Py_" in front of that symbol and expose
it then. The cutoff test would also have to take into account the size
of the result's PyStringObject header (the whole stringobject enchilada
counts against the threshold).
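
A rough sketch of the count-then-decide scheme, in standalone C rather
than against the real CPython sources. Everything named here is made up
for illustration: HEADER_SIZE is a guess standing in for the
PyStringObject header, plain malloc stands in for a pymalloc block, and
the input is assumed to be valid code points (no surrogate checks).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SMALL_REQUEST_THRESHOLD 256  /* value quoted in the post */
#define HEADER_SIZE 24               /* hypothetical header size */

/* Bytes needed to encode one code point as UTF-8 (1 to 4). */
static size_t utf8_len(uint32_t c)
{
    return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
}

/* Count output bytes, giving up as soon as the total exceeds limit. */
static size_t utf8_size_capped(const uint32_t *s, size_t n, size_t limit)
{
    size_t total = 0;
    for (size_t i = 0; i < n && total <= limit; i++)
        total += utf8_len(s[i]);
    return total;
}

/* Encode into out, which must have room; returns bytes written. */
static size_t utf8_write(const uint32_t *s, size_t n, unsigned char *out)
{
    unsigned char *p = out;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = s[i];
        if (c < 0x80) {
            *p++ = (unsigned char)c;
        } else if (c < 0x800) {
            *p++ = (unsigned char)(0xC0 | (c >> 6));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            *p++ = (unsigned char)(0xE0 | (c >> 12));
            *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            *p++ = (unsigned char)(0xF0 | ((c >> 18) & 0x07));
            *p++ = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return (size_t)(p - out);
}

/* Returns a malloc'ed buffer (caller frees) or NULL; *out_len gets
   the encoded length. */
unsigned char *utf8_encode(const uint32_t *s, size_t n, size_t *out_len)
{
    /* The header counts against the threshold too. */
    const size_t limit = SMALL_REQUEST_THRESHOLD - HEADER_SIZE;
    unsigned char *buf;

    /* Shortest possible output is n bytes, so only count if n fits. */
    size_t size = n <= limit ? utf8_size_capped(s, n, limit) : limit + 1;

    if (size <= limit) {
        /* Known to fit: allocate exactly (small-block case). */
        buf = malloc(size ? size : 1);
        if (buf == NULL)
            return NULL;
        *out_len = utf8_write(s, n, buf);
        assert(*out_len == size);
        return buf;
    }

    /* Known not to fit: overallocate by 4 from the start, then let
       realloc hand back whatever was never touched. */
    if (n > SIZE_MAX / 4)
        return NULL;
    buf = malloc(4 * n);
    if (buf == NULL)
        return NULL;
    *out_len = utf8_write(s, n, buf);
    assert(*out_len <= 4 * n);
    unsigned char *shrunk = realloc(buf, *out_len);
    return shrunk != NULL ? shrunk : buf;
}
```

The large path has no bounds tests inside the loop at all, since 4 bytes
per code point is the worst case; whether that actually beats the
staggered scheme is something only a test program could say.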