[Tim]
>> But Marc-Andre uses realloc at the end to return the excess.  The
>> excess bytes will get reused (and some returned yet again) by the
>> next overallocation, and so on.

[Martin]
> Right. I confused this with the fact that PyMem_Realloc won't return
> the excess memory,

PyMem_Realloc does whatever the system realloc does -- PyMem_Realloc
doesn't go thru pymalloc today (except in a PYMALLOC_DEBUG build).
Doesn't matter, though, since strings use the PyObject_{Malloc, Free,
Realloc} family today, and that does use pymalloc.

OTOH, there's no reason PyObject_Realloc *has* to hang on to all
small-block memory on a shrinking realloc, and there's no reason
pymalloc couldn't grow another realloc entry point specifying what the
caller wants a shrinking realloc to do.  These things are all easy to
change, but I don't know what's truly desirable.

Note another subtlety:  I expect you brought up PyMem_Realloc because
unicodeobject.c uses the PyMem_XYZ family for managing the
PyUnicodeObject.str member today.  That means it normally never uses
pymalloc at all, except to allocate fixed-size PyUnicodeObject structs
(which use the PyObject_XYZ memory family).  I don't know whether
that's the best idea, but that's how it is today.  pymalloc gets into
this because PyUnicode_EncodeUTF8 returns a plain string object, and
the latter uses pymalloc today.

> so the extra bytes in a small string will be wasted for the life
> time of the string object - that still could cause significant memory
> wastage.

It could.  Python generally aims to optimize the expected case, not
jump thru hoops to avoid worst cases (else we wouldn't use dicts at
all <wink>).  But I don't know what the expected case is here, and
given how often I use Unicode in my own work it could be I'll never
have a clue.  Note that the expected uses of Unicode strings make no
difference to PyUnicode_EncodeUTF8:  what counts there is the expected
lifetimes and sizes of the "plain" utf8-encoded PyStringObjects it
computes.  Indeed, pymalloc has almost no implications for Unicode
beyond the encode-as-a-plain-string functions (unless unicodeobject.c
is changed to manage the PyUnicodeObject.str member using pymalloc
too, as plain strings do today).

>> MAL, you should keep in mind that pymalloc is also managing the
>> small chunks in your scheme:  when you're fiddling with a
>> 40-character Unicode string, an overallocation "by a factor of 4"
>> only amounts to an 80-character UTF8 string.

> [I guess this is a terminology, not a math problem:

Nope!  Turns out it was a hallucination problem <wink>.

> a 40 character Unicode string has already 80 bytes; the UTF-8 of
> it can have up to 160 bytes].

You're right, of course.  The conclusion doesn't change, though:
that's still in the range of blocks pymalloc handles (and will remain
so unless I reduce pymalloc's small-object threshold below what's
needed for pymalloc to handle small dicts on its own -- which I'm
unlikely to do).
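
[For readers following along:  the overallocate-then-trim pattern being
debated looks roughly like the sketch below.  This is plain C against
the system allocator, not the actual PyUnicode_EncodeUTF8 code;
utf8_encode_sketch and the 4x bound are illustrative assumptions, and
surrogate pairs are ignored for brevity.]

    #include <stdlib.h>

    /* Illustrative sketch only -- not the real PyUnicode_EncodeUTF8.
       Overallocate 4 bytes per input code unit, encode BMP characters,
       then shrink the buffer to the size actually used.  Whether the
       trailing excess is really given back depends on what realloc
       (or pymalloc, for small blocks) chooses to do. */
    static char *
    utf8_encode_sketch(const unsigned short *u, size_t len, size_t *out_len)
    {
        char *buf = malloc(len * 4 + 1);   /* worst-case overallocation */
        char *p = buf;
        char *shrunk;
        size_t i;

        if (buf == NULL)
            return NULL;

        for (i = 0; i < len; i++) {
            unsigned int ch = u[i];
            if (ch < 0x80)
                *p++ = (char)ch;
            else if (ch < 0x800) {
                *p++ = (char)(0xC0 | (ch >> 6));
                *p++ = (char)(0x80 | (ch & 0x3F));
            }
            else {
                *p++ = (char)(0xE0 | (ch >> 12));
                *p++ = (char)(0x80 | ((ch >> 6) & 0x3F));
                *p++ = (char)(0x80 | (ch & 0x3F));
            }
        }
        *p = '\0';
        *out_len = (size_t)(p - buf);

        /* Shrinking realloc:  return the unused tail (maybe). */
        shrunk = realloc(buf, *out_len + 1);
        if (shrunk != NULL)
            buf = shrunk;
        return buf;
    }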