> Various notes:
> * PyUnicode_READ() is slower than reading a Py_UNICODE array.
> * Some decoders unroll the main loop to process 4 or 8 bytes (32 or
>   64 bits CPU) at each step.
>
> I am interested if you know other tricks to optimize Unicode strings
> in Python, or if you are interested to work on this topic.

Beyond creation, the most frequent approach is to specialize loops for all three possible widths, allowing the compiler to hard-code the element size. This brings performance back to the speed of accessing a Py_UNICODE array (or faster, for 1-byte strings).

A possible micro-optimization might be to use pointer arithmetic instead of indexing. However, I would expect that compilers already convert a counting loop into pointer arithmetic if the index is only ever used for array access.

A source of slow-down appears to be widening copy operations. I wonder whether microprocessors are able to do this faster than what the compiler generates out of a naive copying loop.

Another potential area for further optimization is to pass through PyObject* more consistently. Some APIs still take char* or Py_UNICODE* when the caller actually holds a PyObject*, and the callee ultimately recreates an object from the pointers being passed.

Some people (hi Larry) still think that using a rope representation for string concatenation might improve things; see #1569040.

Regards,
Martin
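
The "process 4 or 8 bytes at each step" trick in the quoted notes usually means testing a whole machine word for non-ASCII bytes before falling back to per-byte work. A minimal, hedged sketch of that idea (ascii_prefix_length and NON_ASCII_MASK are illustrative names, not CPython's own):

#include <stdint.h>
#include <string.h>

/* 0x80 repeated in every byte of a machine word; the cast truncates
   cleanly to 0x80808080 on 32-bit platforms. */
#define NON_ASCII_MASK ((size_t)UINT64_C(0x8080808080808080))

/* Return the length of the leading all-ASCII run of s[0..len). */
static size_t
ascii_prefix_length(const unsigned char *s, size_t len)
{
    size_t i = 0;

    /* Check one machine word (4 or 8 bytes) per iteration. */
    while (i + sizeof(size_t) <= len) {
        size_t word;
        memcpy(&word, s + i, sizeof(word));  /* unaligned-safe load */
        if (word & NON_ASCII_MASK)
            break;                           /* some byte has its high bit set */
        i += sizeof(word);
    }
    /* Locate the exact first non-ASCII byte (or reach the end). */
    while (i < len && s[i] < 0x80)
        i++;
    return i;
}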
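
A hedged sketch of the width-specialized loops described above, using the PEP 393 accessor macros; count_char is an illustrative helper, not a CPython API, and it assumes the string is already in its canonical ("ready") form:

#include <Python.h>

/* Count occurrences of ch, with the inner loop specialized per string
   width so the compiler can hard-code the element size instead of
   going through PyUnicode_READ() on every iteration. */
static Py_ssize_t
count_char(PyObject *unicode, Py_UCS4 ch)
{
    Py_ssize_t i, count = 0;
    Py_ssize_t len = PyUnicode_GET_LENGTH(unicode);
    int kind = PyUnicode_KIND(unicode);
    const void *data = PyUnicode_DATA(unicode);

    switch (kind) {
    case PyUnicode_1BYTE_KIND: {
        const Py_UCS1 *p = (const Py_UCS1 *)data;
        for (i = 0; i < len; i++)
            count += (p[i] == ch);
        break;
    }
    case PyUnicode_2BYTE_KIND: {
        const Py_UCS2 *p = (const Py_UCS2 *)data;
        for (i = 0; i < len; i++)
            count += (p[i] == ch);
        break;
    }
    case PyUnicode_4BYTE_KIND: {
        const Py_UCS4 *p = (const Py_UCS4 *)data;
        for (i = 0; i < len; i++)
            count += (p[i] == ch);
        break;
    }
    }
    return count;
}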
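
For reference, the naive widening copy in question looks like the sketch below (widen_1to2 is an illustrative name); whether a compiler turns this into packed widening SIMD instructions, and whether that beats the scalar loop, is exactly the open question raised above:

#include <Python.h>

/* Widen a 1-byte (Latin-1) buffer into a 2-byte (UCS-2) buffer, one
   element at a time. */
static void
widen_1to2(const Py_UCS1 *src, Py_UCS2 *dst, Py_ssize_t len)
{
    Py_ssize_t i;
    for (i = 0; i < len; i++)
        dst[i] = (Py_UCS2)src[i];
}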