Fredrik Lundh wrote:
>
> when hacking on SRE's substitution code, I stumbled
> upon a problem.  to do a substitution, SRE needs to
> merge slices from the target strings and from the sub-
> stitution pattern.
>
> here's a simple example:
>
>     re.sub(
>         "(perl|tcl|java)",
>         "python (not \\1)",
>         "perl rules"
>     )
>
> contains a "substitution pattern" consisting of three
> parts:
>
>     "python (not " (a slice from the substitution string)
>     group 1 (a slice from the target string)
>     ")" (a slice from the substitution string)
>
> PCRE implements this by doing the slicing (thus creating
> three new strings), and then doing a "join" by hand into
> a PyString buffer.
>
> this isn't very efficient, and it also doesn't work for uni-
> code strings.

Why not? The Unicode implementation has an API
PyUnicode_Join() which does exactly this:

    extern DL_IMPORT(PyObject*) PyUnicode_Join(
        PyObject *separator,    /* Separator string */
        PyObject *seq           /* Sequence object */
        );

Note that the PyUnicode_Join() API takes a sequence of Unicode
objects, strings or objects providing the charbuf interface,
coerces all of these into a Unicode object and then does the
joining.

There is also a _PyUnicode_Resize() API. It is currently
not exported though... but that's easy to fix.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
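
As a rough illustration of the slice-and-join approach discussed in this message, here is a minimal Python sketch. The helper name expand_template is hypothetical and is not part of SRE or PCRE. It handles only numeric group references such as \1, and it uses str.join as a Python-level stand-in for the kind of joining PyUnicode_Join() does at the C level.

```python
import re


def expand_template(template, match):
    """Hypothetical helper: split a substitution template into literal
    slices and group references, then join all slices in one step."""
    parts = []
    pos = 0
    # Assumption: only numeric references such as \1 appear in the template.
    for ref in re.finditer(r"\\(\d+)", template):
        # Literal slice taken from the substitution string.
        parts.append(template[pos:ref.start()])
        # Slice taken from the target string (the matched group).
        parts.append(match.group(int(ref.group(1))))
        pos = ref.end()
    # Trailing literal slice from the substitution string.
    parts.append(template[pos:])
    # A single join over all slices instead of repeated concatenation.
    return "".join(parts)


target = "perl rules"
m = re.search("(perl|tcl|java)", target)
result = target[:m.start()] + expand_template("python (not \\1)", m) + target[m.end():]
print(result)
```

For the example in the quoted text, the collected slices are "python (not ", "perl" and ")", and the final string should match what re.sub() returns for the same arguments.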