Showing content from https://mail.python.org/pipermail/python-dev/2013-July/127370.html below:

[Python-Dev] Misc re.match() complaint

Nick Coghlan ncoghlan at gmail.com
Tue Jul 16 13:03:51 CEST 2013
On 16 July 2013 19:18, Terry Reedy <tjreedy at udel.edu> wrote:
> I wonder if the change was an artifact of changing the code to prohibit
> mixing Unicode and bytes.

I'm pretty sure the only thing we changed in 3.x is to migrate re
to the PEP 3118 buffer API, and the behavioural change Guido is seeing
is actually the one between the 2.x buffer (which returns 8-bit
strings when sliced) and other types (including memoryview) which
return instances of themselves.
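
The slicing difference described above can be illustrated on 3.x with a
small sketch. It only shows the "return instances of themselves" side
(bytearray and memoryview); the 2.x buffer type, whose slices are 8-bit
strings, does not exist on 3.x and is therefore not exercised here. The
variable names are illustrative only.

```python
data = bytearray(b"aaabbbcccddd")
view = memoryview(data)

# Slicing a bytearray copies the bytes into a new bytearray.
sliced_array = data[9:]

# Slicing a memoryview gives another memoryview over the same
# underlying buffer, so no bytes are copied at this point.
sliced_view = view[9:]

print(type(sliced_array).__name__)
print(type(sliced_view).__name__)

# An explicit bytes() call is what finally copies the slice out.
print(bytes(sliced_view))
```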

Getting the old buffer behaviour in 3.x without an extra copy
operation should just be a matter of wrapping the input with
memoryview (to avoid copying the group elements in the match object)
and the output with bytes (to avoid keeping the entire original object
alive just to reference a few small pieces of it that were matched by
the regex):

>>> import re
>>> data = bytearray(b"aaabbbcccddd")
>>> re.match(b"(a*)b*c*(d*)", data).group(2)
bytearray(b'ddd')
>>> bytes(re.match(b"(a*)b*c*(d*)", memoryview(data)).group(2))
b'ddd'

Given that, I'm inclined to keep the existing behaviour on backwards
compatibility grounds. To make the above code work on both 2.x *and*
3.x without making an extra copy, it's possible to keep the bytes call
(it should be a no-op on 2.x) and dynamically switch the type used to
wrap the input between buffer in 2.x and memoryview in 3.x
(unfortunately, the 2.x memoryview doesn't work for this case, as the
2.x re API doesn't accept it as valid input).
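
A minimal sketch of that dual-version approach might look like the
following. The names wrap_input and match_group_bytes are hypothetical
helpers introduced only for illustration; the version check via
sys.version_info is one assumed way of choosing the wrapper, not
something prescribed above.

```python
import re
import sys

# Hypothetical helper name: pick the zero-copy wrapper for this version.
# buffer only exists on 2.x; memoryview is used on 3.x.
if sys.version_info[0] == 2:
    wrap_input = buffer  # noqa: F821 (2.x only, never evaluated on 3.x)
else:
    wrap_input = memoryview


def match_group_bytes(pattern, data, group):
    """Return the requested group as bytes, or None if there is no match."""
    match = re.match(pattern, wrap_input(data))
    if match is None:
        return None
    # bytes() detaches the result from the original object, so the
    # large input is not kept alive by a small matched piece.
    return bytes(match.group(group))


data = bytearray(b"aaabbbcccddd")
result = match_group_bytes(b"(a*)b*c*(d*)", data, 2)
print(result)
```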

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia