On 16/07/2013 00:30, Gregory P. Smith wrote:
>
> On Mon, Jul 15, 2013 at 4:14 PM, Guido van Rossum <guido at python.org> wrote:
>
>     In a discussion about mypy I discovered that the Python 3 version of
>     the re module's Match object behaves subtly different from the Python
>     2 version when the target string (i.e. the haystack, not the needle)
>     is a buffer object.
>
>     In Python 2, the type of the return value of group() is always either
>     a Unicode string or an 8-bit string, and the type is determined by
>     looking at the target string -- if the target is unicode, group()
>     returns a unicode string, otherwise, group() returns an 8-bit string.
>     In particular, if the target is a buffer object, group() returns an
>     8-bit string. I think this is the appropriate behavior: otherwise
>     using regular expression matching to extract a small substring from a
>     large target string would unnecessarily keep the large target string
>     alive as long as the substring is alive.
>
>     But in Python 3, the behavior of group() has changed so that its
>     return type always matches that of the target string. I think this is
>     bad -- apart from the lifetime concern, it means that if your target
>     happens to be a bytearray, the return value isn't even hashable!
>
>     Does anyone remember whether this was a conscious decision? Is it too
>     late to fix?
>
>
> Hmm, that is not what I'd expect either. I would never expect it to
> return a bytearray; I'd normally assume that .group() returned a bytes
> object if the input was binary data and a str object if the input was
> unicode data (str) regardless of specific types containing the input
> target data.
>
> I'm going to hazard a guess that not much, if anything, would be
> depending on getting a bytearray out of that. Fix this in 3.4? 3.3 and
> earlier users are stuck with an extra bytes() call and data copy in
> these cases I guess.
>
I'm not sure I understand the complaint.
I get this for Python 2.7:

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import array
>>> import re
>>> re.match(r"a", array.array("b", "a")).group()
array('b', [97])

It's the same even in Python 2.4.
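
For comparison, here is a minimal sketch of the same check on Python 3, where the pattern has to be bytes whenever the target is a bytes-like object. The helper names group_type and is_hashable are made up for this illustration and are not part of the re module. The result depends on the interpreter version: as far as I know, 3.3 and earlier hand back the target's own type for a bytearray, while 3.4 and later return bytes, so the script prints what it finds rather than assuming either.

```python
import array
import re


def group_type(target):
    """Return the type of Match.group() for a one-character match on target."""
    # Python 3 needs a str pattern for str targets and a bytes pattern
    # for bytes-like targets (bytes, bytearray, array.array, ...).
    pattern = "a" if isinstance(target, str) else b"a"
    return type(re.match(pattern, target).group())


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    targets = ["a", b"a", bytearray(b"a"), array.array("b", b"a")]
    for target in targets:
        pattern = "a" if isinstance(target, str) else b"a"
        result = re.match(pattern, target).group()
        # Report the input type, the output type, and whether the
        # output could be used as a dict key or set member.
        print(type(target).__name__, "->", type(result).__name__,
              "hashable" if is_hashable(result) else "not hashable")
```

On versions that still return the target's type, wrapping the result in bytes() gives a hashable copy, at the cost of the extra call and data copy mentioned above.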