From the Library Reference (2.2.1):

  \b
    Matches the empty string, but only at the beginning or end of a
    word. A word is defined as a sequence of alphanumeric characters,
    so the end of a word is indicated by whitespace or a
    non-alphanumeric character. Inside a character range, \b
    represents the backspace character, for compatibility with
    Python's string literals.

Now reality:

Python 2.2.1 (#2, Apr 22 2002, 17:53:10)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> t = re.compile(r'\bbag\b')
>>> t.search('test bag')
<_sre.SRE_Match object at 0x812aad0>
>>> t.search('test+bag')
<_sre.SRE_Match object at 0x815d528>
>>> t.search('test_bag')
>>> [ chr(i) for i in xrange(256) if not t.search('test' + chr(i) + 'bag') ]
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D',
 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R',
 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_', 'a', 'b', 'c', 'd', 'e',
 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's',
 't', 'u', 'v', 'w', 'x', 'y', 'z']
>>>

So the implementation appears to define a word as a sequence of
alphanumeric characters or underscores, which means either the
documentation or the library is wrong.

Now it happens that this was found while a friend of mine and I were
looking to get the exact behavior that is implemented, so I'd prefer
it if the documentation were updated to meet the implementation
<.8 wink>.

-- 
Christopher A. Craig <list-python@ccraig.org>
I develop for Linux for a living, I used to develop for DOS.
Going from DOS to Linux is like trading a glider for an F117.
    - Lawrence Foard
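
For anyone who wants to repeat the check above on a current interpreter,
here is a minimal sketch. It is not part of the original session: it
assumes Python 3 rather than 2.2.1, restricts the scan to ASCII, and
passes re.ASCII so that \w and \b are not widened to Unicode. The names
blocks_boundary, blockers and word_chars are made up for illustration.
It compares the characters that stop \bbag\b from matching with the
characters that \w accepts.

```python
import re

# Same pattern as in the session above; re.ASCII keeps \b tied to the
# ASCII notion of a word character (an assumption made for this sketch).
boundary_pattern = re.compile(r'\bbag\b', re.ASCII)
word_char_pattern = re.compile(r'\w', re.ASCII)


def blocks_boundary(ch):
    """Return True if ch, placed between 'test' and 'bag', prevents a match."""
    return boundary_pattern.search('test' + ch + 'bag') is None


# Characters that suppress the word boundary in front of 'bag'.
blockers = [chr(i) for i in range(128) if blocks_boundary(chr(i))]

# Characters that \w itself accepts.
word_chars = [chr(i) for i in range(128) if word_char_pattern.match(chr(i))]

# If \b is defined in terms of \w, the two lists should be identical.
print(blockers == word_chars)
print('_' in blockers)
```

If the two lists agree, that supports reading \b as "a boundary between
a \w character and a non-\w character", with the underscore counted as
a word character.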