Greg Ward wrote:
> Well, my ignorance of Unicode has finally bitten me -- someone filed a
> bug (#622831) against textwrap.py because it crashes when it attempts to
> wrap a Unicode string.
>
> Here are the problems that I am aware of:
>
>   * textwrap assumes "whitespace" means "the characters in
>     string.whitespace"

It should use u.isspace() for this. You might also want to consider
u.splitlines() for line breaking, since Unicode has a lot more line
breaking characters than ASCII (which u.splitlines() knows about).

>   * textwrap assumes "lowercase letter" means "the characters in
>     string.lowercase" (heck, this only works in English)

u.lower() will do the right thing for Unicode.

> Can someone tell me what the proper way to do this is? Or just point me
> at the relevant documentation? I've scoured the online docs and *Python
> Essential Reference*, and I know more about the codecs and unicodedata
> modules than I did before. But I still don't know how to replace all
> whitespace with space, or detect words that end with a lowercase letter.

Hope that helps,
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/
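
For illustration only, here is a minimal sketch of the two operations asked about above: replacing all whitespace with a space, and detecting a word that ends with a lowercase letter. It is not the textwrap code. The helper names normalize_whitespace and ends_with_lowercase are hypothetical, and the sketch assumes Python 3, where every str is Unicode (in the Python 2 of this thread the same methods exist on unicode objects). It tests with islower() rather than converting with lower(), since the goal is detection.

```python
def normalize_whitespace(text):
    """Replace every character that str.isspace() accepts with a plain space."""
    # Hypothetical helper: isspace() covers Unicode whitespace such as
    # NO-BREAK SPACE and EM SPACE, not only the ASCII characters listed
    # in string.whitespace.
    return "".join(" " if ch.isspace() else ch for ch in text)


def ends_with_lowercase(word):
    """Return True if the last character of word is a lowercase letter."""
    # Hypothetical helper: islower() consults the Unicode database, so
    # accented and non-Latin lowercase letters are recognised as well.
    return bool(word) and word[-1].islower()


# NO-BREAK SPACE (U+00A0) and EM SPACE (U+2003) both count as whitespace.
sample = "caf\u00e9\u00a0au\u2003lait"
cleaned = normalize_whitespace(sample)

# splitlines() also breaks on Unicode line separators such as U+2028.
lines = "one\u2028two\r\nthree".splitlines()
```

Here cleaned is "café au lait" with ordinary spaces, ends_with_lowercase("café") is true, and lines holds three entries.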