[M.-A. Lemburg]
> ...
> Currently, mapping tables map characters to Unicode characters
> and vice-versa. Now the .translate method will use a different
> kind of table: mapping integer ordinals to integer ordinals.

You mean that if I want to map u"a" to u"A", I have to set up some sort of dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this.

> Question: What is more of efficient: having lots of integers
> in a dictionary or lots of characters ?

My bet is "lots of integers", to reduce both space use and comparison time.

> ...
> Something else that changed is the way .capitalize() works. The
> Unicode version uses the Unicode algorithm for it (see TechRep. 13
> on the www.unicode.org site).

#13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case Mappings").

> Here's the new doc string:
>
> S.capitalize() -> unicode
>
> Return a capitalized version of S, i.e. words start with title case
> characters, all remaining cased characters have lower case.
>
> Note that *all* characters are touched, not just the first one.
> The change was needed to get it in sync with the .iscapitalized()
> method which is based on the Unicode algorithm too.
>
> Should this change be propagated to the string implementation ?

Unicode makes distinctions among "upper case", "lower case" and "title case", and you're trying to get away with a single "capitalize" function. Java has separate toLowerCase, toUpperCase and toTitleCase methods, and that's the way to do it. Whatever you do, leave .capitalize alone for 8-bit strings -- there's no reason to break code that currently works. "capitalize" seems a terrible choice of name for a titlecase method anyway, because of the baggage the name already carries from 8-bit strings.
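On the .translate question above: a table "mapping integer ordinals to integer ordinals" means the dict is keyed by ord() values, not by characters. As a point of reference (this is today's Python 3, not the API under discussion in this thread), str.translate ended up taking exactly that kind of table, and str.maketrans builds one from character strings. A minimal sketch:

```python
# Python 3 sketch: str.translate expects a mapping keyed by integer ordinals.
# Mapping u"a" -> u"A" means mapping ord("a") -> ord("A").
table = {ord("a"): ord("A")}
shouted = "banana".translate(table)
print(shouted)  # bAnAnA

# str.maketrans builds the same ordinal-keyed dict from two equal-length strings.
built = str.maketrans("a", "A")
print(built)  # {97: 65}

# A value of None deletes the character instead of replacing it.
stripped = "banana".translate({ord("n"): None})
print(stripped)  # baaa
```

So yes: to map u"a" to u"A" you write {ord("a"): ord("A")}, or let a helper like str.maketrans do the ord() calls for you.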
Since this stuff is complicated, I say it would be much better to use the same names for these things as the Unicode and Java folk do: there's excellent documentation elsewhere for all this stuff, and it's Bad to make users mentally translate unique Python terminology to make sense of the official docs. So my vote is: leave capitalize the hell alone <wink>. Do not implement capitalize for Unicode strings. Introduce a new titlecase method for Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode strings should also have methods to get at uppercase and lowercase (as Unicode defines those).
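To make the three-way distinction concrete: some Unicode characters have an uppercase form and a separate titlecase form. The DZ digraph is the standard example. The sketch below uses today's Python 3 str methods (upper, lower, title, capitalize), which postdate this thread; it illustrates the Unicode case categories, not the API being proposed here.

```python
# Python 3 sketch: upper case and title case are different for some characters.
dz = "\u01c6"                      # LATIN SMALL LETTER DZ WITH CARON
upper = dz.upper()                 # U+01C4, capital DZ with caron
title = dz.title()                 # U+01C5, capital D + small z with caron
lower = title.lower()              # back to U+01C6
print([hex(ord(c)) for c in (upper, title, lower)])  # ['0x1c4', '0x1c5', '0x1c6']

# capitalize() changes only the first character; title() starts every word.
cap = "hello world".capitalize()
tit = "hello world".title()
print(cap)  # Hello world
print(tit)  # Hello World
```

A single "capitalize" cannot cover both columns: uppercasing "\u01c6" and titlecasing it give different code points.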