Skip Montanaro wrote:

> Hye-Shik> 1) regard all characters as non-wide.
> Hye-Shik> 2) decode the string to unicode with the system default encoding
> Hye-Shik>    and call its methods.
>
> ...
>
> Hye-Shik> I didn't make my mind between these two yet. What do you think?
>
> #1 sounds like the most reasonable to me.

That violates the rule

    In the face of ambiguity, refuse the temptation to guess.

For a byte string, for "character width" to be a meaningful concept,
the byte string must use a multi-byte encoding. Then, .iswide would not
be implementable because it doesn't apply to a single byte, but to a
sequence of bytes. .width is unimplementable because you need to know
the encoding.

So I propose that the methods aren't added to string objects.

> You can't rely on strings coming
> into your program with proper encoding information, and they might come from
> an environment different than sys.defaultencoding (think WWW), so #2 seems
> like it would create as many problems as it solves. All I'm interested in
> is avoiding needless occurrences of these constructs in code:
>
>     if isinstance(s, unicode):
>         width = s.width()
>     else:
>         ...

If you have code that cares about character width, you need to convert
all incoming strings to Unicode. Then, you can just write

    width = s.width()

If you find you are writing code like the one above, you have found a
bug in your code.

Regards,
Martin
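
A minimal sketch of the "convert all incoming strings to Unicode first" approach, for illustration only. It is written for Python 3, where str is already Unicode, and it uses the standard library's unicodedata.east_asian_width() as a stand-in for the .width() method discussed in this thread. The helper names to_text and display_width are hypothetical, and counting every non-wide character as one column is a simplifying assumption (combining and control characters are not treated specially).

```python
import unicodedata


def to_text(data, encoding):
    """Decode incoming bytes at the program boundary.

    The encoding must be supplied by the caller; nothing is guessed.
    Values that are already text are returned unchanged.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def display_width(text):
    """Return a rough column count for a Unicode string.

    Assumption: characters whose East Asian Width property is
    Wide ("W") or Fullwidth ("F") take two columns, all others one.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# Bytes arriving from outside (here: two Hangul syllables in EUC-KR).
raw = "\ud55c\uae00".encode("euc-kr")
s = to_text(raw, "euc-kr")
width = display_width(s)
```

With the decoding done once at the boundary, the rest of the code only ever sees Unicode text, so no isinstance() branch is needed at the point where the width is computed.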