Guido van Rossum:
> > This wasn't usefully true in the past for DBCS strings and is
> > not the right way to think of either narrow or wide strings
> > now. The idea that strings are arrays of characters gets in
> > the way of dealing with many encodings and is the primary
> > difficulty in localising software for Japanese.
>
> Can you explain the kind of problems encountered in some more detail?

Programmers used to working with character == indexable code unit will often split double-wide characters when performing an action. For example, a byte-level search for a particular double-byte character "bc" may incorrectly match inside "abcd", where the actual characters are "ab" and "cd". DBCS encodings are not normally self-synchronising, although UTF-8 is. Another common problem is counting characters, for example when filling a line: counting code units rather than characters means hitting the line width in the middle of a character and forcing half a character onto the next line.

> I think it's a good idea to provide a set of higher-level tools as
> well. However nobody seems to know what these higher-level tools
> should do yet. PEP 261 is specifically focused on getting the
> lower-level foundations right (i.e. the objects that represent arrays
> of code units), so that the authors of higher level tools will have a
> solid base. If you want to help author a PEP for such higher-level
> tools, you're welcome!

It's more likely I'll publish some of the low-level pieces of Scintilla/SinkWorld as a Python extension providing some of these facilities in an editable-text class. Then we can see if anyone else finds the code worthwhile.

Neil
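
[A minimal sketch of the false match and the counting problem described above, using Python's built-in Shift-JIS codec as a representative DBCS (the choice of encoding and the exact strings are illustrative, not taken from Scintilla). In Shift-JIS the katakana "ア" encodes as the two bytes 0x83 0x41, and 0x41 is also ASCII "A", so a byte-level search for "A" reports a hit inside "ア":

    # Byte-level operations on Shift-JIS data split characters.
    text = "アイウ"                        # three katakana characters
    data = text.encode("shift_jis")        # b'\x83A\x83C\x83E' (6 bytes)

    # False match: the search hits the trail byte of the first character.
    print(b"A" in data)                    # True  (a false positive)
    print(data.find(b"A"))                 # 1     (middle of a character)

    # Treating the data as characters avoids the false match.
    print("A" in data.decode("shift_jis"))  # False

    # The counting problem: filling a line by code-unit count
    # reaches the "width" in the middle of a character.
    print(len(data))                       # 6 code units
    print(len(text))                       # 3 characters

UTF-8 avoids the false-match half of this because lead and continuation bytes occupy disjoint ranges, so a valid encoded sequence never appears inside another; the counting problem remains for any multi-byte encoding.]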