> This reminds me that I often miss, in the standard `ctype.h' and related,
> a function that would un-combine a character into its base character and
> its diacritic, and the complementary re-combining function.
>
> Even if this might be easier for Latin-1, it is difficult to design
> something general enough.  Characters may have a more complex structure
> than a mere base and single diacritic.  I do not know what to suggest.

I bet the Unicode standard has a standard way to do this.  Maybe we
can implement that, and then project the same interface on 8-bit
characters?  Of course character encoding issues might get in the way
if <ctype.h> doesn't provide the data -- so you may be better off
doing this in Unicode only.  (We must never assume that 8-bit strings
contain Latin-1.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
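
For illustration only, here is a minimal sketch of the un-combine / re-combine pair discussed above, done in Unicode only. It assumes a current Python 3, where the standard `unicodedata` module provides `normalize()` and `combining()`. The function names `uncombine` and `recombine` are hypothetical, made up for this example, and the sketch relies on Unicode canonical decomposition (NFD) and composition (NFC) rather than on anything in `ctype.h`.

```python
import unicodedata


def uncombine(ch):
    """Split text into (base characters, combining marks).

    Hypothetical helper: uses canonical decomposition (NFD), so a
    character carrying several diacritics yields several marks.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    # combining() returns 0 for base characters, non-zero for marks.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return base, marks


def recombine(base, marks):
    """Complementary operation: compose base and marks again (NFC)."""
    return unicodedata.normalize("NFC", base + marks)


# U+00E9 (e with acute) splits into "e" plus U+0301 COMBINING ACUTE ACCENT.
base, marks = uncombine("\u00e9")
assert (base, marks) == ("e", "\u0301")
assert recombine(base, marks) == "\u00e9"

# 8-bit data has to be decoded with an explicitly named encoding first;
# nothing here guesses that a byte string is Latin-1.
latin1_byte = b"\xe9"
assert uncombine(latin1_byte.decode("latin-1")) == ("e", "\u0301")
```

Because the result is a string of marks rather than a single diacritic, the same interface covers characters with a more complex structure, such as U+1EC7, which carries both a dot below and a circumflex.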