François Pinard wrote:
> This reminds me that I often miss, in the standard `ctype.h' and related,
> a function that would un-combine a character into its base character and
> its diacritic, and the complementary re-combining function.

    import unicodedata

    def uncombine(char):
        chars = unicodedata.decomposition(unichr(ord(char))).split()
        if not chars:
            return [char]
        return [unichr(int(x, 16)) for x in chars if x[0] != "<"]

    for char in "François":
        print uncombine(char)

    ['F']
    ['r']
    ['a']
    ['n']
    [u'c', u'\u0327']
    ['o']
    ['i']
    ['s']

(to go the other way, store all uncombinations longer than one
character in a dictionary)

</F>
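
The dictionary idea for going the other way is only described in words, so here is a rough sketch of what it might look like. It is written for Python 3 (where `str` is already Unicode and `unichr` is spelled `chr`), not the Python 2 used in the post. The names `build_recombine_table`, `RECOMBINE` and `recombine` are made up for illustration, and keeping the lowest code point when two characters share an uncombination is an assumption, not something the post specifies.

```python
import sys
import unicodedata


def uncombine(char):
    # Python 3 port of the uncombine() function from the post.
    parts = unicodedata.decomposition(char).split()
    if not parts:
        return [char]
    # Skip compatibility tags such as "<compat>", as the original does.
    return [chr(int(x, 16)) for x in parts if x[0] != "<"]


def build_recombine_table():
    # Hypothetical helper: map every uncombination longer than one
    # character back to the character it came from.
    table = {}
    for code in range(sys.maxunicode + 1):
        char = chr(code)
        parts = uncombine(char)
        if len(parts) > 1:
            # Assumption: if two characters share an uncombination,
            # keep the one with the lowest code point.
            table.setdefault(tuple(parts), char)
    return table


RECOMBINE = build_recombine_table()


def recombine(chars):
    # Unknown sequences fall back to plain concatenation.
    return RECOMBINE.get(tuple(chars), "".join(chars))
```

With this in place, `recombine(uncombine("\u00e7"))` should give back the original c-cedilla. Note that `unicodedata.decomposition` only goes one level deep; for whole strings, the standard `unicodedata.normalize` function with the "NFD" and "NFC" forms is probably the more robust route.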