jack wrote:

> Sigh. \u0308 is not in the range(256), but the whole point of
> encode('latin-1') is to make it so, isn't it?

Define "make it so"?

The encoders convert unicode code points to corresponding code points in the given 8-bit encoding. One character in, one character out (unless the target encoding is a multibyte encoding, like utf-8).

This works perfectly well if producers follow the "early uniform normalization" rule (everything else is madness).

For some reason, your listdir implementation doesn't. Instead of returning LATIN SMALL LETTER O WITH DIAERESIS (\u00f6), it returns multiple unicode characters (a base letter followed by the combining diaeresis \u0308). I'd say it's broken.

As far as I know, there's no standard unicode normalizer in Python.

</F>
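For readers on a current Python: the standard library now ships `unicodedata.normalize`, which can do the composition step described above. What follows is only a minimal sketch, not anything from the original thread. The helper name `to_latin1` and the sample file name are made up for illustration, and it assumes the decomposed input is a plain "o" followed by the combining diaeresis.

```python
import unicodedata


def to_latin1(name):
    """Compose the string to NFC, then encode it as Latin-1.

    Hypothetical helper for illustration only. It still raises
    UnicodeEncodeError if a character has no Latin-1 equivalent.
    """
    composed = unicodedata.normalize("NFC", name)
    return composed.encode("latin-1")


# Decomposed form, as a listdir implementation might return it
# (assumed example): 'o' followed by COMBINING DIAERESIS.
decomposed = "o\u0308.txt"

# Encoding the decomposed form directly fails, because \u0308
# has no code point in Latin-1.
try:
    decomposed.encode("latin-1")
    direct_encode_failed = False
except UnicodeEncodeError:
    direct_encode_failed = True

# After NFC normalization the pair collapses to the single
# code point \u00f6, which Latin-1 can represent as one byte.
encoded = to_latin1(decomposed)
print(direct_encode_failed, encoded)
```

Normalizing once, at the point where the names enter the program, is in the spirit of the "early uniform normalization" rule: everything downstream then sees one code point per character and the 8-bit encoders behave as expected.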