> Maybe I didn't understand the RFC quite right, but it seemed like how to
> handle hostnames was left as a choice between IDNA encoding the hostname
> or replacing the non-ascii characters with dashes? I guess in practice
> IDNA is the right decision.

I haven't fully understood it, either, but I think that's the right
conclusion. People want to fetch the resource, then, and encoding the
host name in UTF-8 won't do much good.

> Seems like the other somewhat under-specified part of all of this is how
> urllib.unquote() should work. If after percent decoding it sees
> non-ascii octets, should it try to decode them as utf-8 and if that
> fails then leave them as is?

That's why I think that using IRIs should be a separate feature,
perhaps a separate module entirely.

Regards,
Martin
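
For illustration only, here is a minimal sketch of the two behaviours discussed above: IDNA-encoding the host name, and percent-decoding with a UTF-8 attempt that falls back to leaving the input untouched. It is one possible reading of the question, not what urllib does or should do. The function names host_to_ascii and unquote_utf8_or_keep are hypothetical, the sketch uses Python 3's urllib.parse rather than the urllib.unquote() mentioned in the quoted text, and "leave them as is" is assumed to mean "return the original percent-encoded string".

```python
from urllib.parse import unquote_to_bytes


def host_to_ascii(host: str) -> str:
    """Hypothetical helper: convert a host name to its ASCII (IDNA) form.

    Uses Python's built-in "idna" codec; other IDNA versions may map
    some characters differently.
    """
    return host.encode("idna").decode("ascii")


def unquote_utf8_or_keep(s: str) -> str:
    """Hypothetical helper: percent-decode, then try UTF-8.

    Assumption: if the decoded octets are not valid UTF-8, the original
    percent-encoded string is returned unchanged.
    """
    raw = unquote_to_bytes(s)  # percent-decode to raw octets
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return s  # not UTF-8: keep the escapes as they were


if __name__ == "__main__":
    print(host_to_ascii("bücher.example"))
    print(unquote_utf8_or_keep("caf%C3%A9"))  # valid UTF-8 escapes
    print(unquote_utf8_or_keep("caf%E9"))     # lone Latin-1 octet, not UTF-8
```

Keeping both steps in separate helper functions, outside the existing URL functions, is in line with the suggestion that IRI handling be a separate feature.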