On 2008-08-06 18:55, Antoine Pitrou wrote:
> Martin v. Löwis <martin <at> v.loewis.de> writes:
>> URLs are just not made for non-ASCII characters.
>
> Perhaps they are not, but every non-English wiki (just to take a simple, generic
> example) potentially contains non-ASCII URLs.
> e.g. http://fr.wikipedia.org/wiki/%C3%89l%C3%A9phant
> http://wiki.python.org/moin/J%C3%BCrgenHermann
> (notice the utf-8 encoding in both)
>
>> Implement IRIs if you want non-ASCII characters; the rules are much clearer
>> for these.
>
> I think most people would expect something which works with the current World
> Wide Web rather than a rigorous implementation of a specific RFC. Implementing
> RFCs is fine but it does not magically eliminate all problems, especially when
> the RFCs themselves are not in sync with real-world usage.

+1. Practicality beats purity...

The web is moving towards UTF-8 as standard Unicode encoding, so it's
probably wise to follow that approach for quote().

http://en.wikipedia.org/wiki/Percent-encoding

The other way around will also have to deal with old-style URLs which
typically still use the Latin-1 encoding which was the basis for HTML:

http://www.w3schools.com/TAGS/ref_urlencode.asp

So unquote() should probably try to decode using UTF-8 first and then
fall back to Latin-1 if that doesn't work.

Whether the result of quote()/unquote() should be bytes or Unicode is
a different story and probably also depends on what the application
does with the result. I don't think there's a good general answer for
that one, except maybe just going for one possible combination and
document that.

-- 
Marc-Andre Lemburg
eGenix.com
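
The approach suggested in the message above (UTF-8 for quote(), and UTF-8 first with a Latin-1 fallback for unquote()) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the `urllib.parse` module from current Python 3, which postdates this thread, and `unquote_with_fallback` is a hypothetical helper name, not an API proposed in the discussion.

```python
from urllib.parse import quote, unquote_to_bytes


def unquote_with_fallback(s: str) -> str:
    """Hypothetical helper: percent-decode to bytes, then try UTF-8
    and fall back to Latin-1 if the bytes are not valid UTF-8."""
    raw = unquote_to_bytes(s)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")


# quote() in Python 3 encodes non-ASCII text as UTF-8 by default.
modern = quote("Éléphant")

# A UTF-8 style URL and an old-style Latin-1 URL for the same page name.
from_utf8 = unquote_with_fallback("J%C3%BCrgenHermann")
from_latin1 = unquote_with_fallback("J%FCrgenHermann")
```

Because Latin-1 accepts any byte sequence, the fallback always yields a string. The cost is that a Latin-1 URL whose bytes happen to form valid UTF-8 would be decoded as UTF-8, so the heuristic is a best guess rather than a guarantee.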