On 6/20/2010 11:56 PM, Terry Reedy wrote:
> The specific example is
>
> >>> urllib.parse.parse_qsl('a=b%e0')
> [('a', 'b�')]
>
> where the character after 'b' is a white ? in a dark diamond, indicating
> an error.
>
> parse_qsl() splits that input on '=' and sends each piece to
> urllib.parse.unquote.
> unquote() attempts to "Replace %xx escapes by their single-character
> equivalent.". unquote has an encoding parameter that defaults to 'utf-8'
> in *its* call to .decode. parse_qsl does not have an encoding parameter.
> If it did, and it passed that to unquote, then
> the above example would become (simulated interaction)
>
> >>> urllib.parse.parse_qsl('a=b%e0', encoding='latin-1')
> [('a', 'bà')]
>
> I got that output by copying the file and adding "encoding='latin-1'" to
> the unquote call.
>
> Does this solve this problem?
> Has anything like this been added for 3.2?
> Should it be?

With a little searching, I found
http://bugs.python.org/issue5468
with Miles Kaufmann's year-old comment "parse_qs and parse_qsl should also
grow encoding and errors parameters to pass to the underlying unquote()".

Patch review is needed.

Terry Jan Reedy
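
For anyone who wants to reproduce the behaviour discussed above, here is a minimal sketch. It assumes a Python version in which parse_qsl() accepts an encoding argument and forwards it to unquote(), which is what issue5468 proposes; on an interpreter without that change, the second call would raise a TypeError. The variable names are mine and purely illustrative.

```python
from urllib.parse import parse_qsl, unquote

query = 'a=b%e0'

# Default decoding is UTF-8. The lone byte 0xE0 is not valid UTF-8,
# so it is replaced by U+FFFD (the "white ? in dark diamond").
default_result = parse_qsl(query)

# Assumption: this interpreter's parse_qsl() has the encoding parameter
# proposed in issue5468 and passes it through to unquote().
latin1_result = parse_qsl(query, encoding='latin-1')

# unquote() itself already takes an encoding argument, so the value
# can also be decoded directly.
direct_value = unquote('b%e0', encoding='latin-1')

print(default_result)
print(latin1_result)
print(direct_value)
```

In Latin-1 the byte 0xE0 is 'à', which is why the second and third results differ from the first.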