Showing content from https://mail.python.org/pipermail/python-dev/2010-June/100790.html below:

[Python-Dev] bytes / unicode

Terry Reedy tjreedy at udel.edu
Mon Jun 21 19:27:30 CEST 2010
On 6/20/2010 11:56 PM, Terry Reedy wrote:

> The specific example is
>
>  >>> urllib.parse.parse_qsl('a=b%e0')
> [('a', 'b�')]
>
> where the character after 'b' is a white ? in a dark diamond (U+FFFD,
> the replacement character), indicating a decoding error.
>
> parse_qsl() splits that input on '=' and sends each piece to
> urllib.parse.unquote().
> unquote() attempts to "Replace %xx escapes by their single-character
> equivalent". unquote() has an encoding parameter that defaults to 'utf-8'
> in *its* call to .decode(). parse_qsl() does not have an encoding parameter.
> If it did, and it passed that to unquote(), then
> the above example would become (simulated interaction)
>
>  >>> urllib.parse.parse_qsl('a=b%e0', encoding='latin-1')
> [('a', 'bà')]
>
> I got that output by copying the file and adding "encoding='latin-1'" to
> the unquote call.
>
> Does this solve this problem?
> Has anything like this been added for 3.2?
> Should it be?
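
The simulated behaviour quoted above can be approximated without patching the library. What follows is only a rough sketch: parse_qsl_with_encoding is a hypothetical helper name, not part of urllib.parse, and it skips details the real parse_qsl() handles (blank values, strict parsing). It splits the query string and forwards encoding and errors to unquote(), which does accept both.

```python
from urllib.parse import unquote


def parse_qsl_with_encoding(qs, encoding='utf-8', errors='replace'):
    """Hypothetical helper: split a query string and unquote each piece.

    Simplified on purpose; it is not a drop-in replacement for parse_qsl().
    """
    pairs = []
    for field in qs.split('&'):
        if not field:
            continue  # skip empty fields such as those produced by 'a=1&&b=2'
        name, _, value = field.partition('=')
        # In query strings '+' stands for a space, so translate it first.
        name = unquote(name.replace('+', ' '), encoding=encoding, errors=errors)
        value = unquote(value.replace('+', ' '), encoding=encoding, errors=errors)
        pairs.append((name, value))
    return pairs


utf8_pairs = parse_qsl_with_encoding('a=b%e0')
latin1_pairs = parse_qsl_with_encoding('a=b%e0', encoding='latin-1')
print(utf8_pairs)
print(latin1_pairs)
```

With the default 'utf-8' the lone byte 0xE0 cannot be decoded and is replaced by U+FFFD; with 'latin-1' the same byte decodes to 'à'.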

With a little searching, I found
http://bugs.python.org/issue5468
with Miles Kaufmann's year-old comment "parse_qs and parse_qsl should 
also grow encoding and errors parameters to pass to the underlying 
unquote()". Patch review is needed.
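
For readers on a later interpreter: the urllib.parse documentation lists encoding and errors parameters on parse_qs() and parse_qsl() from Python 3.2 onward. Assuming such a version, a minimal check of the example from this thread might look like the following; the variable names are illustrative only.

```python
from urllib.parse import parse_qsl

# Default decoding is UTF-8 with errors='replace', so the stray byte
# 0xE0 becomes the replacement character U+FFFD.
default_result = parse_qsl('a=b%e0')

# Passing encoding='latin-1' decodes the same byte as 'à'.
latin1_result = parse_qsl('a=b%e0', encoding='latin-1')

# With errors='strict' the undecodable byte raises instead of being replaced.
strict_error = None
try:
    parse_qsl('a=b%e0', errors='strict')
except UnicodeDecodeError as exc:
    strict_error = exc

print(default_result)
print(latin1_result)
print(type(strict_error).__name__)
```

Choosing errors='strict' is one way to detect a wrongly guessed encoding rather than silently accepting replacement characters.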

Terry Jan Reedy

