Guido van Rossum <guido@python.org> writes:

> > "Content-Type:
> > application/x-www-form-urlencoded".  Is utf-8 implied for the data
> > once the url encoding has been reversed?
>
> I very much doubt it.  You probably received that UTF-8 data from a
> non-standard-conforming browser.

That's partially a bug in HTTP forms, partially a bug in the browsers,
and partially a bug in many CGI scripts.

The original URL encoding of form parameters (in the URL itself, using
GET) does not allow a specification of the encoding; that's the bug in
HTTP.

To work around this, *all* browsers (by silent convention) send form
parameters in the encoding that the document was in.  So if the
document containing the form is in UTF-8, they will send the form
parameters in UTF-8.  Of course, unless you *know* what encoding the
original document had, there is no way of telling that it is UTF-8.

The RFC specifies that, if application/x-www-form-urlencoded is used,
text fields *should* have a Content-Type field, with a charset
argument.  The bug in the browsers is that they omit the Content-Type
declaration for individual fields.

I've reported this bug for MSIE, Mozilla, and Opera.  Some Mozilla
author told me that they tried sending a charset= parameter, and that
many Web sites broke when this was done - this is the bug in many CGI
scripts.

Regards,
Martin
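
To illustrate the convention described above, here is a minimal sketch in Python of what a server-side script ends up doing: since the request does not say which encoding the form fields use, the script has to assume the encoding of the page that contained the form. The names `decode_form` and `page_encoding` are hypothetical, chosen for this example only; `urllib.parse.parse_qs` is the standard-library parser.

```python
from urllib.parse import parse_qs


def decode_form(body: bytes, page_encoding: str = "utf-8") -> dict[str, list[str]]:
    """Decode an application/x-www-form-urlencoded body.

    page_encoding is an assumption supplied by the caller: the encoding
    of the document that served the form. Nothing in the body confirms it.
    """
    # The percent-encoded body itself is plain ASCII.
    text = body.decode("ascii")
    # errors="strict" makes a wrong guess fail loudly when the bytes are
    # invalid in the assumed encoding, instead of silently substituting.
    return parse_qs(
        text,
        keep_blank_values=True,
        encoding=page_encoding,
        errors="strict",
    )


# The same bytes on the wire, interpreted under two different assumptions.
body = b"name=L%C3%B6wis"
as_utf8 = decode_form(body, "utf-8")
as_latin1 = decode_form(body, "latin-1")
```

Under the UTF-8 assumption the field decodes to the intended name; under the Latin-1 assumption the same bytes decode without error to a different, garbled string. That is the practical meaning of "no way of telling": a wrong guess is not always detectable from the data alone.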