Fredrik Lundh wrote:
> which reminds me: the HTTP protocol says that a charset specified
> at the HTTP protocol level should override any encoding specified in
> the document itself.

I believe HTTP (RFC 2616) rather meekly asserts that the HTTP
Content-Type header *always* defines the encoding of the body.
If no charset is specified, the body is ISO-8859-1.

I believe this requirement is ignored in practice. HTTP servers
don't correctly label outgoing documents, and HTTP clients ignore
whatever the HTTP server says. Browsers usually search HTML
documents for <meta> and XML documents for <?xml encoding=?>,
and I think they always prefer a document's internal mark to what
the HTTP headers say. (Anyone know for sure?)

Just another charset headache.

## Jason Orendorff    http://www.jorendorff.com/
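
To make the two behaviours above concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the function names (header_charset, document_charset, spec_charset, browser_charset) are made up for this example, the regular expressions are a crude stand-in for whatever sniffing real browsers do, and the "document wins" order in browser_charset is the guess from the message above, not a confirmed fact.

```python
import re
from email.message import Message

# Default the message attributes to RFC 2616 when no charset is given.
HTTP_DEFAULT = "iso-8859-1"

# Crude patterns for the in-document marks (assumption: ASCII-compatible bytes).
_XML_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
_META = re.compile(rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("charset")
    return value.lower() if isinstance(value, str) else None


def document_charset(body):
    """Look for an XML declaration or <meta> charset near the start of the body."""
    head = body[:1024]
    for pattern in (_XML_DECL, _META):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


def spec_charset(content_type, body):
    """Header always decides; the body is never inspected."""
    return header_charset(content_type) or HTTP_DEFAULT


def browser_charset(content_type, body):
    """Hypothetical browser order: internal mark first, then header, then default."""
    return document_charset(body) or header_charset(content_type) or HTTP_DEFAULT


if __name__ == "__main__":
    page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head></html>'
    print(spec_charset("text/html; charset=koi8-r", page))
    print(browser_charset("text/html; charset=koi8-r", page))
```

With a mislabelled response like the one in the usage lines, the two functions disagree, which is exactly the headache: the header says one thing and the document says another.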