>>>>> "AM" == aahz <aahz@panix.com> writes:

    >> [Aahz]
    >>> Someone just pointed out on c.l.py that we need an HTMLescape()
    >>> function that takes a string and converts special characters to
    >>> entities.  I'm not on python-dev, so could you please forward
    >>> this and find out whether I need to run a PEP?
    >>
    >> Has someone pointed out yet that this is done by cgi.escape()?

    AM> Yeah, I missed that earlier.  But after thinking some more,
    AM> there are a fair number of browser-like bits of software that
    AM> fail to render many of the special characters correctly
    AM> (e.g. trademark).  This is frequently due to character set
    AM> issues; entities almost always render correctly, though.
    AM> Therefore a general translation routine is probably handy.

    AM> cgi.escape() only handles "&", "<", ">".  I'm not sure whether
    AM> cgi.escape ought to be expanded to handle all characters or a
    AM> new routine should be added.  Martin von Loewis suggested
    AM> xml.sax.saxutils.escape(), but I have zero familiarity with XML
    AM> and am waiting for 2.0final.  Perhaps this should be taken
    AM> off-line?

Perhaps we should take it to python-list -- or maybe we should form a
web-sig and work on it there.  There are definitely some tricky issues
to work out.  I attempted to work out some of the same issues for
internationalization support in Mailman's pipermail archives.

The escape function in cgi should stay minimal, because it deals with
the only truly essential characters.  If the browser interprets an
HTML page as iso-8859-1 ("Latin 1"), then characters > chr(127) are
going to be rendered properly.  You can add an explicit meta tag to
the HTML page, or have the server return the charset in the HTTP
headers.  Either seems quite a bit simpler than trying to escape all
characters > chr(127), unless you have to deal with old browsers that
don't support the charset specified by the HTTP header.

Jeremy
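
For concreteness, here is a minimal sketch of the kind of "general translation routine" discussed above, layered on top of the minimal escape rather than replacing it.  The function name escape_to_entities is hypothetical and not part of any library.  It also assumes a current Python, where html.escape() is the successor to cgi.escape() (which is no longer available as of Python 3.8) and html.entities.codepoint2name supplies the HTML 4 named entities.

```python
from html import escape
from html.entities import codepoint2name


def escape_to_entities(text):
    """Escape "&", "<", ">" and turn every non-ASCII character into an entity.

    Hypothetical helper for illustration only.
    """
    pieces = []
    # First apply the minimal escape, so markup characters are always safe.
    for ch in escape(text, quote=False):
        codepoint = ord(ch)
        if codepoint < 128:
            # Plain ASCII passes through unchanged.
            pieces.append(ch)
        elif codepoint in codepoint2name:
            # Use the named HTML 4 entity when one exists (e.g. &trade;).
            pieces.append("&%s;" % codepoint2name[codepoint])
        else:
            # Otherwise fall back to a numeric character reference.
            pieces.append("&#%d;" % codepoint)
    return "".join(pieces)


if __name__ == "__main__":
    print(escape_to_entities("AT&T\u2122 <caf\u00e9>"))
```

The output is pure ASCII, so it does not depend on the charset the browser assumes, at the cost of larger and less readable page source.  If named entities are not needed, the built-in "xmlcharrefreplace" error handler of str.encode() produces numeric references without any helper.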