> Yeah, I missed that earlier. But after thinking some more, there are a
> fair number of browser-like bits of software that fail to render many of
> the special characters correctly (e.g. trademark). This is frequently
> due to character set issues; entities almost always render correctly,
> though. Therefore a general translation routine is probably handy.

But character set issues also make it impossible to provide such a
translation routine: the current party line is that the encoding of
8-bit strings is unknown and that only ASCII can be assumed.

> cgi.escape() only handles "&", "<", ">". I'm not sure whether cgi.escape
> ought to be expanded to handle all characters or a new routine should be
> added. Martin von Loewis suggested xml.sax.saxutils.escape(), but I
> have zero familiarity with XML and am waiting for 2.0final. Perhaps
> this should be taken off-line?

xml.sax.saxutils.escape() is a generalization of cgi.escape() -- read
the source (Lib/xml/sax/saxutils.py). It allows you to specify
additional things to be replaced by entities by passing in a
dictionary mapping chars (or strings) to entities. If you know that
you are dealing with Latin-1, you could use the table in
htmlentitydefs.py to construct a table.

--Guido van Rossum (home page: http://www.python.org/~guido/)
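A minimal sketch of the escape()-plus-entity-table approach, written for modern Python 3, where htmlentitydefs has become html.entities and cgi.escape() has since been removed. It assumes the incoming bytes really are Latin-1 (the whole point above is that this cannot be known in general). The names NAMED_ENTITIES, html_escape_named and latin1_to_html are hypothetical helpers for illustration, not part of any library. The table is built from html.entities.codepoint2name and skips ASCII entries, because escape() already handles "&", "<" and ">", and mapping "&" again would double-escape the "&amp;" it has just produced.

```python
from html.entities import codepoint2name
from xml.sax.saxutils import escape

# Hypothetical table: every non-ASCII character that has a named HTML
# entity maps to "&name;". ASCII entries (&, <, >, ") are skipped on
# purpose, since escape() already replaces &, < and > itself and
# re-mapping "&" here would turn "&amp;" into "&amp;amp;".
NAMED_ENTITIES = {
    chr(cp): "&%s;" % name
    for cp, name in codepoint2name.items()
    if cp > 127
}


def html_escape_named(text):
    """Escape &, <, > and replace non-ASCII characters with named entities."""
    # escape() does the three basic replacements first, then applies
    # each key -> value pair from the extra dictionary.
    return escape(text, NAMED_ENTITIES)


def latin1_to_html(raw):
    """Decode bytes ASSUMED to be Latin-1, then escape them for HTML."""
    # Assumption: the caller knows the encoding is Latin-1; with any other
    # encoding the decoded characters (and thus the entities) will be wrong.
    return html_escape_named(raw.decode("latin-1"))


if __name__ == "__main__":
    print(html_escape_named("Spam & Eggs\u2122 <caf\u00e9>"))
    # Spam &amp; Eggs&trade; &lt;caf&eacute;&gt;
    print(latin1_to_html(b"caf\xe9 \xa9 2000"))
    # caf&eacute; &copy; 2000
```

Because Python 3 strings are Unicode, the same table also covers characters outside Latin-1, such as the trademark sign; the encoding question simply moves to the decode step.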