Is there any thought of extending escape() / unescape() to handle, by default, characters other than <, >, and &? At a minimum it should handle arbitrary &xxx; values. Ideally, it would also handle other common symbolic names besides &lt;, &gt;, etc.

HTML from common web sites such as nytimes.com frequently has a variety of characters escaped. Consider the page at

http://travel.nytimes.com/travel/guides/europe/france/provence-and-the-french-riviera/overview.html

It lists its content type as:

    content="text/html; charset=UTF-8"

and contains text like:

    There&#146;s the C&ocirc;te d&#146;

Ideally, we would decode &#146; into ’ and &ocirc; into ô. Unfortunately, #146 is really an error -- it's not the Unicode code point for that character (146 is a control character in Unicode) but the MS codepage 1252 value for the apostrophe (apparently many HTML editing systems intermingle Unicode and codepage 1252 content for apostrophes and a few other common characters).

I'm happy to contribute some additional code for these other cases if people agree it's useful.

On May 12, 2008, at 10:36 AM, Tony Nelson wrote:

> At 11:56 PM -0400 5/10/08, Fred Drake wrote:
>> On May 10, 2008, at 11:49 PM, Guido van Rossum wrote:
>>> Works for me. The other thing I always use from cgi is escape() --
>>> will that be available somewhere else too?
>>
>> xml.sax.saxutils.escape() would be an appropriate replacement, though
>> the location is a little funky.
>
> At least it's right next to the valuable quoteattr().
> --
> ____________________________________________________________________
> TonyN.:' <mailto:tonynelson at georgeanelson.com>
> ' <http://www.georgeanelson.com/>
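
For illustration, here is a minimal sketch of the kind of unescape behaviour proposed above, written in modern Python 3. It is not the code offered in the message. The function name `unescape_all` is hypothetical, and treating numeric references in the range 128-159 as codepage 1252 values is an assumption based on the &#146; example. Unknown or invalid references are left untouched.

```python
import html
import re
from html.entities import name2codepoint
from xml.sax.saxutils import unescape

# Matches decimal (&#146;), hexadecimal (&#x92;) and named (&ocirc;) references.
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_all(text):
    """Hypothetical helper: decode numeric and named character references."""

    def repl(match):
        ref = match.group(1)
        if ref[0] == "#":
            code = int(ref[2:], 16) if ref[1] in "xX" else int(ref[1:])
            if 0x80 <= code <= 0x9F:
                # Assumption: values in this range are really cp1252 bytes.
                try:
                    return bytes([code]).decode("cp1252")
                except UnicodeDecodeError:
                    return match.group(0)  # undefined in cp1252, keep as-is
            try:
                return chr(code)
            except (ValueError, OverflowError):
                return match.group(0)  # not a valid code point
        codepoint = name2codepoint.get(ref)
        return chr(codepoint) if codepoint is not None else match.group(0)

    # A single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
    return _REF.sub(repl, text)


sample = "There&#146;s the C&ocirc;te d&#146;"
print(unescape_all(sample))   # decodes both kinds of reference
print(unescape(sample))       # saxutils only knows &amp;, &lt; and &gt;
print(html.unescape(sample))  # standard library equivalent in Python 3.4+
```

On this sample, xml.sax.saxutils.unescape() returns the text unchanged, while html.unescape() (Python 3.4 and later) gives the same result as the sketch, since it also maps &#146; to the cp1252 apostrophe.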