On Wed, Sep 26, 2012 at 6:02 PM, Walter Dörwald <walter at livinglogic.de> wrote:
> On 26.09.12 16:43, ezio.melotti wrote:
>
>> http://hg.python.org/cpython/rev/36f61661f71e
>> changeset: 79194:36f61661f71e
>> user: Ezio Melotti <ezio.melotti at gmail.com>
>> date: Wed Sep 26 17:43:23 2012 +0300
>> summary:
>> Add a few entries to whatsnew/3.3.rst.
>> [...]
>>
>> +
>> +A new :data:`~html.entities.html5` dictionary that maps HTML5 named character
>> +references to the equivalent Unicode character(s) (e.g. ``html5['gt;'] == '>'``)
>> +has been added to the :mod:`html.entities` module. The dictionary is now also
>> +used by :class:`~html.parser.HTMLParser`.
>
>
> Is there a reason why the trailing ';' is included in the entity names?
>

Yes, to quote <http://bugs.python.org/issue11113#msg163706>:
"""
The problem is that the standard allows some charref to end without a ';',
but not all of them.
So both "&Eacute;ric" and "&Eacuteric" will be parsed as "Éric", but only
"&alpha;centauri" will result in "αcentauri" -- "&alphacentauri" will be
returned unchanged.
"""

To preserve this distinction, I included both forms, in the same way they
are listed at <http://www.w3.org/TR/html5/named-character-references.html>.
This is also explained at
<http://docs.python.org/dev/library/html.entities.html#html.entities.html5>.

Best Regards,
Ezio Melotti

> Servus,
> Walter
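A minimal sketch of why the ';' stays in the keys, assuming Python 3.4 or later. It uses html.unescape (added in 3.4, after this thread) as a convenient stand-in for the parser's charref handling: it looks names up in the same html5 dictionary, so a legacy name like "Eacute" (listed with and without ';') is recognised either way, while "alpha" (listed only as "alpha;") is left alone when the ';' is missing.

```python
# Sketch assuming Python 3.4+ (html.entities.html5 exists since 3.3,
# html.unescape since 3.4).
import html
from html.entities import html5

# Keys keep the trailing ';' exactly as the HTML5 spec lists them.
gt = html5['gt;']

# Legacy references such as Eacute appear both with and without ';' ...
eacute_forms = ('Eacute' in html5, 'Eacute;' in html5)
# ... while most references, such as alpha, appear only with ';'.
alpha_forms = ('alpha' in html5, 'alpha;' in html5)

# Which keys exist decides whether a reference may omit the ';'.
samples = ['&Eacute;ric', '&Eacuteric', '&alpha;centauri', '&alphacentauri']
results = {text: html.unescape(text) for text in samples}

for text, decoded in results.items():
    print(repr(text), '->', repr(decoded))
```

The first three samples decode to "Éric", "Éric" and "αcentauri", and "&alphacentauri" comes back unchanged, matching the behaviour quoted from the bug tracker.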