Oren Tirosh wrote:

> In its current form I find htmlentitydefs.py pretty useless.

I use it a lot, and find it reasonably useful. sure beats typing in the HTML character tables myself, or writing a DTD parser.

> Names in the input in arbitrary case will not match the MixedCase
> keys in the entitydefs dictionary

people who use oddball characters may prefer to keep uppercase letters separate from lowercase letters. if I type "Linköping" using a named entity (Link&ouml;ping), I don't want it to come out as "LinkÖping".

if you don't care, nothing stops you from using the "lower" string method.

> and the decimal character reference isn't really more useful than
> the named entity reference.

really? converting a decimal character reference to a Unicode character is trivial, but how do you convert a named entity reference to a Unicode character? (look it up in the htmlentitydefs?)

here's a trivial piece of code that converts the entitydefs dictionary to an entity->Unicode mapping:

    from htmlentitydefs import entitydefs

    entitydefs_unicode = {}
    for entity, char in entitydefs.items():
        if char[:2] == "&#":
            # decimal character reference, e.g. "&#8364;"
            char = unichr(int(char[2:-1]))
        else:
            # plain Latin-1 character
            char = unicode(char, "iso-8859-1")
        entitydefs_unicode[entity] = char

</F>
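The snippet above is Python 2 (`unichr`, `unicode`, and the `htmlentitydefs` module, which Python 3 renamed to `html.entities`). Below is a minimal Python 3 sketch of the same conversion. It assumes the old format, where Latin-1 characters are stored directly and everything else is a decimal character reference, and it uses a small hypothetical excerpt (`old_entitydefs`, modelled as byte strings) instead of the real module. It also shows the case-sensitivity point: "ouml" and "Ouml" are different letters, and lowercasing the name before lookup conflates them.

```python
from html.entities import name2codepoint

# Hypothetical excerpt in the old (Python 2) htmlentitydefs format:
# Latin-1 characters stored as raw bytes, everything else as a
# decimal character reference.
old_entitydefs = {
    "ouml": b"\xf6",
    "Ouml": b"\xd6",
    "amp": b"&",
    "euro": b"&#8364;",
}


def to_unicode_map(defs):
    """Convert an old-style entitydefs dict to an entity -> str mapping."""
    result = {}
    for entity, value in defs.items():
        if value[:2] == b"&#":
            # Decimal character reference: strip "&#" and ";" and convert.
            result[entity] = chr(int(value[2:-1]))
        else:
            # Plain Latin-1 byte: decode it.
            result[entity] = value.decode("iso-8859-1")
    return result


entitydefs_unicode = to_unicode_map(old_entitydefs)

# Entity names are case-sensitive: "ouml" is o-umlaut, "Ouml" is O-umlaut.
assert entitydefs_unicode["ouml"] == "\u00f6"
assert entitydefs_unicode["Ouml"] == "\u00d6"

# If you don't care about case, lowercasing works, but "Ouml" then
# resolves to the lowercase letter.
assert entitydefs_unicode["Ouml".lower()] == "\u00f6"

# In Python 3, html.entities.name2codepoint already gives the same answer.
for name in old_entitydefs:
    assert entitydefs_unicode[name] == chr(name2codepoint[name])
```

Keeping the case-sensitive keys as the primary mapping means callers who want case-insensitive lookup can still lowercase their input, while the reverse (recovering "Ö" from a lowercased table) is impossible.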