> I would propose to only add some very basic encodings to
> the standard distribution, e.g. the ones mentioned under
> Standard Codecs in the proposal:
>
>   'utf-8':           8-bit variable length encoding
>   'utf-16':          16-bit variable length encoding (little/big endian)
>   'utf-16-le':       utf-16 but explicitly little endian
>   'utf-16-be':       utf-16 but explicitly big endian
>   'ascii':           7-bit ASCII codepage
>   'latin-1':         Latin-1 codepage
>   'html-entities':   Latin-1 + HTML entities;
>                      see htmlentitydefs.py from the standard Python Lib
>   'jis' (a popular version XXX):
>                      Japanese character encoding
>   'unicode-escape':  See Unicode Constructors for a definition
>   'native':          Dump of the Internal Format used by Python

since this is already very close, maybe we could
adopt the naming guidelines from XML:

    In an encoding declaration, the values "UTF-8", "UTF-16",
    "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used
    for the various encodings and transformations of
    Unicode/ISO/IEC 10646, the values "ISO-8859-1",
    "ISO-8859-2", ... "ISO-8859-9" should be used for the parts
    of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS",
    and "EUC-JP" should be used for the various encoded
    forms of JIS X-0208-1997.

    XML processors may recognize other encodings; it is
    recommended that character encodings registered
    (as charsets) with the Internet Assigned Numbers
    Authority [IANA], other than those just listed,
    should be referred to using their registered names.

    Note that these registered names are defined to be
    case-insensitive, so processors wishing to match
    against them should do so in a case-insensitive way.

(i.e. "iso-8859-1" instead of "latin-1", etc -- at least as
aliases...).

</F>
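
A minimal sketch of how the alias idea above could work, assuming a simple lookup table keyed by lowercased names. The ALIASES table and the normalize_encoding() helper are hypothetical names made up for illustration; they are not part of the proposal or of any existing module:

```python
# Hypothetical alias table: lowercased alias -> XML/IANA-style name.
# Only a few of the encodings mentioned above are listed here.
ALIASES = {
    "latin-1": "iso-8859-1",
    "iso-8859-1": "iso-8859-1",
    "utf-8": "utf-8",
    "utf-16": "utf-16",
    "jis": "iso-2022-jp",  # assumption: which JIS variant 'jis' means is left open above
}


def normalize_encoding(name):
    """Map an encoding name to its canonical form, ignoring case."""
    key = name.strip().lower()
    # Unknown names pass through lowercased, so they still compare
    # case-insensitively against each other.
    return ALIASES.get(key, key)


print(normalize_encoding("ISO-8859-1"))
print(normalize_encoding("Latin-1"))
```

Both calls resolve to the same canonical name, so "latin-1" keeps working as an alias while the registered name stays the primary spelling.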