Please comment...

--

PEP: 0263 (?)
Title: Defining Unicode Literal Encodings
Version: $Revision: 1.0 $
Author: mal@lemburg.com (Marc-André Lemburg)
Status: Draft
Type: Standards Track
Python-Version: 2.3
Created: 06-Jun-2001
Post-History:

Abstract

    This PEP proposes to use the PEP 244 statement "directive" to make
    the encoding used in Unicode string literals u"..." (and their raw
    counterparts ur"...") definable on a per source file basis.

Problem

    In Python 2.1, Unicode literals can only be written using the
    Latin-1 based encoding "unicode-escape". This makes the
    programming environment rather unfriendly to Python users who live
    and work in non-Latin-1 locales such as many of the eastern
    countries. Programmers can write their 8-bit strings using the
    favourite encoding, but are bound to the "unicode-escape" encoding
    for Unicode literals.

Proposed Solution

    I propose to make the Unicode literal encodings (both standard and
    raw) a per-source file option which can be set using the
    "directive" statement proposed in PEP 244.

Syntax

    The syntax for the directives is as follows:

        'directive' WS+ 'unicodeencoding' WS* '=' WS* PYTHONSTRINGLITERAL
        'directive' WS+ 'rawunicodeencoding' WS* '=' WS* PYTHONSTRINGLITERAL

    with the PYTHONSTRINGLITERAL representing the encoding name to be
    used as standard Python 8-bit string literal and WS being the
    whitespace characters [ \t].

Semantics

    Whenever the Python compiler sees such an encoding directive
    during the compiling process, it updates an internal flag which
    holds the encoding name used for the specific literal form. The
    encoding name flags are initialized to "unicode-escape" for u"..."
    literals and "raw-unicode-escape" for ur"..." respectively.

    ISSUE: Maybe we should restrict the directive usage to once per
    file and additionally to a placement before the first Unicode
    literal in the source file.
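
    As a rough illustration only, the directive syntax and the two
    encoding name flags could be modelled in plain Python as below.
    This is a sketch, not the proposed compiler change: the names
    DIRECTIVE_RE, EncodingFlags, feed and decode_literal are
    hypothetical, only simple quoted encoding names are accepted
    (not every possible string literal), and it is written for
    Python 3, where "str" plays the role of the Unicode object.

```python
import codecs
import re

# Hypothetical pattern for the two directives; WS is [ \t] as in the
# Syntax section. Only plain '...' or "..." encoding names are handled.
DIRECTIVE_RE = re.compile(
    r"""^directive[ \t]+(unicodeencoding|rawunicodeencoding)"""
    r"""[ \t]*=[ \t]*(['"])(.*?)\2[ \t]*$"""
)


class EncodingFlags:
    """Illustrative stand-in for the compiler's encoding name flags."""

    def __init__(self):
        # Defaults named in the Semantics section.
        self.flags = {"u": "unicode-escape", "ur": "raw-unicode-escape"}

    def feed(self, line):
        """Update a flag if `line` is an encoding directive.

        Returns True when the line was a directive, False otherwise.
        """
        match = DIRECTIVE_RE.match(line)
        if match is None:
            return False
        kind = "u" if match.group(1) == "unicodeencoding" else "ur"
        encoding = match.group(3)
        codecs.lookup(encoding)  # raises LookupError for unknown names
        self.flags[kind] = encoding
        return True

    def decode_literal(self, kind, data):
        """Decode the 8-bit literal body using the current flag."""
        result = codecs.decode(data, self.flags[kind])
        # Some registered codecs do not return text; reject those.
        if not isinstance(result, str):
            raise TypeError("codec did not return a Unicode object")
        return result


flags = EncodingFlags()
default_text = flags.decode_literal("u", b"\\u20ac")
flags.feed("directive unicodeencoding = 'utf-8'")
utf8_text = flags.decode_literal("u", "\u20ac".encode("utf-8"))
```

    Both decode calls yield the same one-character string; only the
    byte-level spelling of the literal differs, which is the point of
    making the encoding selectable per source file.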

    If the Python compiler has to convert a Unicode literal to a
    Unicode object, it will pass the 8-bit string data given by the
    literal to the Python codec registry and have it decode the data
    using the current setting of the encoding name flag for the
    requested type of Unicode literal. It then checks that the result
    of the decoding operation is a Unicode object and stores it in
    the byte code stream.

Scope

    This PEP only affects Python source code which makes use of the
    proposed directives. It does not affect the coercion handling of
    8-bit strings and Unicode in the given module.

Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/