Showing content from https://mail.python.org/pipermail/python-dev/1999-November/001341.html below:

UTF-8 in source code (Re: [Python-Dev] Internationalization Toolkit)

M.-A. Lemburg mal@lemburg.com
Wed, 17 Nov 1999 11:03:59 +0100


Tim Peters wrote:
> 
> [MAL]
> > ...demo script...
> 
> It looks like
> 
>     r'\\u0000'
> 
> will get translated into a 2-character Unicode string.

Right...

> That's probably not
> good, if for no other reason than that Java would not do this (it would
> create the obvious 7-character Unicode string), and having something that
> looks like a Java escape that doesn't *work* like the Java escape will be
> confusing as heck for JPython users.  Keeping track of even-vs-odd number of
> backslashes can't be done with a regexp search, but is easy if the code is
> simple <wink>:
> ...Tim's version of the demo...

Guido and I have decided to turn \uXXXX into a standard
escape sequence with no further magic applied. \uXXXX will
only be expanded in u"" strings.

Here's the new scheme:

With the 'unicode-escape' encoding being defined as:

· all non-escape characters represent themselves as a Unicode ordinal
  (e.g. 'a' -> U+0061).

· all existing defined Python escape sequences are interpreted as
  Unicode ordinals; note that \xXXXX can represent all Unicode
  ordinals, and \OOO (octal) can represent Unicode ordinals up to U+01FF.

· a new escape sequence, \uXXXX, represents U+XXXX; it is a syntax
  error to have fewer than 4 digits after \u.

Examples:

u'abc'          -> U+0061 U+0062 U+0063
u'\u1234'       -> U+1234
u'abc\u1234\n'  -> U+0061 U+0062 U+0063 U+1234 U+000A

Now how should we define ur"abc\u1234\n"  ... ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    44 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
