Showing content from https://mail.python.org/pipermail/python-dev/2002-March/021541.html below:

[Python-Dev] PEP 263 considered faulty (for some Japanese

Martin v. Loewis martin@v.loewis.de
21 Mar 2002 07:37:15 +0100
"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> Not similar enough.  There is a big difference between a spec which
> states
> 
>     Definition: A parsed[1] entity contains text, a sequence of
>     characters, which may represent markup or character data.
>     Definition: A character is an atomic unit of text as specified by
>     ISO/IEC 10646. Legal characters are tab, carriage return, line
>     feed, and the legal characters of Unicode and ISO/IEC 10646.
> 
> and PEP 263, which deliberately avoids any such declaration.  

If the PEP said

  A Python source code file contains text, a sequence of characters,
  which may represent lines.  Definition: A character is an atomic
  unit of text as specified by ISO/IEC 10646. Legal characters are
  tab, carriage return, line feed, and the legal characters of Unicode
  and ISO/IEC 10646.

it would not change a bit, in my view. Why do you perceive a
difference?

> In fact, I described (without being familiar with the XML spec until
> you mentioned it) something very close to the XML specification, and
> an implementation which gives the same practical benefits as PEP 263.

You've described an implementation model named "hooks", which I always
assumed to be similar to Emacs hooks, and which I understood
deliberately not to deal with encodings at all; doing so would be the
user's task.

XML is completely different in this respect. The EncodingDecl,

http://www.w3.org/TR/REC-xml#NT-EncodingDecl

is part of the language, and the recommendation specifies that any
processor must understand the values "UTF-8", "UTF-16",
"ISO-10646-UCS-2", and "ISO-10646-UCS-4". It also has this statement:

# It is a fatal error if an XML entity is determined (via default,
# encoding declaration, or higher-level protocol) to be in a certain
# encoding but contains octet sequences that are not legal in that
# encoding. It is also a fatal error if an XML entity contains no
# encoding declaration and its content is not legal UTF-8 or UTF-16.

I can find equivalents of all this in PEP 263. For example, it is a
fatal error (in phase 2) if a Python source file contains no encoding
declaration and its content is not legal ASCII.
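
The phase 2 rule can be sketched as follows. This is only an
illustration, not the interpreter's implementation: the helper names
declared_encoding and decode_source are hypothetical, the pattern is
modeled on the one given in PEP 263, and the sketch assumes that only
the first two lines are searched for a declaration.

```python
import re

# Pattern modeled on the one in PEP 263 (assumption: applied to raw bytes).
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes):
    """Return the declared encoding name, or None if nothing is declared."""
    # Only the first two lines may carry the declaration.
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def decode_source(source: bytes) -> str:
    """Hypothetical helper: decode a source file under the phase 2 rule."""
    # Without a declaration, the content must be legal ASCII.
    encoding = declared_encoding(source) or "ascii"
    try:
        return source.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        # Treated as fatal, in the same spirit as the XML rule quoted above.
        raise SyntaxError(f"cannot decode source as {encoding}: {exc}") from exc
```

With this sketch, a file that declares latin-1 and contains the byte
0xE9 decodes cleanly, while the same byte in a file without a
declaration raises SyntaxError.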

> Footnotes: [1] I am not an XML expert, but as far as I can tell,
> "parsed entity" refers to the fact that before it may be used it
> must be parsed, not to some kind of transformation of the entity,
> which is then submitted to the XML processor.

"parsed" in the context of XML means that the entity has markup, and
thus follows the production extParsedEnt (for example). The production
rules always refer to characters, which are obtained by converting
the input file into Unicode, according to the declared encoding.
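
As a small illustration of that conversion step, here is how Python's
standard xml.etree.ElementTree parser (used purely as an example of an
XML processor) treats the same byte with and without an encoding
declaration. The variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Declared encoding: the byte 0xE9 is converted from ISO-8859-1 to the
# Unicode character U+00E9 before the production rules see it.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xe9</a>'
text = ET.fromstring(declared).text

# No declaration: the content must be legal UTF-8 or UTF-16, and a lone
# 0xE9 byte is neither, so the parser rejects the entity.
undeclared = b"<a>\xe9</a>"
try:
    ET.fromstring(undeclared)
    fatal = False
except ET.ParseError:
    fatal = True
```

The element text in the first case is a one-character Unicode string,
not a byte; the second case ends in a parse error rather than a guess
at the encoding.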

Regards,
Martin
