[Python-Dev] Adding .decode() method to Unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 12 Jun 2001 13:00:40 +0200
> > > str.encode()
> > > str.decode()
> > > uni.encode()
> > > #uni.decode() # still missing
> > 
> > It's not missing. str.decode and uni.encode go through a single codec;
> > that's easy. str.encode is somewhat more confusing, because it really
> > is unicode(str).encode. Now, you are not proposing that uni.decode is
> > str(uni).decode, are you?
> 
> No. uni.decode() will (just like the other methods) directly
> interface to the codecs decoder -- there is no magic conversion
> involved. It is meant to be used by Unicode-Unicode codecs

When invoking "Hallo".encode("utf-8"), two conversions are executed:
first the default decoding into Unicode, then the UTF-8 encoding. Of
course, that is not the intended use (but then, is the intended use
documented anywhere?): instead, people should write
"Hallo".encode("base64") instead. This is an example I can understand,
although I'm not sure why it is inherently better to write this
instead of writing base64.encodestring("Hallo").
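
For concreteness, here is a rough sketch of the behaviour as I
understand it (the codec names are the ones used above):

    # str.encode with a character-set codec runs two conversions:
    # the 8-bit string is first decoded into Unicode with the default
    # (ASCII) codec, and the result is then encoded as UTF-8.
    "Hallo".encode("utf-8")        # effectively unicode("Hallo").encode("utf-8")

    # The intended use is string-to-string codecs, which stay out of Unicode:
    "Hallo".encode("base64")       # -> 'SGFsbG8=\n'

    # ... which does the same job as the existing module-level API:
    import base64
    base64.encodestring("Hallo")   # -> 'SGFsbG8=\n'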

> > If not that, what else would it mean? And if it means something else,
> > it is clearly not symmetric to str.encode, so it is not "missing".
> 
> It is in the sense that strings support this method and Unicode
> currently doesn't.

The rationale for string.encode is weak: it argues that string->string
conversions are frequent enough to justify this API, even though these
conversions have nothing to do with coded character sets.
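
For reference, a rough summary of the four combinations as I
understand them; the Unicode-to-Unicode codec name in the last line
is made up:

    "Hallo".decode("utf-8")        # str -> unicode: one codec call
    u"Hallo".encode("utf-8")       # unicode -> str: one codec call
    "Hallo".encode("base64")       # str -> str: the case argued above
    u"Hallo".decode("xml-escape")  # unicode -> unicode: the proposed method
                                   # (with a hypothetical codec name)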

So far, I can see *no* rationale for unicode.decode.

> There's no need for a PEP. This addition is much too simple
> to require a PEP on its own.

PEP 1 says:

# We intend PEPs to be the primary mechanisms for proposing new
# features, for collecting community input on an issue, and for
# documenting the design decisions that have gone into Python.  The
# PEP author is responsible for building consensus within the
# community and documenting dissenting opinions.

So we have a proposal for a new feature, and we have dissenting
opinions. Who are you to decide that this addition is too simple to
require a PEP on its own?

> As for use cases: I have already given a whole bunch of them
> (Unicode compression, normalization, escaping in various ways).

I was asking for specific examples: Names of specific codecs that you
want to implement, and application code fragments using these specific
codecs. I don't know how I would use Unicode compression if I had this
proposed feature, for example. I know what XML escaping is, and I
cannot see how this feature would help.

> True, but not all XML text out there is meant for XML parsers to 
> read ;-). Preprocessing of e.g. XML text in Python is a rather common
> thing to do and this is what the direct codec access methods are
> meant for.

Can you give an example of an application which processes XML without
a parser, but converts character entities (preferably
open-source, so I can study its code)? I wonder whether they get CDATA
sections right... MAL, I really mean that: Please don't make claims
that something is common or useful without giving an *exact* example.

Regards,
Martin

P.S. This insistence on adding Unicode and string methods makes it
appear as if the author of the codecs module now thinks that its API
sucks.


