Showing content from https://mail.python.org/pipermail/python-dev/2002-August/027318.html below:

[Python-Dev] Re: [ python-Patches-590682 ] New codecs: html, asciihtml

Martin v. Loewis martin@v.loewis.de
04 Aug 2002 21:30:06 +0200
Oren Tirosh <oren-py-d@hishome.net> writes:

> It overloads the meaning of the error handling argument in an
> unintuitive way.  It gets to the point where it's much more than
> just error handling - it's actually extending the functionality of
> the codec.

Isn't that precisely the meaning of "to handle"?

3 : to act on or perform a required function with regard to 
   <handle the day's mail>

It produces a replacement text, just in the same way as "ignore" or
"replace" produce replacement texts.
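
As a minimal sketch of that point, assuming the callback registry as it
later shipped in Python (codecs.register_error): the handler name
"bracketed" and the function bracket_unencodable are made up for this
illustration. A custom handler returns replacement text in exactly the
way the built-in ones do.

```python
import codecs

def bracket_unencodable(exc):
    # Hypothetical handler: return (replacement text, position to resume at).
    if isinstance(exc, UnicodeEncodeError):
        return ("[?]" * (exc.end - exc.start), exc.end)
    raise exc

# Register the handler under a name, the same way "ignore" and "replace"
# are looked up by name.
codecs.register_error("bracketed", bracket_unencodable)

text = "caf\u00e9"
ignored = text.encode("ascii", "ignore")     # replacement text is empty
replaced = text.encode("ascii", "replace")   # replacement text is "?"
custom = text.encode("ascii", "bracketed")   # replacement text is "[?]"
```

All three calls differ only in the replacement text they produce for
the unencodable character.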

> Why implement yet another name-based registry?  

Namespaces are one honking great idea -- let's do more of those!

> There must be a simpler way to do it.

Propose one.

> What are the use cases?  Maybe a simple extension to charmap would
> be enough for all the practical cases?

The primary use case is XML: how do you use XML charrefs efficiently?
Notice that you can *not* use the charmap codec, since the underlying
encoding may not be based on the charmap codec.
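
To illustrate, here is a sketch assuming the "xmlcharrefreplace"
handler name that Python later adopted, with Shift_JIS chosen only as
an example of an encoding that is not a simple character map. The
codec encodes what it can, and the handler supplies a charref for the
rest.

```python
# Japanese text followed by a euro sign; Shift_JIS has no euro sign.
text = "\u65e5\u672c\u8a9e \u20ac"

# The multi-byte codec handles the Japanese characters itself; the
# error handler turns the unencodable euro sign into a charref.
encoded = text.encode("shift_jis", "xmlcharrefreplace")

# Decoding shows the charref left in place of the euro sign.
decoded = encoded.decode("shift_jis")
```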

In addition, it allows a more detailed analysis of an encoding
error, since it exposes the string position where the error occurs.
This makes it possible to determine a "best" encoding (i.e. the one
that needs the fewest exceptions, or the one that has the longest
sequences of the same encoding).
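
A rough sketch of the "fewest exceptions" idea, assuming the callback
API as it later shipped (codecs.register_error, and exception objects
carrying start and end attributes). The helper error_positions and the
handler name "record-positions" are hypothetical.

```python
import codecs

def error_positions(text, encoding):
    """Return the indexes of the characters that `encoding` cannot encode."""
    positions = []

    def record(exc):
        # The exception exposes where in the string the error occurred.
        positions.extend(range(exc.start, exc.end))
        return ("", exc.end)  # emit nothing and resume after the error

    # Re-registering under the same name replaces the previous handler.
    codecs.register_error("record-positions", record)
    text.encode(encoding, "record-positions")
    return positions

text = "na\u00efve \u20ac"
candidates = ["ascii", "latin-1", "iso-8859-15"]

# Pick the candidate that needs the fewest replacements.
best = min(candidates, key=lambda enc: len(error_positions(text, enc)))
```

The same position list could be used to score candidates by their
longest error-free run instead of by the error count.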

> Me too.  But if you really don't want it to be rejected you should
> try to find a way to make it simpler.

Can you please elaborate why you think this is difficult? Is this a
concern about 
- the implementation of the PEP, or
- the implementation of error handlers, or
- the usage of error handlers?

I can't really believe that you find usage of this feature
difficult: just pass an error handling string to your codec, just as
you currently do.

> 
> > While you are waiting for PEP 293 to complete, please do
> > consider cleaning up htmlentitydefs to provide mappings from
> > and to Unicode characters.
> 
> No problem.  The question is whether anyone depends on its current form.  
> My proposed changes:
> 
> 1. Use all lowercase entity names as keys.

That is probably a bad idea. At least for XHTML, the case of entity
references is normative. Even for HTML 4, it would be good if this
precisely matched the DTD.

You could provide a case-insensitive lookup function in addition.
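
One possible shape for such a function, as a sketch only: it uses the
name2codepoint table from today's html.entities module (the renamed
htmlentitydefs), and lookup_entity is a hypothetical helper. Since the
DTD has pairs such as Aacute/aacute, an exact match is tried first,
and the case-insensitive fallback is used only when it is unambiguous.

```python
from html.entities import name2codepoint

# Group the DTD's case-sensitive names by their lowercase form.
_names_by_lower = {}
for _name in name2codepoint:
    _names_by_lower.setdefault(_name.lower(), []).append(_name)

def lookup_entity(name):
    """Hypothetical helper: exact match, else unambiguous case-insensitive match."""
    if name in name2codepoint:
        return chr(name2codepoint[name])
    candidates = _names_by_lower.get(name.lower(), [])
    if len(candidates) == 1:
        return chr(name2codepoint[candidates[0]])
    # Unknown name, or several names differ only by case.
    raise KeyError(name)
```

With this, "NBSP" resolves to the no-break space, while "AACUTE" is
rejected because it could mean either Aacute or aacute.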

> 2. Map "entityname" to u"\uXXXX" (currently it's mapped to "&#nnnn;")

I think htmlentitydefs.entitydefs must stay as-is, for
compatibility. Instead, I'd suggest adding further
objects/functions. Of course, the data should be present only once -
all other functions/dictionaries could be derived.
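
A sketch of what "present only once" could look like. The table below
is a tiny made-up excerpt and all names in it are hypothetical; the
charref view uses the "&#nnnn;" form mentioned in the quoted text.

```python
# Single master table (hypothetical excerpt); names keep the DTD's case.
NAME_TO_CODEPOINT = {
    "amp": 0x26,
    "Aacute": 0xC1,
    "aacute": 0xE1,
    "euro": 0x20AC,
}

# Everything else is derived from the master table.
codepoint_to_name = {cp: name for name, cp in NAME_TO_CODEPOINT.items()}
name_to_char = {name: chr(cp) for name, cp in NAME_TO_CODEPOINT.items()}
name_to_charref = {name: "&#%d;" % cp for name, cp in NAME_TO_CODEPOINT.items()}
```

The existing dictionary can then stay untouched while the new views
are added alongside it.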

> In its current form I find htmlentitydefs.py pretty useless. Names in the
> input in arbitrary case will not match the MixedCase keys in the entitydefs 
> dictionary and the decimal character reference isn't really more useful than 
> the named entity reference. 

Indeed. However, people probably rely on its specific contents, so any
more useful access to the data must preserve entitydefs in its current
form.

Regards,
Martin


