Martin von Loewis wrote:
>
> > the "unicodenames" patch (which replaces ucnhash) includes this
> > functionality -- but with a little distance, I think it's better to add
> > it to the unicodedata module.
> >
> > (it's included in the step 4 patch, soon to be posted to a patch
> > manager near you...)
>
> Sounds good. Is there any chance to use this in codecs, then?

If you need speed, you'd have to write a C codec for this, and yes: the
ucnhash module does export a C API via a PyCObject which you can use to
access the static C data table. I don't know whether Fredrik's version will
also support this. I think a C function as access method would be more
generic than the current direct access to the C table.

> I'm thinking of
>
> >>> print u"\N{COPYRIGHT SIGN}".encode("ascii-ucn")
> \N{COPYRIGHT SIGN}
> >>> print u"\N{COPYRIGHT SIGN}".encode("latin-1-ucn")
> ©
>
> Regards,
> Martin
>
> P.S. Some people will recognize this as the disguised question 'how
> can I convert non-convertible characters using the XML entity
> notation?'

If you just need a single encoding, e.g. Latin-1, simply clone the codec
(it's coded in unicodeobject.c) and add the XML entity processing.

Unfortunately, reusing the existing codecs is not too efficient: the reason
is that there is no error handling scheme which would let you say "encode as
far as you can and then return the encoded data plus a position marker into
the input stream/data".

Perhaps we should add a new standard error handling scheme "break" which
simply stops encoding/decoding whenever an error occurs?! This would then
allow reusing the existing codecs by processing the input in slices.

--
Marc-Andre Lemburg

______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
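As a rough illustration of both ideas in today's Python: the sketch below
uses the codecs.register_error() machinery, which did not exist at the time
of this thread; the handler name "ucn-replace" and both function names are
invented for the example. (Python 3.5 eventually added a built-in
"namereplace" error handler that produces exactly these \N{...} escapes.)
The first part emits \N{NAME} escapes for characters the target encoding
cannot represent; the second shows the proposed "break"-style behaviour of
encoding as far as possible and returning a position marker.

    import codecs
    import unicodedata

    def ucn_replace(exc):
        # Encode error handler: replace unencodable characters with
        # \N{NAME} escapes (falling back to U+XXXX for unnamed
        # characters) and resume after the failing run.
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        bad = exc.object[exc.start:exc.end]
        repl = u"".join(u"\\N{%s}" % unicodedata.name(ch, u"U+%04X" % ord(ch))
                        for ch in bad)
        return repl, exc.end

    codecs.register_error("ucn-replace", ucn_replace)

    print(u"\N{COPYRIGHT SIGN}".encode("ascii", "ucn-replace"))
    # -> b'\\N{COPYRIGHT SIGN}'
    print(u"\N{COPYRIGHT SIGN}".encode("latin-1", "ucn-replace"))
    # -> b'\xa9'

    def encode_break(text, encoding):
        # Sketch of the suggested "break" scheme: encode as far as
        # possible and return the encoded data plus the position in the
        # input where encoding stopped.
        try:
            return text.encode(encoding), len(text)
        except UnicodeEncodeError as exc:
            return text[:exc.start].encode(encoding), exc.start

With such a handler registered, the per-encoding "-ucn" codec variants from
Martin's example become unnecessary: any existing encoder can be combined
with the fallback by passing the handler name as the errors argument.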