Martin von Loewis wrote:
>
> > the "unicodenames" patch (which replaces ucnhash) includes this
> > functionality -- but with a little distance, I think it's better to add
> > it to the unicodedata module.
> >
> > (it's included in the step 4 patch, soon to be posted to a patch
> > manager near you...)
>
> Sounds good. Is there any chance to use this in codecs, then?

If you need speed, you'd have to write a C codec for this, and yes: the
ucnhash module does export a C API via a PyCObject which you can use to
access the static C data table. I don't know whether Fredrik's version will
also support this. I think a C function as access method would be more
generic than the current direct access to the C table.

> I'm thinking of
>
> >>> print u"\N{COPYRIGHT SIGN}".encode("ascii-ucn")
> \N{COPYRIGHT SIGN}
> >>> print u"\N{COPYRIGHT SIGN}".encode("latin-1-ucn")
> ©
>
> Regards,
> Martin
>
> P.S. Some people will recognize this as the disguised question 'how
> can I convert non-convertible characters using the XML entity
> notation?'

If you just need a single encoding, e.g. Latin-1, simply clone the codec
(it's coded in unicodeobject.c) and add the XML entity processing.

Unfortunately, reusing the existing codecs is not too efficient: the reason
is that there is no error handling scheme which would let you say "encode as
far as you can and then return the encoded data plus a position marker into
the input stream/data".

Perhaps we should add a new standard error handling scheme "break" which
simply stops encoding/decoding whenever an error occurs?! This would then
allow reusing the existing codecs by processing the input in slices.

--
Marc-Andre Lemburg

______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
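As a rough illustration of both ideas in today's Python: the sketch below
uses the codecs.register_error() machinery, which did not exist at the time
of this thread; the handler name "ucn-replace" and both function names are
invented for the example. (Python 3.5 eventually added a built-in
"namereplace" error handler that produces exactly these \N{...} escapes.)
The first part emits \N{NAME} escapes for characters the target encoding
cannot represent; the second shows the proposed "break"-style behaviour of
encoding as far as possible and returning a position marker.

    import codecs
    import unicodedata

    def ucn_replace(exc):
        # Encode error handler: replace unencodable characters with
        # \N{NAME} escapes (falling back to U+XXXX for unnamed
        # characters) and resume after the failing run.
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        bad = exc.object[exc.start:exc.end]
        repl = u"".join(u"\\N{%s}" % unicodedata.name(ch, u"U+%04X" % ord(ch))
                        for ch in bad)
        return repl, exc.end

    codecs.register_error("ucn-replace", ucn_replace)

    print(u"\N{COPYRIGHT SIGN}".encode("ascii", "ucn-replace"))
    # -> b'\\N{COPYRIGHT SIGN}'
    print(u"\N{COPYRIGHT SIGN}".encode("latin-1", "ucn-replace"))
    # -> b'\xa9'

    def encode_break(text, encoding):
        # Sketch of the suggested "break" scheme: encode as far as
        # possible and return the encoded data plus the position in the
        # input where encoding stopped.
        try:
            return text.encode(encoding), len(text)
        except UnicodeEncodeError as exc:
            return text[:exc.start].encode(encoding), exc.start

With such a handler registered, the per-encoding "-ucn" codec variants from
Martin's example become unnecessary: any existing encoder can be combined
with the fallback by passing the handler name as the errors argument.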