M.-A. Lemburg wrote:
> Greg Stein wrote:
> >
> > On Tue, 10 Aug 1999, Fredrik Lundh wrote:
> > > maybe the unicode class shouldn't implement the
> > > buffer interface at all? sure looks like the best way
> >
> > It is needed for fp.write(unicodeobj) ...
> >
> > It is also very handy for C functions to deal with Unicode strings.
>
> Wouldn't a special C API be (even) more convenient ?

Why? Accessing the Unicode values as a series of bytes matches exactly
the semantics of the buffer interface. Why throw in Yet Another
Function? Your abstract.c functions make it quite simple.

> > > to avoid trivial mistakes (the current behaviour of
> > > fp.write(unicodeobj) is even more serious than the
> > > marshal glitch...)
> >
> > What's wrong with fp.write(unicodeobj)? It should write the unicode
> > value to the file. Are you suggesting that it will need to be done
> > differently? Icky.
>
> Would this also write some kind of Unicode encoding header ?
> [Sorry, this is my Unicode ignorance shining through... I only
> remember lots of talk about these things on the string-sig.]

Absolutely not. Placing the Byte Order Mark (BOM) into an output stream
is an application-level task. It should never be done by any subsystem.

There are no other "encoding headers" that would go into the output
stream. The output would simply be UTF-16 (2-byte values in host byte
order).

> Since fp.write() uses "s#" this would use the getreadbuffer
> slot in 1.5.2... I think what it *should* do is use the
> getcharbuffer slot instead (see my other post), since dumping
> the raw unicode data would lose too much information.

Again, I very much disagree. To me, fp.write() is not about writing
characters to a stream. I think it makes much more sense as "writing
bytes to a stream," and the buffer interface fits that perfectly.

There is no loss of data. You could argue that the byte order is lost,
but I think that is incorrect. The application defines the semantics:
the file might be defined as using host byte order, or the application
may be writing a BOM at the head of the file.

> such things should be handled by extra methods, e.g. fp.rawwrite().

I believe this would be a needless complication of the interface.

> Hmm, I guess the philosophy behind the interface is not
> really clear.

I didn't design or implement it initially, but (as you may have
guessed) I am a proponent of its existence.

> Binary data is fetched via getreadbuffer and then
> interpreted as character data... I always thought that the
> getcharbuffer should be used for such an interpretation.

The former is bad behavior. That is why getcharbuffer was added (by me,
for 1.5.2). It was a preventive measure for the introduction of Unicode
strings. Using getreadbuffer for characters would break badly given a
Unicode string.

Therefore, "clients" that want (8-bit) characters from an object
supporting the buffer interface should use getcharbuffer. The Unicode
object doesn't implement it, implying that it cannot provide 8-bit
characters. You can get the raw bytes through getreadbuffer.

> Or maybe we should dump the getcharbuffer slot again and
> use the getreadbuffer information just as we would a
> void* pointer in C: with no explicit or implicit type information.

Nope. That path is fraught with failure :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
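To make the "BOM is the application's job" point concrete, here is a
minimal C sketch of the division of labor described above: the
subsystem only dumps raw UTF-16 code units, and the application
decides whether a BOM goes at the head of the file. The function name
and the use of stdio are illustrative assumptions, not anything from
the thread.

    #include <stdio.h>

    /* Hypothetical application-level writer: the application, not the
     * I/O subsystem, decides whether the file starts with a BOM. */
    static void
    write_utf16_file(FILE *fp, const unsigned short *units, size_t n,
                     int with_bom)
    {
        if (with_bom) {
            /* U+FEFF written via fwrite comes out in host byte order,
             * matching the "2-byte values in host byte order" output
             * described above. */
            unsigned short bom = 0xFEFF;
            fwrite(&bom, sizeof bom, 1, fp);
        }
        /* The subsystem's only job: dump the raw code units. */
        fwrite(units, sizeof *units, n, fp);
    }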
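The getreadbuffer/getcharbuffer split can also be sketched from the
client side. This assumes the old-style buffer helpers from abstract.c
as they later appeared in Python 2.x (PyObject_AsReadBuffer and
PyObject_AsCharBuffer); the exact names and integer types varied
across versions (1.5.2 used plain int rather than Py_ssize_t), so
treat this as illustrative rather than definitive.

    #include "Python.h"

    /* Raw bytes: any buffer-supporting object qualifies, including a
     * Unicode object (which hands back its raw UTF-16 bytes). */
    static int
    dump_bytes(FILE *fp, PyObject *obj)
    {
        const void *bytes;
        Py_ssize_t len;

        if (PyObject_AsReadBuffer(obj, &bytes, &len) < 0)
            return -1;                  /* no buffer interface at all */
        fwrite(bytes, 1, (size_t)len, fp);
        return 0;
    }

    /* 8-bit characters: only objects implementing the getcharbuffer
     * slot qualify; a Unicode object fails here by design, so its raw
     * bytes can never be silently misread as characters. */
    static int
    get_chars(PyObject *obj, const char **chars, Py_ssize_t *len)
    {
        return PyObject_AsCharBuffer(obj, chars, len);
    }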