> The smtplib problem may be easily explained -- AFAIK, the SMTP
> protocol doesn't support Unicode, and the module isn't
> Unicode-aware, so it is probably writing garbage to the socket.

I've investigated this somewhat and found the cause of the problem. The send method of the socket passes the raw memory representation of the Unicode object to send(2); on i386, this comes out as UTF-16LE. This behaviour appears to be documented nowhere (where is the original specification of the Unicode type, anyway?). I believe it is a bug, on the grounds of being confusing.

The same holds for writing a Unicode string to a file opened in binary mode: again, it should not write out the internal representation. Or else, why doesn't file.write(42) work? I want it to write the internal representation in binary :-)

So in essence, I suggest that the Unicode object not implement the buffer interface. If that has any undesirable consequences (which ones?), I suggest instead that 'binary write' operations (sockets, files) explicitly check for Unicode objects and either reject them or invoke the system encoding (i.e. ASCII). In the case of smtplib, this would do the right thing: the protocol requires ASCII commands, so anybody passing a Unicode string with characters outside ASCII would get an error.

Regards,
Martin
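
P.S. For concreteness, a minimal sketch of both the behaviour described
above and the suggested explicit-encoding alternative. This assumes a
then-current CPython on a little-endian i386 box; the localhost SMTP
server and the host names are purely hypothetical:

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('localhost', 25))     # hypothetical local SMTP server

    # Current behaviour: the Unicode object exposes its raw internal
    # buffer, so on i386 these bytes go out as UTF-16LE, not ASCII --
    # garbage from the server's point of view.
    sock.send(u'HELO example.org\r\n')

    # The same applies to binary file writes: this dumps the internal
    # representation rather than the five ASCII bytes 'HELO\n'.
    open('dump.bin', 'wb').write(u'HELO\n')

    # Suggested behaviour: binary writes invoke the system encoding
    # (ASCII), which is what an explicit encode does today.  Commands
    # within ASCII pass through unchanged; anything else fails loudly.
    sock.send(u'HELO example.org\r\n'.encode('ascii'))
    sock.send(u'HELO ex\xe4mple.org\r\n'.encode('ascii'))  # UnicodeEncodeError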