On 01/11/2014 10:36 AM, Steven D'Aprano wrote:
> On Sat, Jan 11, 2014 at 08:20:27AM -0800, Ethan Furman wrote:
>>
>> unicode to bytes
>> bytes to unicode using latin1
>> unicode to bytes
>
> Where do you get this from? I don't follow your logic. Start with a text
> template:
>
> template = """\xDE\xAD\xBE\xEF
> Name:\0\0\0%s
> Age:\0\0\0\0%d
> Data:\0\0\0%s
> blah blah blah
> """
>
> data = template % ("George", 42, blob.decode('latin-1'))
>
> Only the binary blobs need to be decoded. We don't need to encode the
> template to bytes, and the textual data doesn't get encoded until we're
> ready to send it across the wire or write it to disk.

And what if your name field has data not representable in latin-1?

--> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8')
u'\u0441\u0440\u0403'

--> '\xd1\x81\xd1\x80\xd0\x83'.decode('utf8').encode('latin1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)

So really your example should be:

data = template % ("George".encode('some_non_ascii_encoding_such_as_cp1251').decode('latin-1'), 42, blob.decode('latin-1'))

Which is a mess.

--
~Ethan~
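
For anyone who wants to try the point above, here is a minimal sketch in Python 3 syntax (the session quoted in the message is Python 2). It is only an illustration under stated assumptions: `blob`, the shortened `template`, and the name `smuggled` are hypothetical, and cp1251 is used simply because the message names it as an example encoding.

```python
# Python 3 sketch; names below are hypothetical and not from the original thread.

# A stand-in binary payload containing every possible byte value.
blob = bytes(range(256))

# latin-1 maps each byte 0-255 to the code point with the same value,
# so decoding and re-encoding arbitrary bytes is lossless.
assert blob.decode('latin-1').encode('latin-1') == blob

# The Cyrillic text from the session in the message.
name = '\u0441\u0440\u0403'

# Text outside the latin-1 range cannot be encoded as latin-1 directly.
error = None
try:
    name.encode('latin-1')
except UnicodeEncodeError as exc:
    error = exc

# The workaround described in the message: encode the text with a real
# encoding first, then decode those bytes as latin-1 so they survive the
# final latin-1 encode unchanged.
smuggled = name.encode('cp1251').decode('latin-1')

# Shortened version of the template from the quoted message.
template = '\xDE\xAD\xBE\xEFName:\0\0\0%s\nAge:\0\0\0\0%d\n'
data = (template % (smuggled, 42)).encode('latin-1')
```

In this sketch the name field ends up on the wire as cp1251 bytes, but only because the caller remembered the extra encode/decode step for that field, which is the inconvenience the message is pointing at.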