On 2014-01-11 05:36, Steven D'Aprano wrote:
[snip]
> Latin-1 has the nice property that every byte decodes into the character
> with the same code point, and vice versa. So:
>
>     for i in range(256):
>         assert bytes([i]).decode('latin-1') == chr(i)
>         assert chr(i).encode('latin-1') == bytes([i])
>
> passes. It seems to me that your problem goes away if you use Unicode
> text with embedded binary data, rather than binary data with embedded
> ASCII text. Then when writing the file to disk, of course you encode it
> to Latin-1, either explicitly:
>
>     pdf = ...  # Unicode string containing the PDF contents
>     with open("outfile.pdf", "wb") as f:
>         f.write(pdf.encode("latin-1"))
>
> or implicitly:
>
>     with open("outfile.pdf", "w", encoding="latin-1") as f:
>         f.write(pdf)
>
[snip]

The second example won't work because you're forgetting about the
handling of line endings in text mode.

Suppose you have some binary data bytes([10]). You convert it into a
Unicode string using Latin-1, giving '\n'. You write it out to a file
opened in text mode. On Windows, that string '\n' will be written to
the file as b'\r\n'.
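To make that concrete, here is a minimal sketch (the file names are just
placeholders) comparing the two routes. The explicit encode plus binary
mode preserves every byte; the text-mode route lets the default newline
handling rewrite '\n' as os.linesep, which on Windows is '\r\n':

    data = bytes(range(256))          # binary data including b'\n' (byte 10)
    text = data.decode('latin-1')     # lossless: each byte maps to one code point

    # Explicit encode + binary mode: bytes are written back unchanged.
    with open('out_binary.pdf', 'wb') as f:
        f.write(text.encode('latin-1'))

    # Text mode with an encoding: on Windows every '\n' in the string is
    # written to disk as b'\r\n', so the file no longer matches `data`.
    with open('out_text.pdf', 'w', encoding='latin-1') as f:
        f.write(text)

    with open('out_binary.pdf', 'rb') as f:
        assert f.read() == data       # holds on every platform

    with open('out_text.pdf', 'rb') as f:
        # False on Windows: b'\n' came back as b'\r\n'.
        # Passing newline='' to open() would suppress the translation.
        print(f.read() == data)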