On 25/04/2013 15:22, MRAB wrote:
> On 25/04/2013 14:34, Lennart Regebro wrote:
>> On Thu, Apr 25, 2013 at 2:57 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>> I can think of many usecases where I want to *embed* base64-encoded
>>> data in a larger text *before* encoding that text and transmitting
>>> it over a 8-bit channel.
>>
>> That still doesn't mean that this should be the default behavior. Just
>> because you *can* represent base64 as Unicode text doesn't mean that
>> it should be.
>>
>> [snip]
>> One use case where you clearly *do* want the base64 encoded data to be
>> unicode strings is because you want to embed it in a text discussing
>> base64 strings, for a blog or a book or something. That doesn't seem
>> to be a very common usecase.
>>
>> For the most part you base64 encode things because it's going to be
>> transmitted, and hence the natural result of a base64 encoding should
>> be data that is ready to be transmitted, hence byte strings, and not
>> Unicode strings.
>>
>>> Python 3 doesn't *view* text as unicode, it *represents* it as unicode.
>>
>> I don't agree that there is a significant difference between those
>> wordings in this context. The end result is the same: Things intended
>> to be handled/seen as textual should be unicode strings, things
>> intended for data exchange should be byte strings. Something that is
>> base64 encoded is primarily intended for data exchange. A base64
>> encoding should therefore return byte strings, especially since most
>> API's that perform this transmission will take byte strings as input.
>> If you want to include this in textual data, for whatever reason, like
>> printing it in a book, then the conversion is trivial, but that is
>> clearly the less common use case, and should therefore not be the
>> default behavior.
>>
> base64 is a way of encoding binary data as text. The problem is that
> traditionally text has been encoded with one byte per character, except
> in those locales where there were too many characters in the character
> set for that to be possible.
>
> In Python 3 we're trying to stop mixing binary data (bytestrings) with
> text (Unicode strings).
>
RFC 4648 says """Base encoding of data is used in many situations to
store or transfer data in environments that, perhaps for legacy reasons,
are restricted to US-ASCII [1] data.""".

To me, "US-ASCII" is an encoding, so it appears to be talking about
encoding binary data (bytestrings) to ASCII-encoded text (bytestrings).
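For reference, here is a minimal sketch of the two use cases being argued about, using the standard-library base64 module as it behaves in Python 3. The sample payload and the variable names are made up for illustration; nothing here is taken from the thread itself.

```python
import base64

# Arbitrary binary data (illustrative payload, not from the thread).
raw = bytes([0, 255, 16, 32])

# In Python 3, b64encode takes bytes and returns bytes.
# The result contains only ASCII characters, so it is ready to transmit.
encoded = base64.b64encode(raw)

# The "embed in a larger text" use case: decode to str explicitly,
# build the text, then encode the whole text for the 8-bit channel.
text = encoded.decode("ascii")
message = "payload: " + text
wire = message.encode("utf-8")

# Going back: the ASCII-encoded form decodes to the original bytes.
roundtrip = base64.b64decode(text.encode("ascii"))
```

The explicit .decode("ascii") step is the "trivial conversion" mentioned above; whether it should be needed by default is exactly the point under discussion.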