On Fri, 08 Oct 2010 12:37:38 +0900, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> *If* you have an 8-bit value of unknown encoding on input, this will
> appear in the Header's value as a surrogate. Hm, OK, I see the
> problem ... as usual, it's that the only efficient thing to do is
> encode using surrogate-escape which loses the information that these
> are invalid bytes. Would it really be that bad to add an O(length)
> component where you examine the string for surrogates (and too-long
> words, for that matter), and chop off those pieces for MIME encoding?

Nope, and that's more or less what I think I'm going to do. But I
haven't started writing the code yet.

> > > > Presumably you are suggesting that email5 be smart enough to turn my
> > > > example into properly UTF-8/CTE encoded text.
> > >
> > > No, in general that's undecidable without asking the originator,
> > > although humans can often make a good guess.
> >
> > I was talking about unicode input, though, where you do know (modulo
> > the language differences that unicode hasn't yet sorted out).
>
> I don't understand why this is difficult. As far as what Unicode has

It isn't difficult in principle. It's just difficult in email5.

--
R. David Murray      www.bitdance.com
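A minimal Python sketch of the O(length) pass discussed above, assuming the header value is a str produced by decoding raw bytes with the 'surrogateescape' error handler. The helper name split_escaped is hypothetical and not part of the email package. It only locates the escaped runs and recovers their original bytes; choosing a charset for those pieces, building the MIME encoded words, and splitting too-long words are left out.

```python
from __future__ import annotations

import re

# The 'surrogateescape' error handler maps each undecodable byte 0x80-0xFF
# to a lone surrogate U+DC80-U+DCFF; this pattern matches runs of them.
_ESCAPED_RUN = re.compile('[\udc80-\udcff]+')


def split_escaped(value: str) -> list[tuple[str, str | bytes]]:
    """Split a header value into ('text', str) and ('bytes', bytes) pieces.

    A single left-to-right scan, so the cost is O(len(value)).
    'bytes' pieces hold the original undecodable octets, recovered by
    re-encoding the surrogates with 'surrogateescape'.
    """
    pieces: list[tuple[str, str | bytes]] = []
    pos = 0
    for match in _ESCAPED_RUN.finditer(value):
        if match.start() > pos:
            # Ordinary (possibly non-ASCII but valid) text before the run.
            pieces.append(('text', value[pos:match.start()]))
        # Turn the surrogates back into the bytes they stand for.
        raw = match.group().encode('ascii', 'surrogateescape')
        pieces.append(('bytes', raw))
        pos = match.end()
    if pos < len(value):
        pieces.append(('text', value[pos:]))
    return pieces


if __name__ == '__main__':
    value = b'Caf\xe9 au lait'.decode('ascii', 'surrogateescape')
    print(split_escaped(value))
    # [('text', 'Caf'), ('bytes', b'\xe9'), ('text', ' au lait')]

    # Encoding the whole string instead restores the bytes but drops the
    # marker: nothing in the result says which byte was the invalid one.
    print('caf\u00e9 \udce9'.encode('utf-8', 'surrogateescape'))
    # b'caf\xc3\xa9 \xe9'
```

Scanning the str before any encoding step is what keeps the "these were invalid bytes" information that a blanket surrogate-escape encode throws away.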