In article <3FF6FFE3.60608 at v.loewis.de>,
 "Martin v. Loewis" <martin at v.loewis.de> wrote:

> > Or, am I missing the point entirely, and there's some other circumstance
> > where one gets UnicodeErrors besides .decode()? If the use case is
> > mixing strings and unicode objects (i.e. adding, joining, searching,
> > etc.), then I'd have to say a big fat -1, as opposed to merely a -0 for
> > having other ways to spell .decode(codec,"ignore").
>
> Yes, it is these use cases: Somebody invokes an SQL method, which
> happens to return a Unicode string, and then adds a latin-1 byte
> string to it. It works for all ASCII byte strings, but then the
> customer happens to enter accented characters, and the application
> crashes without offering to save recent changes.
>
> So I guess that's -1 from you.

I am also -1 on allowing the programmer to let such errors pass silently.
The world of Unicode encodings and decodings is painful enough without
giving people more freedom to create broken files. I have an application
that has to guess which encoding to use for certain files (the file format
specifies an encoding, but nobody pays attention to it), and one of the
sample files I found on the web can't be read correctly because it mixes
UTF-8 and, I think, CP1252. I am very much in favor of anything that
prevents more such files from being created.

-- 
David Eppstein                      http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science