On 3/30/2011 6:39 PM, Toshio Kuratomi wrote:

> Really, surrogates are a red herring to this whole issue. The issue is that
> the original code was trying to compare two different transformations of
> byte sequences and expecting them to be equal. Let's say that you have the
> following byte value::
>   b_test_value = b'\xa4\xaf'
>
> This is something that's stored in a file or the filename of something on
> a unix filesystem or stored in a database or any number of other things.
> Now you want to compare that to another piece of data that you've read in
> from somewhere outside of python. You'd expect any of the following to
> work::
>   b_test_value == b_other_byte_value
>   b_test_value.encode('utf-8', 'surrogateescape') == b_other_byte_value('utf-8', 'surrogateescape')
>   b_test_value.encode('latin-1') == b_other_byte_value('latin-1')
>   b_test_value.encode('euc_jp') == b_other_byte_value('euc_jp')
>
> You wouldn't expect this to work::
>   b_test_value.encode('latin-1') == b_other_byte_value('euc_jp')
>
> Once you see that, you realize that the following is only a specific case of
> the former, surrogateescape doesn't really matter::
>   b_test_value.encode('utf-8', 'surrogateescape') == b_other_byte_value('euc_jp')

All the encodes above should be decodes instead. Aside from that, your point
is correct, and not limited to CS. The whole art of disguise, for instance,
is about effecting a transformation to falsely pass or fail an identity or
equality comparison.

--
Terry Jan Reedy
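
For anyone who wants to try the comparison at a prompt, here is a minimal sketch with the encodes written as decodes. The names `raw` and `other` are illustrative stand-ins for the two byte values in the quoted message, and both are assumed to hold the same bytes:

```python
# Two copies of the same bytes; `other` stands in for data read from
# somewhere outside of Python (an assumption made for this sketch).
raw = b"\xa4\xaf"
other = b"\xa4\xaf"

# Comparing the raw bytes directly works.
bytes_equal = raw == other

# Decoding both sides with the SAME codec and error handler also works.
codecs_to_try = [
    ("utf-8", "surrogateescape"),
    ("latin-1", "strict"),
    ("euc_jp", "strict"),
]
same_codec_equal = [
    raw.decode(enc, err) == other.decode(enc, err)
    for enc, err in codecs_to_try
]

# Decoding each side with a DIFFERENT codec yields different strings,
# whether or not surrogateescape is involved.
via_surrogate = raw.decode("utf-8", "surrogateescape")
via_latin1 = raw.decode("latin-1")
via_euc_jp = other.decode("euc_jp")

latin1_vs_euc_jp = via_latin1 == via_euc_jp
surrogate_vs_euc_jp = via_surrogate == via_euc_jp

# surrogateescape only guarantees that the original bytes come back
# when the string is encoded again with the same codec and handler.
roundtrip_ok = via_surrogate.encode("utf-8", "surrogateescape") == raw

print(bytes_equal, same_codec_equal, latin1_vs_euc_jp, surrogate_vs_euc_jp, roundtrip_ok)
```

The mismatched comparisons fail because each codec maps the same two bytes to different code points: two lone surrogates under utf-8 with surrogateescape, two Latin-1 characters under latin-1, and a single hiragana character under euc_jp.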