Martin v. Löwis wrote:
>>> Are you serious?
>>
>> Are you? ;-? You are the one naming a codec-agnostic error handler (if
>> I understand correctly, and correct me if I do not) after a particular
>> codec, and denying that that could cause confusion. See other message.
>
> I can only repeat what I said before: I call it

What, specifically, is 'it'?

> utf8b because that's
> the established name for the algorithm

Which algorithm?

> it implements.

Again, what is 'it'?

As *I* read the sentence above, it is not true. I went to the site you referred to as the source of your reasoning, specifically
http://hyperreal.org/~est/utf-8b/releases/utf-8b-20060413043934/utf_8b.c

The algorithm called utf-8b *IS* utf-8 with the addition, or replacement (of an error return), of essentially one line in each direction:

    # encode
    if 0xDC00 <= codepoint <= 0xDCFF:
        byte = codepoint - 0xDC00

Note: for security reasons, you are raising the lower limit to 0xDC80. The comment at the top of utf_8b.c suggests that this is what it should be, and should have been in the file, with the other half of that surrogate area treated as an error along with the other surrogate area.

    # decode
    if (0x80 <= byte <= 0xFF) and utf-8-invalid(byte):
        codepoint = byte + 0xDC00

> That algorithm was originally designed with UTF-8 in mind (and only
> meant to be applied for UTF-8), however, it remains the same algorithm
> even though PEP 383 widens its application.

The error handler designed with utf-8 in mind has no name in the encode direction and is called "utf_8b_decoder_invalid_bytes" in the decode direction. By your reasoning, *that* should be its name in Python. The encoding error handler would then be named analogously "utf_8b_encoder_invalid_codepoints". Even these, to me, would be better than confusingly giving them the same name as the codec.

Terry Jan Reedy
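The two one-line rules discussed above (invalid byte to U+DC00 + byte on decode, lone surrogate back to a byte on encode) can be exercised with a small Python sketch. It is an illustration under stated assumptions, not the code from utf_8b.c: `utf8b_decode` and `utf8b_encode` are hypothetical names, the encoder uses the 0xDC80 lower bound mentioned in the note, and the decoder detects invalid bytes by trying each 1- to 4-byte slice with Python's strict UTF-8 decoder. Python's built-in "surrogateescape" error handler is used only as a cross-check.

```python
# Hypothetical sketch of the utf-8b rules; not taken from utf_8b.c.

def utf8b_decode(data: bytes) -> str:
    """Decode UTF-8, mapping each undecodable byte b to chr(0xDC00 + b)."""
    chars = []
    i = 0
    while i < len(data):
        # UTF-8 is prefix-free, so at most one length 1..4 decodes cleanly.
        for length in range(1, 5):
            try:
                piece = data[i:i + length].decode("utf-8")
            except UnicodeDecodeError:
                continue
            chars.append(piece)
            i += length
            break
        else:
            # No valid sequence starts here. Undecodable bytes are always
            # >= 0x80, so the result lies in U+DC80..U+DCFF.
            chars.append(chr(0xDC00 + data[i]))
            i += 1
    return "".join(chars)


def utf8b_encode(text: str) -> bytes:
    """Encode to UTF-8, turning U+DC80..U+DCFF back into single bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:  # the raised lower limit from the note
            out.append(cp - 0xDC00)
        else:
            # Any other lone surrogate (e.g. U+DC41) raises UnicodeEncodeError.
            out += ch.encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    raw = b"caf\xc3\xa9 \xff\xfe"
    text = utf8b_decode(raw)
    print(ascii(text))
    print(utf8b_encode(text) == raw)
    print(text == raw.decode("utf-8", "surrogateescape"))
```

Because the escape range starts at 0xDC80, a string containing U+DC00..U+DC7F is rejected on encode instead of being smuggled out as an ASCII byte.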