On approximately 12/7/2008 8:13 PM, came the following characters from
the keyboard of Stephen J. Turnbull:
> Glenn Linderman writes:
>
>  > But if you are interested in checking for security issues, shouldn't you
>  > _first_ decode into some canonical form,
>
> Yes.  That's all that is being asked for: that Python do strict
> decoding to a canonical form by default.  That's a lot to ask, as it
> turns out, but that is what we (the minority of strict Unicode
> adherents, that is) want.

I have no problem with having strict validation available.  But doesn't
validation take significantly longer than decoding?  So I think it
should be logically decoupled... do validation when/where it is needed
for security reasons, and allow internal [de]coding to be faster.

I'm mostly indifferent about which should be the default... maybe there
shouldn't be a default!  Use the "vUTF-8" decoder for strict validation,
and the "fUTF-8" decoder for the faster, non-validating version.  Or
something like that.  With appropriate documentation.  Of course,
"UTF-8" already exists... as "fUTF-8", so for compatibility, I guess it
shouldn't change... but it could be deprecated.

You didn't address the issue that if the decoding to a canonical form is
done first, many of the insecurities just go away, so why throw errors?

--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking