"Martin v. Löwis" writes:

> Am 25.08.2011 11:39, schrieb Stephen J. Turnbull:
> > "Martin v. Löwis" writes:
> >
> > > No, that's explicitly *not* what C6 says. Instead, it says that a
> > > process that treats s1 and s2 differently shall not assume that
> > > others will do the same, i.e. that it is ok to treat them the same
> > > even though they have different code points. Treating them
> > > differently is also conforming.
> >
> > Then what requirement does C6 impose, in your opinion?
>
> In IETF terminology, it's a weak SHOULD requirement. Unless there are
> reasons not to, equivalent strings should be treated differently. It's
> a weak requirement because the reasons not to treat them equivalent
> are wide-spread.

There are no "weak SHOULDs" and no "wide-spread reasons" in RFC 2119.
RFC 2119 specifies "particular circumstances" and "full implications"
that are "carefully weighed" before varying from SHOULD behavior. IMHO
the Unicode Standard intends a full RFC 2119 "SHOULD" here.

> Yes, but that's the operating system's choice first of all. Some
> operating systems do allow file names in a single directory that
> are equivalent yet use different code points. Python then needs to
> support this operating system, despite the permission of the
> Unicode standard to ignore the difference.

Sure, and that's one of several such reasons why I think the PEP's
implementation of unicodes as arrays of code points is an optimal
balance.

But the Unicode standard does not "permit" ignoring the difference
here, except in the sense that *the Unicode standard doesn't apply at
all* and therefore doesn't forbid it. The OSes in question are not
conforming processes, and presumably don't claim to be.

Because most of the processes Python interacts with won't be
conforming processes (not even the majority of textual applications,
for a while), Python does not need to be, and *should not* be, a
conforming Unicode process for most of what it does.
Not even for much of its text processing.

Also, to the extent that Python is a general-purpose language, I see
nothing wrong and lots of good in having a non-conformant code point
array type as the platform for implementing conforming Unicode
library(ies).

But this is not user/developer-friendly at all:

> Wrt. normalization, I think all that's needed is already there.
> Applications just need to normalize all strings to a normal form of
> their liking, and be done. That's easier than using a separate library
> throughout the code base (let alone using yet another string type).

But many users have never heard of normalization. And that's *just*
normalization. There is a whole raft of other requirements for
conformance (collation, case, etc.).

The point is that with such a library and string type, various aspects
of conformance to Unicode, as well as conformance to associated
standards (eg, the dreaded UTS #18 ;-), can be added to the library
over time, and most users (those who don't need to squeeze every ounce
of performance out of Python) can be blissfully unaware of what, if
anything, they're conforming to. Just upgrade the library to get the
best Unicode support (in terms of conformance) that Python has to
offer.

But for the reasons you (and Guido and Nick and ...) give, it's not
reasonable to put all that into core Python, not anytime soon. Not to
mention that as a work in progress, it can hardly be considered stable
enough for the stdlib.

That is what Terry Reedy is getting at, AIUI. "Batteries included"
should mean that as much Unicode conformance as we can reasonably
provide is *conveniently* available.
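[To make the normalization point above concrete, here is a small
sketch of my own (not from the thread), using the stdlib unicodedata
module. Two canonically equivalent spellings of the same text compare
unequal as code point arrays until the application normalizes both to
the same form:]

```python
import unicodedata

# Two canonically equivalent spellings of "é":
composed = "\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # U+0065 + U+0301 COMBINING ACUTE ACCENT

# As code point arrays, Python treats them as distinct strings.
assert composed != decomposed
assert len(composed) == 1 and len(decomposed) == 2

# After normalizing both to one form (NFC here), they compare equal.
assert (unicodedata.normalize("NFC", composed)
        == unicodedata.normalize("NFC", decomposed))
```

[This is the step the application itself has to know to take, which is
exactly the user-friendliness problem.]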
The ideal (given the caveat about efficiency) would be *one* import
statement and a ConformingUnicode type that acts "just like a string"
in all ways, except that (1) it indexes and counts on characters
(preferably "grapheme clusters" :-), (2) does collation, regexps, and
the like conformant to the Unicode standard, and (3) may be quite
inefficient from the point of view of bit-shoveling net applications
and the like.

Of course most of (2) is going to take quite a while, but (1) and (3)
should not be that hard to accomplish (especially (3) ;-).

> > I'm simply saying that the current implementation of strings, as
> > improved by PEP 393, can not be said to be conforming.
>
> I continue to disagree. The Unicode standard deliberately allows
> Python's behavior as conforming.

That's up to you. I doubt very many users or application developers
will see it that way, though. I think they would prefer that we be
conservative about what we call "conformant", and tell them precisely
what they need to do to get what they consider conformant behavior
from Python. That's easier if we share definitions of conformant with
them. And surely there would be great joy on the battlements if there
were a one-import way to spell "all the Unicode conformance you can
give me, please".

The problem with your legalistic approach, as I see it, is that if our
definition is looser than the users', all their surprises will be
unpleasant. That's not good.
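[To show what point (1) above might look like, here is a deliberately
naive sketch of my own. The function name is hypothetical, and the
clustering rule (a base character plus any trailing combining marks)
is only a rough approximation of UAX #29 -- it ignores Hangul jamo,
emoji ZWJ sequences, regional indicators, etc. A real
ConformingUnicode type would need the full segmentation algorithm:]

```python
import unicodedata

def grapheme_clusters(s):
    """Approximate grapheme clusters: a base character followed by
    any combining marks. A simplification of UAX #29, for
    illustration only."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach combining mark to its base
        else:
            clusters.append(ch)
    return clusters

# "e" + COMBINING ACUTE is one user-perceived character, even though
# len() reports two code points.
s = "Stephane\u0301"
assert len(s) == 9                        # code points
assert len(grapheme_clusters(s)) == 8     # user-perceived characters
assert grapheme_clusters(s)[-1] == "e\u0301"
```

[The gap between len(s) and the cluster count is exactly the indexing
difference that a ConformingUnicode type would paper over for the
user.]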