Glenn Linderman wrote:
>
> If there is going to be a required transformation from de novo strings
> to funny-encoded strings, then why not make one that people can actually
> see and compare and decode from the displayable form, by using
> displayable characters instead of lone surrogates?
>

The problem with your "escape character" scheme is that the meaning is lost when the strings are sliced, and slicing is a very common operation.

>>
>> I thought half-surrogates were illegal in well-formed Unicode. I confess
>> to being weak in this area. By "legitimate" above I meant things like
>> half-surrogates which, like quarks, should not occur alone?
>>
>
> "Illegal" just means violating the accepted rules. In this case, the
> accepted rules are those enforced by the file system (at the bytes or
> str API levels), and by Python (for the str manipulations). None of
> those rules outlaw lone surrogates. [...]
>

Python could just as well *specify* that lone surrogates are illegal, since their meaning is undefined by Unicode. If this rule is respected language-wide, there is no ambiguity. It might be unrealistic on Windows, though.

This rule could even be specified only for strings that represent filesystem paths. Sure, they are the same type as other strings, but the programmer usually knows whether a given string is intended to be a path or not.

Baptiste
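A minimal Python sketch of the two points above, under stated assumptions. Python's real `surrogateescape` error handler (from PEP 383) turns an undecodable byte into a lone surrogate, and a slice containing it still round-trips to the original byte. A "%XX" escape, used here only as a hypothetical stand-in for an escape-character scheme (Glenn's actual scheme is not detailed in this message), is cut apart by a similar slice. The helpers `has_lone_surrogate` and `check_path_str` are hypothetical illustrations of the "no lone surrogates in path strings" rule; nothing like them exists in Python.

```python
def has_lone_surrogate(s: str) -> bool:
    """Return True if s contains any code point in U+D800..U+DFFF."""
    # Python str objects may hold surrogate code points individually.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def check_path_str(s: str) -> str:
    """Hypothetical rule: reject path strings that contain lone surrogates."""
    if has_lone_surrogate(s):
        raise ValueError(f"path contains a lone surrogate: {s!r}")
    return s


# The byte 0xE9 is not valid UTF-8 on its own; the 'surrogateescape'
# handler maps it to the lone surrogate U+DCE9 instead of failing.
raw = b"caf\xe9"
name = raw.decode("utf-8", "surrogateescape")

# Slicing keeps the escaped byte self-describing: the one-character tail
# still encodes back to the original byte.
tail = name[3:]
tail_bytes = tail.encode("utf-8", "surrogateescape")

# With the hypothetical "%XX" escape, slicing inside the escape sequence
# leaves "E9", which cannot be told apart from a literal name "E9".
escaped = "caf%E9"
fragment = escaped[4:]
print(fragment)
```

The surrogate approach works because each escaped byte occupies exactly one code point, so no slice boundary can fall inside it. A multi-character escape sequence has no such property.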