Just van Rossum writes:
> How will other parts of a program know which encoding was used for
> non-unicode string literals?

This is the exact reason that Unicode should be used for all string
literals: from a language design perspective I don't understand the
rationale for providing both "traditional" and "unicode" strings.

> It seems to me that an encoding attribute for 8-bit strings solves this
> nicely. The attribute should only be set automatically if the encoding of
> the source file was specified or when the string has been encoded from a
> unicode string. The attribute should *only* be used when converting to
> unicode. (Hm, it could even be used when calling unicode() without the
> encoding argument.) It should *not* be used when comparing (or adding,
> etc.) 8-bit strings to each other, since they still may contain binary
> goop, even in a source file with a specified encoding!

In Dylan there is an explicit split between 'characters' (which are
always Unicode) and 'bytes'.

What are the compelling reasons not to use UTF-8 as the (source)
document encoding? In the past the usual response has been, "the tools
aren't there for authoring UTF-8 documents". This argument becomes more
specious as more OSes move towards Unicode. I firmly believe this can be
done without Java's bloat.

One off-the-cuff solution is this: all character strings are Unicode
(UTF-8 encoding). Language terminals and operators are restricted to
US-ASCII, which encodes to identical bytes in UTF-8. The contents of
comments are not interpreted in any way.

> >- We need a way to indicate the encoding of input and output data
> >files, and we need shortcuts to set the encoding of stdin, stdout and
> >stderr (and maybe all files opened without an explicit encoding).
>
> Can you open a file *with* an explicit encoding?

If you cannot, you lose. You absolutely must be able to specify the
encoding of a file when opening it, so that the runtime can transcode
into the native encoding as you read it.
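Opening a file with an explicit encoding, the character/byte split mentioned for Dylan, and the ASCII-compatibility of UTF-8 can all be sketched with Python 3's built-in `open()`, which postdates this message and takes an `encoding` argument. This is a minimal illustration, not a proposal from the thread; the helper functions `write_text`, `read_text` and `read_raw` and the file names are hypothetical.

```python
import os
import tempfile

# US-ASCII text (e.g. keywords and operators) encodes to identical
# bytes under ASCII and UTF-8, so an ASCII-only grammar is unaffected
# by choosing UTF-8 as the source encoding.
ascii_source = "if x >= 1: y = x + 1"
assert ascii_source.encode("utf-8") == ascii_source.encode("ascii")

# Characters vs. bytes: a str holds Unicode code points; bytes hold
# octets whose meaning depends on an encoding chosen at the boundary.
text = "na\u00efve caf\u00e9"  # "naïve café", 10 characters


def write_text(path, s, encoding):
    """Hypothetical helper: encode characters to bytes with `encoding`."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(s)


def read_text(path, encoding):
    """Hypothetical helper: open with an explicit encoding; the bytes
    are decoded into Unicode characters as the file is read."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


def read_raw(path):
    """Hypothetical helper: return the undecoded bytes in path."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "utf8.txt")
    latin1_path = os.path.join(tmp, "latin1.txt")
    write_text(utf8_path, text, "utf-8")
    write_text(latin1_path, text, "latin-1")

    utf8_raw = read_raw(utf8_path)
    latin1_raw = read_raw(latin1_path)
    # The two files hold different bytes on disk...
    print(len(utf8_raw), len(latin1_raw))  # 12 10

    # ...but each decodes to the same characters when opened with the
    # encoding it was written in.
    utf8_text = read_text(utf8_path, "utf-8")
    latin1_text = read_text(latin1_path, "latin-1")
    print(utf8_text == latin1_text == text)  # True

    # Guessing the wrong encoding silently yields the wrong characters.
    mojibake = utf8_raw.decode("latin-1")
```

For the standard streams, Python 3.7 and later provide `sys.stdout.reconfigure(encoding=...)` (and likewise for `stdin` and `stderr`), which roughly corresponds to the shortcut asked for in the quoted text.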
The transcoding should otherwise be transparent to the user.

-tree

-- 
Tom Emerson                                          Basis Technology Corp.
Language Hacker                                    http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"