> [paul]
> > Also, is it really necessary to allow raw non-ASCII characters in source
> > code though? We know that they aren't portable across editing
> > environments, so one person's happy face will be another person's left
> > double-dagger.

> [me]
> I suppose changing that would break code. maybe it's time
> to reopen the "pragma encoding" thread?
>
> (I'll dig up my old proposal, and post it under a new subject).

as brief as I can make it:

1. add support for "compiler directives". I suggest the following
   syntax, loosely based on XML:

       #?python key=value [, key=value ...]

   (note that "#?python" will be treated as a token after this change.
   if someone happens to use comments that start with #?python, they'll
   get a "SyntaxError: bad #?python compiler directive"...)

2. for now, only accept compiler directives if they appear before the
   first "real" statement.

3. keys are python identifiers (NAME tokens), values are simple
   literals (STRING, NUMBER).

4. key/value pairs are collected in a dictionary (a parsing sketch
   for points 1, 3 and 4 follows at the end of this post).

5. for now, we only support the "encoding" key. it is used to
   determine how string literals (STRING tokens) are converted to
   string or unicode string objects.

6. the encoding value can be any of the following (a decoding sketch
   follows at the end of this post):

   "undefined", or not defined at all:

       plain string: copy source characters as-is
       unicode string: expand 8-bit source characters to unicode
       characters (i.e. treat them as ISO Latin 1)

   "ascii":

       plain string: characters in the 128-255 range give a
       SyntaxError (illegal character in string literal)
       unicode string: same as for plain string

   any other ascii-compatible encoding (the ISO 8859 series, Mac
   Roman, UTF-8, and others):

       plain string: characters in the 128-255 range give a
       SyntaxError (illegal character in string literal)
       unicode string: characters in the 128-255 range are decoded
       according to the given encoding

   any other encoding (UCS-2, UTF-16):

       undefined (or SyntaxError: illegal encoding)

   to be able to flag this as a SyntaxError, I assume we can add an
   "ASCII compatible" flag to the encoding files.

7. only the contents of string literals can be encoded. the tokenizer
   still works on 7-bit ASCII (hopefully, this will change in future
   versions).

8. encoded string literals are decoded before Python looks for
   backslash escape codes (a sketch of this ordering follows at the
   end of this post).

I think that's all. Comments?

I've looked at the current implementation rather carefully, and it
shouldn't be that hard to come up with patches that implement this
scheme.

</F>
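
to make points 1, 3 and 4 concrete, here is a minimal parsing sketch
in modern Python; parse_directive, _PAIR and _COMMA are names invented
for illustration, not part of the proposal or any actual patch:

    import re

    # a NAME token, '=', then a STRING or NUMBER literal
    _PAIR = re.compile(
        r"(?P<key>[A-Za-z_]\w*)\s*=\s*"
        r"(?P<value>'[^']*'|\"[^\"]*\"|\d+)")
    _COMMA = re.compile(r"\s*,\s*")

    def parse_directive(line):
        """Parse '#?python key=value [, key=value ...]' into a dict."""
        if not line.startswith("#?python"):
            return None
        rest = line[len("#?python"):].strip()
        pairs = {}
        pos = 0
        while pos < len(rest):
            m = _PAIR.match(rest, pos)
            if not m:
                raise SyntaxError("bad #?python compiler directive")
            key, value = m.group("key"), m.group("value")
            # strip quotes from STRING values, convert NUMBER values
            pairs[key] = value[1:-1] if value[0] in "'\"" else int(value)
            pos = m.end()
            sep = _COMMA.match(rest, pos)     # pairs separated by commas
            if sep:
                pos = sep.end()
            elif rest[pos:].strip():
                raise SyntaxError("bad #?python compiler directive")
            else:
                break
        return pairs

    print(parse_directive('#?python encoding="iso-8859-1"'))
    # -> {'encoding': 'iso-8859-1'}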
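
and a sketch of the rules in point 6, modelling plain strings as the
raw bytes of the literal body; decode_literal and is_ascii_compatible
are hypothetical helpers, the latter standing in for the proposed
"ASCII compatible" flag on the encoding files:

    def is_ascii_compatible(encoding):
        # stand-in for the proposed "ASCII compatible" flag: true iff
        # the encoding maps the 7-bit ASCII range to itself
        sample = bytes(range(128))
        try:
            return sample.decode(encoding) == sample.decode("ascii")
        except (UnicodeDecodeError, LookupError):
            return False

    def decode_literal(body, encoding, is_unicode):
        # body: raw bytes of the literal, between the quotes
        if encoding is None:                   # "undefined"
            if is_unicode:
                return body.decode("latin-1")  # expand 8-bit chars
            return body                        # copy source bytes as-is
        if not is_ascii_compatible(encoding):  # UCS-2, UTF-16, ...
            raise SyntaxError("illegal encoding")
        if not is_unicode or encoding == "ascii":
            # plain strings (and "ascii") reject 8-bit characters
            if any(b > 127 for b in body):
                raise SyntaxError("illegal character in string literal")
            return body.decode("ascii") if is_unicode else body
        return body.decode(encoding)           # decode per the directive

    print(decode_literal(b"caf\xe9", "iso-8859-1", True))   # -> café
    # decode_literal(b"caf\xe9", "iso-8859-1", False) raises SyntaxError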
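
finally, the ordering in point 8 (apply the source encoding first,
then process backslash escapes) can be illustrated like this; the
unicode_escape codec is used here only to demonstrate the second step:

    import codecs

    raw = b"caf\xe9 \\u00e9"        # bytes as they appear in the source
    text = raw.decode("iso-8859-1")               # step 1: source encoding
    text = codecs.decode(text, "unicode_escape")  # step 2: escape codes
    print(text)                                   # -> café é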