Hi,

I wanted to share my thoughts about the issue and wrote the following essay.

Please note: all of this is IMHO of course. Nothing is cast in stone. I'm very willing to change it to whatever decisions the discussion leads to.

Sorry about the length of this posting. :)

Any and all comments are appreciated. Please send your comments to the list only, to avoid duplicate copies to my mailbox. Thank you!

Bye,
     Mike


The Need for a Declarative Syntax Element in Python
---------------------------------------------------
(aka "Getting PEP 318 back on track")

Author:  Mike Pall
Release: 2004-04-05

1. Introduction
   1.1 Clean and Lean
   1.2 Sugar is Mean
   1.3 Pep up Your Life
2. Getting it Straight
   2.1 Terminology (n); cf. lack of ~
   2.2 I Hereby Declare ...
   2.3 Definitions
   2.4 Now, what?
3. Foreign Territories
   3.1 Big Brother in Action: C# Attributes
       3.1.1 Syntax
       3.1.2 Semantics
       3.1.3 Use Cases
   3.2 Catch Up, Baby: Java Annotations
       3.2.1 Syntax
       3.2.2 Semantics
       3.2.3 Use Cases With JSR-175 Annotations
       3.2.4 Use Cases With javadoc Annotations
4. Pythonic (Ab)use Cases
   4.1 Assorted Attributes
   4.2 Roaring Registries
   4.3 Witty Wrappers
   4.4 Proper Properties
   4.5 To Sync or Not to Sync
   4.6 Lexical Liberty
5. Semantic Wonderland
   5.1 What?
   5.2 How?
       5.2.1 Calling the DO
       5.2.2 Using __declare__
       5.2.3 Passing The Context
   5.3 When?
   5.4 What else?
6. Syntax, Syntax on the Wall
   6.1 Wishful Thinking
   6.2 Round Up The Candidates
   6.3 Narrowing the Candidate Set
   6.4 And The Winner is ...
   6.5 ASCII Art
   6.6 Finally


1. Introduction
---------------

1.1 Clean and Lean

Python is an imperative language and has very few declarative elements embedded down in the language specification. And most of them are in the form of explicit statements (like "def", "class" or "global"). Only a few are implicit (that's why some people have mixed feelings about "yield").
There are only two declarative delimiters ("*" and "**" in parameter lists) and there are no non-statement declarative keywords (owing to the fact that Python has pure dynamic typing).

This is in fact good. That's why we all love Python so much: the syntax is clean and lean!

Just compare what other languages have embedded deep down in their lexical analyzer and their grammar: type definitions in C, the massive lexical and grammatical overloading of "*" and "&" in C++, scope-sensitive keywords like "static", "volatile" or "private" in C, C++ or Java. Some of that is pure necessity in languages with static typing, to save some typing (no pun). But most of it can be described with just one word: yuck!

Python has (so far) successfully avoided the syntax inflation. Even classic OO stuff like "classmethod" or "staticmethod" are just builtin types. You have to invoke (instantiate) them yourself in the body of a class definition and modify the binding to the method name by assigning to the class dictionary. Alas that works only *after* the method definition is done, i.e. it has to be after the method body.

1.2 Sugar is Mean

Unfortunately simplicity comes at a price: a severe shortage of syntactic sugar.

This is not necessarily a bad thing as the instance reference dilemma shows: Some languages use implicit scoping (C++, Java) and provide overrides ("this"). Others use explicit delimiters ("@" in Ruby). Python does neither and leaves the name of the identifier for the instance reference up to the programmer (though "self" is a well established convention). In this case most of us are happy about it, because "Explicit is better than Implicit" holds here (you may object that "@" is explicit enough, but well ... then it's implicit in the method definition).

However other issues leave something to be desired. The pressure to get a nicer syntax for "classmethod" and "staticmethod" has been traditionally rather low.
Probably because neither is *that* common (though this may depend on your programming style). And this is where PEP 318 comes into play ...

1.3 Pep up Your Life

The proposal of PEP 318 has resulted in a flurry of activity on the python-dev mailing list, culminating in February and March 2004. Even shouting and mutual accusations have been reported. Almost everyone would be happy to get back to real work, as the subject has been creating an enormous amount of traffic but little consensus.

PEP 318 goes a bit further than simply proposing a nicer syntax for "classmethod" and "staticmethod". It proposes a generic 'decorator' syntax that allows you to write your own decorator functions. And you get to see the decorator definition *before* the method body.

Most of the discussion has centered around what the most desirable syntax would be. However everyone's definition of 'desirable' differs remarkably. And the lack of convincing use cases hasn't helped keep the discussion on track either.

The early syntax proposals have some similarity with the C# language feature "[Attribute] definition". The feature is called C# attributes but cannot be compared directly to decorators or to Python attributes. That of course sparked some discussion about using decorators for defining attributes, but this is an entirely different matter.

A few regularly scheduled syntactic duels later, we are still where we started: a solution in search of its problem. Knowing we want 'something', but not spelling it out.

My humble hope is that this document helps to improve the quality of the ongoing discussion.


2. Getting it Straight
----------------------

2.1 Terminology (n); cf. lack of ~

All this talk about decorators and attributes is really missing the point.

Well, the word 'attribute' suffers severe proliferation in computer science. Everyone seems to have a different opinion on its meaning. The Pythonic definition is at best misleading when it comes to our discussion.
And a 'decorator' is a software design pattern -- this is not a language issue.

But if we want to extend syntax we _have_ to talk about language. Computer language design, that is. So let me rephrase our desire in proper terms:

==> We want to define a new kind of DECLARATIVE SYNTAX ELEMENT for Python.

Now that it's out of the hat, you can gain new appreciation of the preceding discussion. Part of the problem: naming conveys meaning. Not having a proper name for something will lead you astray.

2.2 I Hereby Declare ...

I sure don't want to patronize the language cracks out there. You all know very well what declarative vs. imperative means. However giving some examples in context is always useful.

So here are a few of the declarative notions we would like to express, phrased in natural language statements:

    "This is a classmethod."
    "This function needs to be run at exit."
    "This is an abstract class."
    "This is the documentation for a class attribute."
    "This method has been written by John Doe."
    "This class implements the Foo interface."
    "This method implements the Visitor pattern for this class."
    "This parameter must be an integral type."
    "This compound block shall be synchronized by a lock."
    "This statement should be suppressed when DEBUG is not defined at runtime."

All of these statements _declare_ that some other language element (class, method, function, class attribute ...) has a specific property.

Usually they tell you something about the _effect_ of that declaration, but most of this knowledge is implied (you ought to know what happens when you use such a statement).

However in general they do NOT tell you _how_ this effect is accomplished (i.e. implemented). Nor should they. Declarative syntax owes much of its power to the fact that most of the work goes on behind the scenes. Someone else has of course written up in glorious detail how that is to happen. But you (the user of such a declaration) just get to say what you _want_. Not more, but no less.
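
For contrast, here is a minimal sketch of how two of the statements above have to be spelled imperatively in Python as it stands, following the "after the body" idiom described in section 1.1. The names Registry, create and flush_logs are made up purely for illustration; only classmethod() and atexit.register() are real.

```python
import atexit


class Registry(object):
    """Hypothetical class, used only for illustration."""

    instances = 0

    def create(cls):
        # Nothing in the signature or body says "this is a classmethod".
        cls.instances += 1
        return cls()

    # "This is a classmethod." -- stated imperatively, and only after
    # the body, by rebinding the name in the class dictionary.
    create = classmethod(create)


def flush_logs():
    # Hypothetical cleanup hook.
    return "flushed"


# "This function needs to be run at exit." -- again only after the body.
atexit.register(flush_logs)

obj = Registry.create()
```

In both cases the reader has to scan past the whole body to learn what kind of thing was just defined, which is exactly the information a declaration would put up front.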

BTW: the word 'statement' in this context means 'natural language statement'. In computer languages we have more degrees of freedom, like keywords or delimiters. That's why I called it a declarative syntax _element_. The discussion about syntactic alternatives deserves an entire section of its own (see below).

Ok ... back to our high tech adventure.

2.3 Definitions

Declarative Syntax Element (DSE)

    The syntax element includes the Declaration itself and any lexical tokens that are required to make it recognizable as such by the parser.

    It can be either a grammatical production or a compound lexical token. This may be a minor issue from a visual point of view, but has subtle implications when it comes to applicability of the syntax element.

Declaration

    The content of the DSE. As far as the grammar goes it could be anything from an identifier up to a suite. It could even be in a language of its own (don't do that). However looking ahead to the section on semantics it makes most sense to use an expression.

    The term 'Declaration' is a bit too generic and would apply to other syntactic elements, too (e.g. a method definition includes a declaration). Thus it is suggested that the more specific term be used everywhere.

Declarative Expression (DE)

    The most likely syntactical choice for a Declaration. Any standard Python expression is possible here.

Declarative Object (DO)

    The result of evaluating a Declarative Expression.

Target of the DSE

    The grammatical element that is the target of the DSE. This could be anything from a method definition down to a statement or even an expression. The exact range of grammatical variety that could or should be permissible is discussed in the sections on syntax and semantics.

Processing of the DSE

    Processing of the DSE involves several steps: parsing the DSE, Compilation of the DE, Binding to the Target, Evaluating the DE and Applying the DO. These steps may or may not occur in that order.
    Each step may be assigned to a different processing phase. The important decision to make is the exact point in time _when_ each step is done.

Binding to the Target

    The process of binding either the DE or the DO to the Target.

Evaluation of the DE

    The process of evaluating a DE gives us a DO. This however does not (yet) affect the Target.

Applying the DO

    The process of applying the DO to the Target of the DSE.

2.4 Now, what?

Ok, now that we have set things straight it's time to get our hands dirty.

I suggest we first take a peek over the fence, have a closer look at use cases, take a refreshing detour to semantic wonderland and then visit syntax hell again. Gee, like all good things, the fun part is at the end.


3. Foreign Territories
----------------------

3.1 Big Brother in Action: C# Attributes

Ignoring C# just because it comes from the company that brought you EDLIN and BlueScreens is NOT a wise move. You gotta keep up with the news or you lose track.

What the heck, the language spec (ECMA 334 -- freely available for download) goes so far as to explicitly spell out that you should NOT use Hungarian notation for identifiers. Whoa! Maybe they made up their minds?!?

And for our little discussion C# has quite something to offer. Not only does it have an extensible declarative element. The same company brings you to new heights of enjoyment with their .NET and Longhorn initiatives. And part of that all-new lifestyle is using C# in every imaginable corner.

So we get some desperately needed use cases to fuel our discussion. Even overly contorted ones -- promise!

So let's explain the way C# attributes work in our terms.

3.1.1 Syntax

An attribute section consists of one or more C# DSEs prepended to the target. A C# DSE is an optional attribute-target plus a colon and a list of attributes enclosed in brackets:

    DSE ::= "[" [attribute-target ":" ] attribute ("," attribute)* [","] "]"

A C# attribute cannot be an arbitrary expression.
It looks like a call to a constructor, but without "new":

    attribute ::= type [ "(" arguments ")" ]

However the arguments look like any regular call and allow for arbitrary constant expressions. Both positional and named arguments are allowed. I'll save you the gory details of the syntax.

Most of the time the target of the DSE is implicitly derived from context. The attribute-target allows you to specify the target explicitly.

The permissible targets are:

- An assembly or module.
- A type declaration.
- A method or operator declaration.
- A parameter or return value declaration.
- A field declaration.
- An accessor declaration.
- An event declaration.
- An enum declaration.

Defining an attribute is simple: write a class that derives from the abstract class System.Attribute or any subclass of it.

3.1.2 Semantics

No big surprises so far. The semantics however are a bit tricky:

During compilation the argument expressions are evaluated. They must resolve to constant arguments. An instance constructor is derived from the type and the type of the arguments. The type, the constructor and the arguments are bound to the target in the image for the module.

During runtime when an attribute is accessed its constructor is called and returns the attribute instance. This is available with the GetCustomAttributes() reflection method.

C# has static typing so compilation of one module requires access to all referenced modules. Several 'magic' attributes assigned to members of the referenced modules are evaluated at compile time to achieve interesting effects:

- [AttributeUsage] specifies the scope of an attribute definition. I.e. which target(s) it applies to, whether it is inherited and if multiple applications to the same target are allowed.

- [Conditional] allows conditional compilation. This allows you to omit the call to conditional methods defined in other modules depending on defines set while compiling the module containing the call.
  So basically it is sufficient to tag the target once and get the effect everywhere else whenever it is used. Confused? Please read section 24.4.2 of the spec.

- [Obsolete] marks some elements as obsolete. When the compiler encounters it on a referenced element it emits a warning.

- The standard mentions a [Flags] attribute that applies to enums and redefines it to automatically assign bitfield masks instead of values to the enum members. However I have not found enough information to guess the implementation details.

There are a couple of other attributes that directly influence the runtime. E.g. the JIT compiler takes hints from the [StructLayout] or the [MarshalAs] attributes. And automatic permission elevation (including permission checks) is performed when code security attributes are present.

C# has another interesting feature: it allows you to omit the 'Attribute' suffix from say [FooAttribute]. So when the compiler encounters [Foo] and finds a type named Foo that is not derived from the Attribute base class it searches again for FooAttribute. This is useful for interfaces where you can get both declarative and imperative behaviour using the same name.

Ok, so far so good. But you cannot achieve the same kind of effects with user defined attributes in C# since that would either require extending the compiler or force compile time evaluation of attributes.

This is where XC# comes in: this extension to the C# compiler opens up the compilation entity tree and allows you to operate on that. It comes packaged with some basic attributes: declarative assertions (contracts), code coverage analysis, design rule verification, spell checking and (oh well) code obfuscation. You can write your own custom attributes and you can inspect and modify every aspect of the grammar on the fly.

3.1.3 Use Cases

The following use cases have been culled from various documents at MSDN and other sources.
I've stripped the implementations and just left the bare declarations and definitions to avoid clutter.

*** An attribute definition uses attribute declarations, too:

    [AttributeUsage(AttributeTargets.All)]
    public class HelpAttribute: Attribute {
        public HelpAttribute(string url) { this.url = url; }
        public string Topic = null;
        private string url;
        public string Url { get { return url; } }
    }

The attribute we just defined can be used like this:

    [Help("http://.../Class1.html")]
    public class Class1 {
        [Help("http://.../Class1.html", Topic = "Method1")]
        public void Method1() {}
    }

*** A DEBUG Conditional:

    [Conditional("DEBUG")]
    public static void DebugMessage(string msg) { ... }

    // Somewhere else, even in a different module ...
    // The *call* to the method is omitted if DEBUG is not defined while
    // compiling *this* module.
    DebugMessage("foo");
    ...

*** The enum magic:

    [Flags]
    public enum FileModes { UserRead, UserWrite, UserExecute, ... }

This one gets you 1, 2, 4, ... instead of 0, 1, 2, ...

*** Custom serialization (marshaling, pickling) made easy:

    [Serializable()]
    public class PartiallySerializableClass {
        public string serializeMe;
        public int serializeMeToo;
        [NonSerialized()]
        public string leaveMeAlone;
    }

Not only is this easier to specify than imperative serialization support. It is also less error prone because the serializer and deserializer are autogenerated from the type information.

*** An XC# example of a declarative assertion (contract) that dynamically adds assertion checking code at compile time:

    void WriteHashCode([NotNull] object o) {
        Console.WriteLine(o.GetHashCode());
    }

... generates the following code with an imperative assertion:

    void WriteHashCode(object o) {
        Debug.Assert(o != null);
        Console.WriteLine(o.GetHashCode());
    }

But the really interesting consequences of a declarative assertion are:

- It allows for runtime introspection.
- If you include it in the interface declaration, it automatically applies to any implementation.
  This allows for inheritance of declarative aspects.

*** Calls to unmanaged code:

    class NativeMethod {
        [DllImport("msvcrt.dll", EntryPoint="_getch")]
        public static extern int GetCh();
    }

*** Defining security behaviour:

    [SecurityPermission(SecurityAction.Deny,
                        Flags = SecurityPermissionFlag.UnmanagedCode)]
    private static void CallUnmanagedCodeWithoutPermission() { ... }

    [SuppressUnmanagedCodeSecurityAttribute()]
    [DllImport("msvcrt.dll")]
    internal static extern int puts(string str);

*** WinFS extensions:

    [Folder(Uri="#System/Drive")]
    public class Drive {
        ...
        [Key]
        public string DriveName {
            [Probe]
            get { return drive; }
            set { ; }
        }
        ...
        [Probe(Uri="Name=_")]
        public static Drive GetLogicalDrive(string name) { ... }
        ...
        [Probe(Uri="GetAll", ResultType=typeof(Drive))]
        public static Drive[] GetLogicalDrives() { ... }
        ...
    }

*** Indigo (Longhorn remote messaging framework):

URL: http://msdn.microsoft.com/longhorn/default.aspx?pull=/library/en-us/dnlong/html/indigoattrprog.asp

    [System.MessageBus.Services.PortTypeChannelAttribute
        (UsingName="IndigoServiceClient")]
    public interface IHelloChannel :
        System.MessageBus.Services.IDatagramPortTypeChannel
    {
        [System.MessageBus.Services.WrappedMessageAttribute
            (Namespace="http://tempuri.org/")]
        [System.MessageBus.Services.ServiceMethodAttribute()]
        [return: System.MessageBus.Services.WrappedMessageAttribute
            (Namespace="http://tempuri.org/")]
        string Greeting(string name);
    }

    [DatagramPortType (Name = "Hello", Namespace = "http://tempuri.org/")]
    public class IndigoService {
        [ServiceMethod]
        public string Greeting (string name) { ... }
    }

3.2 Catch Up, Baby: Java Annotations

Java has plenty of declarative syntax elements. Some people argue that Java is overly declarative. Some like it. Some don't. But up until recently it had no support for an extensible DSE.

Ok, ok ... Java always had javadoc tags embedded in comments:

    /**
     * An abstract class with foo-like behaviour.
     *
     * @author John Q. Doe
     * @see Bar
     */
    public abstract class Foo {
        ...
    }

These are processed by an extra tool (javadoc) to autogenerate the docs from the sources. Calling this a DSE is ok, but it is not part of the language itself and as such has no access to its structure (you cannot pass constant expressions).

Some say it was a quick hack initially. But it stuck. And it got abused. A lot. Examples below. It seems now we have a plethora of tools that scan for javadoc tags and then do some pre- or post-processing of your source and class files.

But then came JSR-175 (public review ended November 2003) ...

3.2.1 Syntax

JSR-175 introduces a new syntax for 'Annotations'.

An annotation definition starts with the "@interface" keyword. A severely restricted variant of the interface definition syntax is used. It has implicit and final inheritance from java.lang.annotation.Annotation. Well ... to me it looks just like a glorified struct declaration anyway.

An annotation can be applied to any declaration: classes, interfaces, fields, methods, parameters, constructors, enums, local variables and enum constants (implied field declarations). However the same annotation type can be applied at most once to any entity.

Application of an annotation uses one of three syntactic variants:

    "@" type "(" member-name "=" member-val ["," member-name "=" member-val]* ")"
    "@" type
    "@" type "(" single-member-value ")"

The latter two are just shorthands for variants of the first one with an empty list or for a single member type. This muddies the distinction between types and constructors quite a bit though.

The values must be constant expressions of course.

3.2.2 Semantics

During compilation the constant expression arguments for annotations are evaluated and combined into a set of initializers for the structure defined by the annotation interface.

There are three retention policies for this set of initializers:

- Source-only annotations are dropped at compile time. This is supposed to replace/augment javadoc tags in source comments.
  Local-variable annotations are always source-only.

- Class-only annotations are stored in the class file but not loaded into the runtime.

- Runtime annotations are stored in the class file and are loaded by the runtime.

Reflection support via java.lang.reflect.AnnotatedElement works only for runtime annotations. Reflection is easy to use because only a single annotation for each type may be present for each element. The classes used to represent annotations are created at runtime using dynamic proxies.

Owing to static typing the Java compiler has to read referenced class files and understands a few special annotations (excerpt):

- @Target specifies the allowable targets for an annotation definition.

- @Retention specifies the retention policy (a RetentionPolicy value) for an annotation definition.

- @Inherited allows for inheritance of annotations from superclasses (this is NOT inheritance for annotation definitions themselves).

To summarize: nothing spectacular and not very dynamic. It shows that the feature has been added to the language as an afterthought.

The key improvement over javadoc tags is that external tools do not have to parse Java source files (unless pre-processing is required) and that annotations can be stored in the class file. This however does not obviate the need for some post-processors either.

3.2.3 Use Cases With JSR-175 Annotations

Since the spec is pretty young you won't find many use cases right now:

*** Giving a new tune to javadoc stuff:

    public @interface RequestForEnhancement {
        int    id();
        String synopsis();
        String engineer();
        String date();
    }

    @RequestForEnhancement(
        id       = 2868724,
        synopsis = "Provide time-travel functionality",
        engineer = "Mr. Peabody",
        date     = "4/1/2004"
    )
    public static void travelThroughTime(Date destination) { ...
    }

*** This is the definition for the @Retention meta-annotation:

    public enum RetentionPolicy { SOURCE, CLASS, RUNTIME }

    @Documented
    @Retention(RUNTIME)
    @Target(ANNOTATION_TYPE)
    public @interface Retention {
        RetentionPolicy value();
    }

3.2.4 Use Cases With javadoc Annotations

*** Some stuff for BEA WebLogic using Java web services annotations:

    /**
     * @jws:location http-url="http://localhost:7001/webapp/Bank.jws"
     * @jws:protocol http-soap="true"
     */
    public interface BankControl extends ServiceControl
    {
    }

*** Security annotations and conversational support for web services:

    /**
     * @common:security single-principal="false"
     */
    public class PurchaseSupplies implements com.bea.jws.WebService
    {
        /**
         * @common:operation
         * @jws:conversation phase="start"
         */
        public void requestPurchase()
        {
        }

        /**
         * @common:operation
         * @jws:conversation phase="continue"
         */
        public void approvePurchase()
        {
        }

        /**
         * @common:operation
         * @jws:conversation phase="finish"
         */
        public void executePurchase()
        {
        }
    }

*** Embedded SQL for Java:

    /**
     * @jc:sql statement::
     *   SELECT name
     *   FROM employees
     *   WHERE name LIKE {partialName}
     * ::
     */
    public String[] partialNameSearch(String partialName);


4. Pythonic (Ab)use Cases
-------------------------

Leaving the exact choice for the syntax aside I used my personal favourite everywhere. I suggest you substitute it with your favourite choice just to see the visual effect.

I'm sorry, but I gave up trying to identify the original contributor for each of these examples. Let's say it's a true community effort. A big thank you to everyone for your input!

4.1 Assorted Attributes

Attribute setters modify the target attributes but not the target itself.

*** The first one is generic, the second one could inherit from it:

    <| funcattrs(foo = 1, bar = "baz") |>
    def foobar(x):
        ...

    <| rstattrs(
           arguments = (1, 0, 1),
           options = {'class': directives.class_option},
           content = 1) |>
    def admonition(*args):
        return make_admonition(nodes.admonition, *args)

    <| rstattrs(content = 1) |>
    def attention(*args):
        return make_admonition(nodes.attention, *args)

*** Here is a nice example to show how neatly the vertical bars line up:

    <| Author("Joe Q. Random") |>
    <| Copyright("ACME Inc.")  |>
    <| Release("2004-04-01")   |>
    <| Version(0, 0, 1)        |>
    def industrialStrengthMethod(self):
        raise BlueScreenError(random.Random())

*** Hints for mad metaclasses with magic mangling motives:

    <|override|>
    def overrideMe(self, foo):
        ...

*** With clever definition of the DO application process (see semantics) we get docstrings *before* the corresponding definition. I guess nobody will like the idea, but anyway:

    <|"""This is a new style docstring for a class.

    Bla bla bla ...
    Bla bla bla ...
    Bla bla bla ...
    """|>
    class HeavilyDocumented(object):

        <|"Class attribute docstrings now have the proper position."|>
        BUFSIZE = 8192

        <|"This is a new style docstring for a method."|>
        def method1():
            pass

        <|"Dynamic %s [%s]." % ("docstring", time.ctime()) |>
        def thisReallyWorks():
            pass

4.2 Roaring Registries

Registry functions add a reference to the target in some other object. Often combined with attribute setters.

*** This one shows where it would be convenient to have a declarative type with the same name as an inheritable imperative type:

    <| WebService("soap://localhost:8888/mywebservice") |>
    class MyWebService(WebService):

        <| ServiceMethod(None, str) |>
        def setName(self, name):
            self.name = name

        <| ServiceMethod(str) |>
        def sayHello(self):
            return "Hello %s!" % self.name

*** For more complicated interfaces to environments with static typing it would be nice to have declarations that apply to parameters and/or the return value:

    <| ServiceMethod |>
    def <|str|> method(self, <|str|> foo, <|int|> bar):
        ...

4.3 Witty Wrappers

Wrappers generally wrap functions or methods with their own functions.
The latter is then stored in the current dictionary to replace the original function.

Oh BTW: the 'synchronized' wrapper has a section of its own. See below.

*** We know these two well enough by now:

    <|classmethod|>
    def AClassMethod(cls, foo):
        pass

    <|staticmethod|>
    def AStaticMethod(foo):
        pass

*** Using multiple decorators:

    <| Decorator1(), Decorator2("Just a random string") |>
    def someMethod(...):
        ...

    # which is equivalent to:

    <| Decorator1() |>
    <| Decorator2("Just a random string") |>
    def someMethod(...):
        ...

*** Generics for Python? Well, if you must:

    <|generic(int)|>
    def f(x):
        print x, 'is an int'

    <|generic(str)|>
    def f(x):
        print x, 'is a string'

*** A variation of the theme: runtime type checking:

    <|CheckSignature(int, int)|>
    def f(x, y):
        return x+y+1

*** Interface evolution does not need to be messy:

    <|Context("version", 1)|>
    def movePlayerMessage(self, arg1):
        ...

    <|Context("version", 2)|>
    def movePlayerMessage(self, arg1, arg2):
        ...

Yes, the second declaration needs to be able to get at the dict entry for the first one (see semantics).

*** Defining a native method interface:

    # The first one should be applicable to the module. TODO: But how?
    <|native.Library("gtk-x11-2.0")|>
    <|native.ClassMap()|>
    class GtkScrolledWindow(GtkWindow):

        # Yow, the great renaming ... the GNOME people would love it ... :)
        <|native.EntryPoint("gtk_scrolled_window_set_policy")|>
        def setPolicy(self, <|GtkPolicyType|> hScrollbarPolicy,
                            <|GtkPolicyType|> vScrollbarPolicy):
            pass

        <|native.EntryPoint("gtk_scrolled_window_get_shadow_type")|>
        def <|GtkShadowType|> getShadowType(self):
            pass

4.4 Proper Properties

The following has been proposed to make property definition easier:

    class Foo(object):

        <|propget|>
        def x(self):
            return self.__x

        <|propset|>
        def x(self, newx):
            self.__x = newx

        <|propdel|>
        def x(self):
            del self.__x

The problem with this is that propxxx needs to go through some hoops to update the previous definition of x.
This includes using sys._getframe() and then searching the local and global dicts of the frame. Oh dear!

This is a good argument why the DO should get the target object _and_ the dict where the TO is to be stored.

A different idea would be to follow the C# precedent which provides an extra nesting level for properties:

    class Foo {
        private int x_;
        public int x {                      // <-- !!
            get { return x_; }
            set { x_ = value; }             // value is an implicit identifier
        }
    }

    Foo foo = new Foo();
    foo.x = 1;
    int y = foo.x;

However I don't know how to do this in Python. Declaring a method as a property and using nested functions for the getter/setter does not work because the locals (i.e. the nested functions) are lost:

    class Foo(object):
        <|property|>
        def x(self):        # Does not work!
            def get():
                return self._x
            def set(value):
                self._x = value

Doing the same thing with an inner class might work somehow, but I'm not sure. We could get inheritance for properties, too!?

TODO: I haven't explored this any further. More input is welcome.

4.5 To Sync or Not to Sync

Java has special declarative syntax for synchronized methods and blocks:

    public abstract class FancyControlStream extends FancyStream {
        public synchronized void putMsg(byte[] data) {
        }
        public int kickBack() {
            ...
            synchronized (obj) {
                obj += len;
                obj.notify();
            }
            ...
        }
    }

This would be a good use case for our DSE, too. But ... being only applicable to a method definition just doesn't cut it:

    <|Synchronized(lock)|>
    def put(self, item):
        ...

This is way too tedious for some stuff. So, what about:

    <|Synchronized()|>
    class ThreadSafeQueue(Queue):
        pass    # Yes, this is all that's needed.

On the other hand you really want to use fine grained locking for more ambitious endeavors.
And this is where you need to use locks at the compound statement level (called suites in Python and blocks elsewhere):

    def stopStream(self):
        <| Synchronized(self.recv_lock) |>
        if self.recv_len<=0:
            if self.recv_thread:
                self.recv_thread.interrupt()
                self.recv_thread=None
            self.send_close=True
        <| Synchronized(self.send_lock) |>
        while self.send_len>=0 and self.send_close:
            self.send_lock.notify()

Getting it at the statement level would be nice, too. But you could work around this with 'if 1: SUITE'.

How you would _implement_ this is another story. And this is where the bad reputation of Java's 'synchronized' comes from: there have been some initial design flaws. However just because you have one bad precedent does not mean we can't do it better. IMHO declaring some parts of a program to be regions with synchronized access is inherently useful.

4.6 Lexical Liberty

To get at the lexical level you may need a different delimiter which is evaluated at compile time. I have opted not to replace it with a different one in the following examples.

Please don't flame me. I'm just documenting some strange ideas:

    class Whatever(object):

        # Uh, imports and assignments in a declarative compile-time context?
        <| from declarative.lexical import * |>
        <| T = Conditional("TRACE") |>

        # These statements are optimized away at module load time.
        def run(self):
            <|T|> self.tracePrint("Startup")
            ...
            <|T|> self.tracePrint("Shutdown")

        # Combining the power of Python with the syntax of SQL. Or vice versa?
        def query1(self, start, end):
            return <|SQL|> "SELECT * FROM table WHERE name BETWEEN :start AND :end"
        def query2(self, start, end):
            return <|SQLPy|> SELECT * FROM table WHERE name BETWEEN :start AND :end
        def query3(self, table, a, b):
            return <|PySQL|> SELECT "*" FROM table WHERE "f1" == a*2 AND "f2" >= a+b

        # C has inline assembler. Python has inline C.
    def fast1(self):
      <|C|> "{ for (int i=0; i<100; i++) somefunc(i); }"

    def fast2(self):
      <|C|> { for (int i=0; i<100; i++) somefunc(i); }

    def fast3(self, a, b):
      x = self.foo + <|C|> int dummy(int a, int b) {
        for (int i=0; i<1000; i++) { a=a+3*i+(b&15); b+=a; }
        return b;
      } <||>
      return x

    <|C|> int fast4(int a, int b) {
      for (int i=0; i<1000; i++) { a=a+3*i+(b&15); b+=a; }
      return b;
    } <||>

  # Sssshhh! Don't mention the r-word!
  <|LexicalSubstitute("@", ("self", "."))|>
  <|Grammar("parameter_list").afterParsing(
      lambda p: p.insert(0, "self") )|>
  class Wherever(object):
    def __init__():
      @flags=[]
    def setFlag(flag):
      @flags.append(flag)


5. Semantic Wonderland
----------------------

5.1 What?

First we need to specify what kind of grammatical element a DSE may
contain. I could go on and reason about all the possibilities. But nobody
ever mentioned anything other than an expression. It seems to be the most
straightforward approach. This document is already far too long.
A DSE contains a DE. So be it. :)

Expressions evaluate to a single object. This is the DO in our case.
For multiple declarations in a single DSE there are two possibilities:

- Allow a plain expression and add the commas to the DSE grammar.
- Allow an expression_list and add tuple handling.

Multiple DOs from multiple adjacent DSEs result in a DO list (in the
given order).

The target the DSE applies to is to be specified by the syntax. To be able
to bind the DSE to the target it needs to be bundled into an object. We
call this the Target Object (TO). What the TO needs to contain depends on
the target.

The issues with arbitrary grammatical elements are discussed below (see
'What Else?'). So let's restrict the target to functions, methods and
classes for now. Each one of them already bundles its contents into an
object. But _defining_ one of them involves adding its name to a
dictionary, too:

- The Target Dictionary (TD) is required by the DO for some use cases.
- The name to be defined is accessible from the TO for all three types.
  But it is read-only (for functions) and/or modifying it may not get you
  the desired effect.

It is unclear whether we have any use case that requires storing a
different name in the TD instead of the one used in the target. Being able
to suppress the store might be desirable, too.

TODO: Ideas?

5.2 How?

[My bet is that this section will be the most controversial.]

We need to specify how each of the required steps is to be performed.
First the straightforward stuff:

- Parsing the DSE: since this is a plain Python expression it is to be
  performed by the compiler.

- Compilation of the DE: ditto.

- Binding to the target: the Pythonic way to do this is to emit code for
  the module or class initialization code objects. The code gets to mangle
  the compiled DE, the target, the target name and the TD.

  [ It is an implementation detail whether ...
    - a call to each DO is explicitly emitted OR
    - whether we call a builtin function or emit a new bytecode that gets
      passed the TO and all DOs.
    The latter may have some advantages for future extensibility. ]

  TODO: Code generation needs to be specified in detail.

- Evaluating the DE: this is done by the code compiled from the DE.

Now the troublesome part: applying the DO.

5.2.1 Calling the DO

The original proposal was derived from the way stuff like classmethod
works:

*** TO is a function:

  def func(x):
    ...
  func=callable(func)

*** TO is a method:

  class Foo(object):
    def method(args):
      ...
    method=callable(method)

So we require the DO to be a callable and apply it by calling it, passing
the TO. The return value is then used as the new TO and stored under the
target name in the TD. I.e.:

  TO = DO(TO)

[ Subtle side effect: The TD may contain a previous element named like the
  target before the target definition is performed. This is available to
  the DO because the TO is not stored until all DOs have been applied.
  This cannot be achieved without some renaming games by doing it the
  classic way. ]

There are three common cases for DEs:

*** The DE is a function call that evaluates to an inner function.

The DO is the inner function. It may be bound to derivatives of some
arguments of the DE. Applying the DO means calling the inner function.

  def funcattrs(attrs):
    def inner(func):
      for name, value in attrs.iteritems():
        setattr(func, name, value)
      return func
    return inner

*** The DE is a class constructor and evaluates to an instance of the
    class.

The DO is an instance. It may hold derivatives of some arguments of the
DE. Applying the DO means calling the instance (which only works if you
define a __call__ method).

  class funcattrs(object):
    def __init__(self, **kwds):
      self.attrs = kwds
    def __call__(self, func):
      for name, value in self.attrs.iteritems():
        setattr(func, name, value)
      return func

*** The DE is a class and evaluates to a class.

The DO is a class. Applying the DO means to call the constructor of the
class. A side effect of this is that the resulting TO is an instance of
the class. The primary examples of this are classmethod and staticmethod
which are builtin types.

5.2.2 Using __declare__

Reusing the callable attribute of objects for declarator application has
a few drawbacks:

- Any callable may be used as a DO. There is no error checking. Mistaking
  imperative classes for declarative classes, dropping empty braces,
  scoping mismatches or just plain carelessness will NOT be caught.
  This may be very hard to track down since the resulting TO is not
  checked for validity. Errors may pop up very late or never. Any error
  message you might get will probably be misleading.

- A class cannot have both declarative and imperative behaviour. However
  as the precedent given by C# shows, this might be quite useful.

- Inheritance is a useful concept for declarative behaviour, too. Thus it
  is most likely that DE is a class constructor.
  This requires the use of __call__ which is neither self-documenting nor
  exactly easy to explain to newbies.

I propose to use a new kind of slot with the name "__declare__".

The DO must define this slot and it must be bound to a callable. Applying
the DO means calling the slot and passing the TO. The return value is used
as the new TO and stored under the target name in the TD. I.e.:

  TO = DO.__declare__(TO)

This has the following advantages:

- Existing objects cannot be accidentally used as declarative expressions.

- Error checking is done at module load time and misuse will abort the
  load with a unique error message exactly pointing out the problem.

- Defining a method with the name __declare__ makes your intention crystal
  clear. Newbies can find out about the meaning easily by searching for
  __declare__ in the documentation. Compare this with the value the
  documentation for __call__ would have to a newbie.

- Whenever you see a declaration you know exactly where to look in the
  class definition to find out what it does.

- Requiring a __declare__ slot strongly encourages using classes instead
  of functions for declarative behaviour. Since classes are inheritable,
  this will over time improve the quality of the code base.
  Using functions is still possible but one has to set the __declare__
  slot explicitly in the function dictionary. You can even avoid defining
  an inner function if you don't need bound DE arguments (otherwise a
  class would be the cleaner solution).

- We can easily support the old and the new way to apply classmethod and
  staticmethod. We also can adapt the new calling sequence to our needs
  without breaking compatibility with existing uses of classmethod and
  staticmethod (see the next section).

- We may set __declare__ slots on internal types if we want to give them
  declarative behaviour without necessarily making them callables.
  E.g. you can have strings behave like docstrings if used declaratively.
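
To make the protocol a bit more tangible, here is a minimal sketch that
emulates it with ordinary Python. Treat it as an illustration under
assumptions, not as the proposed implementation: the <|...|> syntax does not
exist, so a hypothetical helper apply_declarations() stands in for the code
the compiler would emit, and the names DeclarationError and FuncAttrs are
made up for this example.

```python
# Hypothetical emulation of the __declare__ protocol. None of these names
# exist in Python; a plain helper function plays the part of the code the
# compiler would emit for a DSE.

class DeclarationError(TypeError):
    """Raised when an object without a __declare__ slot is used as a DO."""


def apply_declarations(target, *declarators):
    """Apply each DO in the given order: TO = DO.__declare__(TO)."""
    for do in declarators:
        declare = getattr(do, "__declare__", None)
        if not callable(declare):
            # Plain callables are rejected instead of being called silently.
            raise DeclarationError(
                "%r has no __declare__ slot and cannot be used declaratively"
                % (do,))
        target = declare(target)
    return target


class FuncAttrs(object):
    """Declarative counterpart of the funcattrs class from 5.2.1."""

    def __init__(self, **kwds):
        self.attrs = kwds

    def __declare__(self, func):
        for name, value in self.attrs.items():
            setattr(func, name, value)
        return func


def f(x):
    return x * 2

# Stands in for:  <|FuncAttrs(author="Mike")|> def f(x): ...
f = apply_declarations(f, FuncAttrs(author="Mike"))
```

Passing an ordinary callable such as len to apply_declarations() raises
DeclarationError right away, which is the kind of early error checking the
list above argues for.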

I know very well that adding a new slot requires changes in several places.
But I think the benefit outweighs the required work by far.

TODO: Write up what needs to be done to add a new slot.

5.2.3 Passing The Context

As you can see from the use cases, some of them need access to the TD.
The current workaround involves sys._getframe() and is ugly beyond belief.

We can avoid this problem by passing the TO *and* the TD to the DO. I.e.:

  TO = DO.__declare__(TD, TO)

This may also be required for some future extensions that involve targets
other than functions, methods and classes. In general it is not a bad idea
to pass the context where the target is defined.

Since applying the DOs is usually done at module initialization there is
no performance penalty for passing two arguments instead of one. Code
generation would be pretty easy, too.

Summary: we gain flexibility with minimal cost.

TODO: Check the issue mentioned above about modifying the target name
      before it gets stored in the TD. We could pass a modifiable
      reference to the name somehow but that would complicate the code
      generation for this case quite a bit. Ideas?

5.3 When?

There are basically two choices for the temporal behaviour of the DSE:

- A DSE with access to the lexical level needs compile time evaluated DEs.
  As indicated in other sections this is not within the current scope of
  this document.

- Any other imaginable DSE is only useful if the DE is evaluated and
  applied at module initialization time. Any runtime-only behaviour can be
  implemented on top of it.

This implies that all other processing steps need to be done at
compilation time (parsing the DSE, compilation of the DE and binding to
the target).

Phew. That was easy.

A subtle issue that has been discussed previously is the order of
application of the DEs. Previously you had to write things backwards to
get the correct order of application (d first, then e):

  def f(x):
    ...
  f=e(d(f))

Thus the question arose what the order might be with the new syntax:

  <|d,e|>
  def f(x):
    ...

The natural interpretation depends a little bit on the syntax chosen
(whether the DSE is before or after the function name). But with my
favourite syntax there is no ambiguity: d is applied before e.
Also <|d,e|> is equivalent to <|d|> <|e|>.

It helps to remember that Python's existing declarative statements (such
as "def") are implemented in an imperative way. And this is clearly
top-to-bottom and left-to-right.

If the order of application is important, then it is the duty of the user
to get the order of the DEs right. It is however deemed unlikely that this
is a problem in reality.

TODO: I cannot find a way for the DO class author to specify the preferred
      order of application (other than through documentation). Priorities
      would not help since they would need to be attached to the DSE,
      which is pretty useless (remember: the compiler does not see the DO
      class). Comments?

5.4 What else?

The 'synchronized' issue highlights the need for having declarative syntax
that is applicable to a wider range of syntactic constructs.

But there is some trouble ahead with Python: methods and classes are first
class objects. Compound statements are not. Passing a method or a class to
a function to modify them is easy. Inserting the returned object (possibly
a wrapper) into the module or class dictionary is easy, too.

Getting the same level of support for compound statements or even
individual statements is way more difficult. You basically have two
choices for this:

a) Make statements first class objects and somehow optimize this artifact
   away by using late code combining.

b) Allow compile time evaluation of a declarative element giving it access
   to the lexical level.

Ok, so a) is more akin to the way statements inside a class definition
incrementally build up the class.
You'd have to plan ahead a bit and think how this could be extended to
declarations for parameters and other stuff, too.

But b) is really a much more powerful tool. Forget about CPP macros.
Forget about SQL preprocessors. You can do all of this yourself now just
by writing some Python functions. C has inline assembler, Python could
have inline C! Embedding foreign languages, extending the grammar, even
redefining your favourite lexical tokens: you get it, if you ask for it.

[ Just a thought: you could do b) 'the language way' and open up the
  compiler or 'the text processing way' and open up the input stream.
  The latter would be pretty easy to do, I think. But then you may not get
  to see or modify all the compiler context you need. ]

Both issues need A LOT MORE discussion and are certainly NOT something for
the 2.4 time frame or for PEP 318.

However we should make a mental note about two things:

- A new declarative element SHOULD have the potential to be applied to
  almost any other syntactic element. This must be reflected in the
  initial syntactic _specification_ but not necessarily in the initial
  _implementation_.

- Having a declarative element evaluated at compile time holds big
  potential for the future. Although this could emulate any other more
  traditional kind of declarative element, the framework for this is just
  not ready today. So in fact I do NOT propose to specify such an element
  right now. However I propose to either
  a) allow for a later extension of the declarative element that we are
     defining now OR
  b) define a second declarative element later that shares most of the
     syntax but has different temporal evaluation behaviour.


6. Syntax, Syntax on the Wall
-----------------------------

Hey, the first Python beauty contest in history. Quick, place your bets.

6.1 Wishful Thinking

The Syntax ...

Requirement  Requirement Definition:
Mnemonic     The Syntax ...
------------------------------------------------------------------------------
|LEXICAL|    MUST fit into the existing lexical structure.
|PARSER|     SHOULD put no extra burden on the existing parser.
|GRAMMAR|    MUST fit into the existing grammar.
|GENERIC|    SHOULD allow for the declarative element to be applied to
             almost any other syntactic element.
|EXPR|       SHOULD allow for a general expression as the content of the
             declarative element.
|SPLIT|      SHOULD allow for lengthy element content that may need to be
             split across several lines.
|SEQ|        SHOULD allow for a convenient notation for a sequence of
             declarative elements.
|CONCISE|    MUST be concise.
|VISUAL|     MUST stand out visually.
|DISTINCT|   SHOULD be immediately recognizable as a distinct language
             feature.
|MIMIC|      SHOULD NOT mimic completely unrelated features of other
             common programming languages.
|NEWBIE|     SHOULD NOT overly confuse newbies. :)

There are a lot more issues, but I think I covered the most important ones
from a language perspective. Feedback is appreciated of course.

From a language perspective not much attention has been given to
|GENERIC|. This is what worried me most and got me to write this essay.

6.2 Round Up The Candidates

The existing lexical and grammatical structure allows ...

{STATEMENT}  a statement with a (possibly new) keyword:
               as DECL

{COMPOUND}   a compound statement with a (possibly new) keyword:
               as:
                 DECL
                 ...

{EXTENSION}  to extend an existing grammatical element:
               def f(x) DECL:

{CONTEXT}    an existing lexical element in a different context:
               def [DECL] f(x):

{REDEFINE}   redefining the meaning of an existing grammatical construct:
               [DECL]
               def f(x):

{UNARY}      a new introducing delimiter:
               @DECL
               .DECL

{ASYM}       a new asymmetric paired delimiter:
               *[DECL]
               @[DECL]

{SYM}        a new symmetric paired delimiter:
               [|DECL|]
               <|DECL|>

I guess there are some more variations on this theme, but none too
relevant.

6.3 Narrowing the Candidate Set

Well our BDFL has already spoken out and narrowed the set considerably.
But I think it's still worthwhile to discuss this in detail:

{STATEMENT} has not been received well, because it would almost certainly
necessitate a new keyword. And neither has a good one been proposed nor
should the introduction of a new keyword be taken lightly. Getting
|GENERIC| straight might prove to be difficult, too. And don't forget
about |EXPR| and |VISUAL| while you are at it. Even if we manage to find a
good name, it's doubtful that everyone seeing such a construct knows that
it is evaluated in a different context than almost any other statement
(violating |NEWBIE|).

{COMPOUND} looks strange to me because it does not fit well into the
lexical structure (putting aside the variants that violate |LEXICAL|).
Compound blocks usually contain statements and not expressions. And of
course it only works if the target of the declarative element is a block
(violating |GENERIC|). Prepending a new kind of block to an existing
target block introducer is confusing because one may expect another level
of nesting here (which would violate |CONCISE|). Adding a new kind of
block just _after_ the target block introducer destroys the visual link
between the introducer and the body of the block. The compact variant
(putting a single declarative element right after the colon) shares the
concerns about |NEWBIE| with {STATEMENT}.
In short I cannot find a compelling reason to add a new kind of compound
statement to the language. And a compound declarative statement is ...
well ... awkward (considering the lack of precedent set by other
languages).

{EXTENSION} has been thrown out early because it violates |CONCISE|,
|VISUAL| and |MIMIC|. But I think far more important are the violation of
|SEQ| and |EXPR|.

{CONTEXT} has been discussed to death but this is just because the initial
discussion focused on variants of this syntax. Other than that it has few
merits. Since it is an existing lexical construct (list displays) it
violates |GENERIC|, |DISTINCT| and |NEWBIE|.
Depending on the position it may also violate |PARSER|, |GRAMMAR| and
|VISUAL|. But I think the most compelling reason against it is that it
hurts the eyes with |SPLIT|. The existing use cases for C# indicate that
this would be pretty common.

{REDEFINE} is problematic because it redefines an existing (though
useless) grammatical construct and as such violates |GRAMMAR| and
|DISTINCT|. The grammatical meaning could be redefined, but it would only
work just before a block introducer (violating |GENERIC|). Just because
this syntax mimics C# somewhat is not a good enough reason to introduce
the same syntax to Python. It cannot mimic it completely anyway due to
lexical differences (break-on-white-space vs.
break-on-lines-or-sequence-delimiters).
It also violates the principle of least surprise: running a program using
this construct in an older version of Python won't get you a syntax error.
And the desired effect is dropped silently -- which can be a very
dangerous thing to do.

{UNARY} fails on |VISUAL| for |SPLIT| because the end may be hard to match
to the start. It may be hard for |PARSER| in intra-line contexts because
of |EXPR|. Restricting the construct to be used only on extra lines would
violate |GENERIC| and maybe |CONCISE|, too.
I do not think we should waste one of the three remaining unused
non-alphanumeric ASCII characters (@, $, ?) for this purpose. It would
miserably fail on |MIMIC| anyway (except for the Java precedent).
Prepending a dot is not an option because it would not conform to
|GENERIC|, |VISUAL|, |DISTINCT|, |MIMIC| and |NEWBIE|. Also GvR indicated
that he wanted to reserve this construct for a future "with" statement.

{ASYM} doesn't look bad if the construct is used on extra lines. It may be
a bit harder to spot in an intra-line context.

{SYM} looks good in any context. It would be natural to define it like any
other paired sequence delimiter ((), [] and {}). This allows for flexible
line breaks and broad grammatical applicability.

6.4 And The Winner is ...

Beware: everything I wrote is IMHO of course, but this applies especially
to this section!

{ASYM} saves a single character to type over {SYM} (which is not much of a
gain). OTOH the asymmetry spoils the visual effect quite a bit.

So my personal winner is: {SYM}

Your mileage may vary though. Even if you disagree with my reasoning, the
best the preceding section buys you is a systematic way to discuss all
alternatives now. Go ahead!

6.5 ASCII Art

If (BIG IF!) there is a concluding discussion that indeed supports my
reasoning about the lexical alternatives we should decide/vote/pronounce
on a delimiter pair.

Since @, $ and ? usually do not come in pairs, the shortest possible
delimiter pair is two characters long each.

We should include one of the four visual pairs that ASCII has proudly
brought to your home for the past decades ((), [], {} or <>). For best
visual effect the pair should be used for the outer characters.

The inner characters may be a single non-paired character or another set
of paired characters.

Of course neither the opening nor the closing delimiter should collide
with an existing delimiter. Nor should the inner character be mistaken for
the start of an expression or (less important) for the end.

That leaves us with:

  (< DECL >)  [< DECL >]  {< DECL >}

  ($ DECL $)  (% DECL %)  (& DECL &)  (/ DECL /)  (: DECL :)
  (= DECL =)  (? DECL ?)  (@ DECL @)  (^ DECL ^)  (| DECL |)

  [$ DECL $]  [% DECL %]  [& DECL &]  [/ DECL /]  [: DECL :]
  [= DECL =]  [? DECL ?]  [@ DECL @]  [^ DECL ^]  [| DECL |]

  {$ DECL $}  {% DECL %}  {& DECL &}  {/ DECL /}  {: DECL :}
  {= DECL =}  {? DECL ?}  {@ DECL @}  {^ DECL ^}  {| DECL |}

  <$ DECL $>  <% DECL %>  <& DECL &>  </ DECL />  <: DECL :>
  <= DECL =>  <? DECL ?>  <@ DECL @>  <^ DECL ^>  <| DECL |>

Now, before you choose, consider that you can use it without any spaces
between the delimiter and the declarative expression. I guess this would
be common for e.g. classmethod.
That's why my personal favourites (in descending order of preference) are:

  <|DECL|>     <== I like this one most.
  <:DECL:>
  (|DECL|)
  [|DECL|]
  <%DECL%>
  <?DECL?>

The vertical bars line up neatly if you use them on successive lines. And
it visually separates the content well, even without spaces.

But back to a more scientific analysis: conveying meaning with delimiters
is hard. Only precedent may help us here:

So <? ?> is used in XML for processing instructions (PIs). But it's up to
the language used for the PI whether the content of the PI has declarative
or imperative meaning. We'd fall prey to |MIMIC| I guess.

Choosing </ /> would not help to clear up matters, either.

Most other delimiters have no common precedent (that I know of). But the
inner character has. That makes $, @ and = seem awkward choices.

Oh and when picking one, we should think about reserving another one for a
future compile time evaluated variant, too (IMHO <:DECL:> has some merit
here).

BTW: choosing smileys as lexical tokens would sure help to entertain
     slashdot for weeks! :)

6.6 Finally

Still with me? Good, because here is the syntax I propose:

Lexical definitions:

  DECL_ELEMENT_START ::= "<|"
  DECL_ELEMENT_END   ::= "|>"

The tokens should be defined so they work like the existing paired
delimiters (i.e. allowing NEWLINE).

Grammar definitions:

  decl_element ::= "<|" [expression] ("," expression)* [","] "|>"
OR
  decl_element ::= "<|" [expression_list] "|>"

The expressions should yield declarative objects of course (see the
section on semantics).

The DSE should be placed immediately *before* the lexical element to which
it applies. Multiple declarative elements may stack up and all apply to
the following element.

The use cases indicate the following possible applications in descending
order of importance:

- Method and function definitions ("def").
- Class definitions ("class").
- Related to the module (there is no good place to put it though ...).
- Class attributes, i.e. assignments in a class definition.
- All compound statements ("if", "while", "for", "try").
- Parameter definitions.
- Related to the return value of a method or function (either by applying
  it to "return" or by putting it between "def" and the function name).
- All statements, i.e. assignments, too.

The sheer variety indicates that we should not restrict the DSE to be
usable only on a line of its own. We might as well make the DSE applicable
to *any* grammatical element.

From a language perspective that would certainly be pretty orthogonal.
From an implementation perspective we may disallow it or ignore it for
some cases. The exact way to pass the target of the DSE to the declarative
object(s) needs to be figured out, though.

A more radical thought would be to treat the DSE as a compound lexical
token. This could then be used before ANY token. It would work like a C
comment block. Yes, this smells a bit like a macro, but it doesn't behave
like one (unless we do compile time evaluation). And no, I wouldn't place
my bets on this one.

TODO: More input is required. Please go ahead.

-------- End of Document --------