I would like to get some feedback on some thoughts I've been having about cooperative multitasking using generators. After some attempts to change the 'flow' module [1] into a C module, I got to thinking about how to better support my requirements with a relatively small change to the Python interpreter. Anyway, if you would humor me a bit, perhaps you can see what I'm getting at; the change to Python could be relatively localized and simple, and Python would become considerably more useful in cooperative multitasking contexts.

First, let me provide an application context.

Suppose you are building a web page in response to an HTTP request. Further suppose that the information for this page comes from two sources: a PostgreSQL database and an OpenLDAP directory. Now assume that you want to make your page building modular, that is, broken into several operations with sub-operations. Here is, perhaps, a call graph of how the page would be done:

    buildPage
    |  buildHeader
    |  |  writeTopOfHeader
    |  |  writeMetaKeywords
    |  |  queryLDAP*      (returns iterable keyword sequence)
    |  |  writeRestOfHeader
    |  buildNavigator
    |  |  writeTopOfNavigator
    |  |  writeNavigatorRows
    |  |  queryLDAP*      (to get items for the Navigator)
    |  |  writeRestOfNavigator
    |  buildGrid
    |  |  writeTopOfGrid
    |  |  writeRows
    |  |  queryDatabase*  (returns sequence of rows)
    |  writeFooter

Assume that each of these items is a generator, and that the queryLDAP and queryDatabase generators either yield a value (in the sequence that they return) or yield a special sentinel, 'Cooperate'. Then assume that you have a reactor which is building N pages at a time. To make this magic work, each one of the intermediate generators (buildPage, buildGrid, writeRows) on the way to a leaf generator (queryDatabase) must explicitly check for this 'Cooperate' special value, and then yield that same value itself. In this way, you can 'pause' the whole chain of generators.
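To make the checking concrete, here is a minimal sketch of that propagation pattern. The names (Cooperate, queryLDAP, buildHeader, buildPage) mirror the description above, but the implementation is mine, for illustration, and is not the flow module's API:

```python
# Illustrative sketch only: every intermediate generator must test for
# the sentinel and re-yield it, or the pause never reaches the reactor.

Cooperate = object()   # sentinel meaning "pause this whole chain"

def queryLDAP():
    # Leaf generator: pretend the directory is slow and pause once
    # before producing the keyword sequence.
    yield Cooperate
    yield "python"
    yield "generators"

def buildHeader():
    # Intermediate generator: must check for the sentinel and re-yield
    # it so the pause propagates upward.
    for item in queryLDAP():
        if item is Cooperate:
            yield Cooperate
        else:
            yield "<meta keyword=%s>" % item

def buildPage():
    # Same boilerplate again, one level up.
    for item in buildHeader():
        if item is Cooperate:
            yield Cooperate
        else:
            yield item

page = [x for x in buildPage() if x is not Cooperate]
# page == ["<meta keyword=python>", "<meta keyword=generators>"]
```

Note how the sentinel check is repeated verbatim in every intermediate frame; that repetition is exactly the boilerplate the rest of this note is about.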
The flow module [1] tries to make this easy, and makes the 'Cooperate' logic work by scheduling each page to be built.

Second, let me describe the problem (or irritant).

What happens in the flow module is that each one of the intermediate generators needs to be 'wrapped' to handle exceptions and other concerns. This wrapper slows things down substantially, especially in tight loops. Furthermore, each intermediate generator must be made 'aware' that it may be paused (an unnecessary ickiness). For example, the following code:

    for x in parentGenerator:
        # do something with x

has to be rewritten as:

    parentGenerator = flow.wrap(parentGenerator)
    yield parentGenerator
    for x in parentGenerator:
        # do something with x
        yield parentGenerator

It works, but it is just stuff that I think could be done much better in the guts of Python itself, and without all of the tricks needed to coax exceptions into working as you'd expect.

Third, let me describe what I think would be a nice solution.

I'd like a special value, let's call it Pause, which can be given to a yield statement. When this is encountered by the Python interpreter, it bypasses all of the intermediate generators and goes to the most recent non-generator. For example, a yield of Pause in queryDatabase would bypass the chain (buildPage, buildGrid, writeRows), and the caller of buildPage's next() method would receive the Pause object. Then, when the next call to buildPage's next() method is invoked, it would resume the leaf generator (queryDatabase), and if that generator yields a non-Pause value, the stack would unwind as normal.

So, in an attempt to be more explicit:

- Let a generator chain C be composed of three generators, g, g', and g'', which are calling each other. Let f be a function which is iterating over the generator g.

- Let the generator g'' yield a value p, which is an instance of a subclass of a special 'Pause' built-in class.
- At this point, the Python evaluator walks down the stack frames to find the first non-generator in the stack; in this case, f.

- The evaluator then creates a new instance zzz of a PauseIter object, and every reference to g in f is replaced with zzz. At this point, zzz is initialized with g and g'' as the head and tail of the paused generator chain.

- The function f is given the value p as the result of g.next().

- When the function f calls g.next() again (which actually calls zzz.next() instead), the generator g'' is resumed.

- If g'' yields another Pause object, then this object is passed back to the function f, and things continue as above.

- If g'' yields a non-Pause object, then f is rewritten to link to g again, and the non-Pause object is handed to g' so that normal processing proceeds.

- If g'' raises an exception, then f is rewritten to link to g again, and the exception is propagated to g' for normal processing.

Fourth, in conclusion.

I think that with a change similar to the above, cooperative multitasking in Python, with the ability to break generators down into sub-generators, becomes easy to manage (a function f can simply keep each micro-thread in a queue, etc.). I'm not sure I understand the implications of all of this, and in particular how hard a change to ceval.c would be, but I'd be very interested to hear your opinion.

Kind Regards,

Clark