Dag Sverre Seljebotn wrote:
> On 05/16/2012 02:47 PM, Mark Shannon wrote:
>> Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 16.05.2012 12:48:
>>>> On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:
>>>>>> Agreed in general, but in this case, it's really not that easy. A C function call involves a certain overhead all by itself, so calling into the C-API multiple times may be substantially more costly than, say, calling through a function pointer once and then running over a returned C array comparing numbers. And definitely way more costly than running over an array that the type struct points to directly. We are not talking about hundreds of entries here, just a few. A linear scan in 64-bit steps over something like a hundred bytes in the L1 cache should hardly be measurable.
>>>>>
>>>>> I give up, then. I fail to understand the problem. Apparently, you want to do something with the value you get from this lookup operation, but that something won't involve function calls (or else the function call overhead for the lookup wouldn't be relevant).
>>>>
>>>> In our specific case the value would be an offset added to the PyObject*, and there we would find a pointer to a C function (together with a 64-bit signature), and calling that C function (after checking the 64-bit signature) is our final objective.
>>>
>>> I think the use case hasn't been communicated all that clearly yet. Let's give it another try.
>>>
>>> Imagine we have two sides, one that provides a callable and the other side that wants to call it. Both sides are implemented in C, so the callee has a C signature and the caller has the arguments available as C data types. The signature may or may not match the argument types exactly (float vs. double, int vs. long, ...), because the caller and the callee know nothing about each other initially; they just happen to appear in the same program at runtime. All they know is that they could call each other through Python space, but that would require data conversion, tuple packing, calling, tuple unpacking, data unpacking, and then potentially the same thing on the way back. They want to avoid that overhead.
>>>
>>> Now, the caller needs to figure out if the callee has a compatible signature. The callee may provide more than one signature (i.e. more than one C call entry point), perhaps because it is implemented to deal with different input data types efficiently, or perhaps because it can efficiently convert them to its expected input. So, there is a signature on the caller side given by the argument types it holds, and a couple of signatures on the callee side that can accept different C data input. Then the caller needs to find out which signatures there are and match them against what it can efficiently call. It may even be a JIT compiler that can generate an efficient call signature on the fly, given a suitable signature on the callee side.
>>>
>>> An example for this is an algorithm that evaluates a user-provided function on a large NumPy array. The caller knows what array type it is operating on, and the user-provided function may be designed to efficiently operate on arrays of int, float and double entries.
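To make the mechanism being described concrete: a rough sketch of one possible shape for it, with purely illustrative names (PyNativeCall_Entry, find_native_entry, the signature encoding) and no claim that this is the proposed API, could look like this.

    #include <Python.h>
    #include <stdint.h>

    /* Hypothetical layout: a small, cache-friendly table of native entry
     * points exposed by a callable, each tagged with an encoded C signature. */
    typedef struct {
        uint64_t signature;   /* encoded C signature, e.g. "double (double)" */
        void    *funcptr;     /* C entry point implementing that signature   */
    } PyNativeCall_Entry;

    typedef struct {
        Py_ssize_t          count;
        PyNativeCall_Entry *entries;
    } PyNativeCall_Table;

    /* The caller reaches the table at a known offset from the object (or its
     * type) and does a linear scan over a handful of entries -- no C-API
     * calls, no tuple packing -- then calls through the matching pointer. */
    static void *
    find_native_entry(PyObject *callable, Py_ssize_t table_offset,
                      uint64_t wanted_signature)
    {
        PyNativeCall_Table *table =
            (PyNativeCall_Table *)((char *)callable + table_offset);
        for (Py_ssize_t i = 0; i < table->count; i++) {
            if (table->entries[i].signature == wanted_signature)
                return table->entries[i].funcptr;
        }
        return NULL;  /* no native match: fall back to a normal Python call */
    }

A caller like the NumPy-array example would do this scan once, cast the matching pointer to, say, double (*)(double), and then stay entirely on the C level in its inner loop.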
>>
>> Given that use case, can I suggest the following:
>>
>> Separate the discovery of the function from its use. By this I mean: first look up the function (outside of the loop), then use the function (inside the loop).
>
> We would obviously do that when we can. But Cython is a compiler/code translator, and we don't control use cases. You can easily make up use cases (= Cython code people write) where you can't easily separate the two.
>
> For instance, the Sage project has hundreds of thousands of lines of object-oriented Cython code (NOT just array-oriented, but also graphs and trees and stuff), which is all based on Cython's own fast vtable dispatches a la C++. They might want to clean up their code and use more generic callback objects in some places.
>
> Other users currently pass around C pointers for callback functions, and we'd like to tell them "pass around these nicer Python callables instead; honestly, the penalty is only 2 ns per call", *regardless* of how they use them, not just when they use them in a loop where we can statically pull out the function pointer acquisition. Saying "this is only non-sluggish if you do x, y, z" puts users off.

Why not pass around a PyCFunction object instead of a C function pointer? It contains two fields, the function pointer and the object (self), which is exactly what you want. Of course, the PyCFunction object only allows a limited range of function types, which is why I am suggesting a variant that supports a wider range of C function pointer types.

Is a single extra indirection, obj->func() rather than func(), really that inefficient? If you are passing around raw pointers, you have already given up on dynamic type checking.

>
> I'm not asking you to consider the details of all that. Just to allow some kind of high-performance extensibility of PyTypeObject, so that we can *stop* bothering python-dev with specific requirements from our parallel universe of nearly-all-Cython-and-Fortran-and-C++ codebases :-)

If I read it correctly, you have two problems you wish to solve:

1. A fast callable that can be passed around (see above).
2. Fast access to that callable from a type.

The solution for 2 is the _PyType_Lookup() function. By the time you have fixed your proposed solution to properly handle subclassing, I doubt it will be any quicker than _PyType_Lookup().

Cheers,
Mark.
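To make the "separate discovery from use" suggestion concrete, here is a rough sketch using the (self, function pointer) pair stored in a PyCFunction; apply_over_array is a hypothetical helper, limited to METH_O callables to keep it short.

    #include <Python.h>

    /* Sketch: hoist the dynamic discovery out of the loop, then call through
     * the stored C function pointer for every element. */
    static int
    apply_over_array(PyObject *callable, double *data, Py_ssize_t n)
    {
        if (!PyCFunction_Check(callable) ||
            !(PyCFunction_GET_FLAGS(callable) & METH_O)) {
            PyErr_SetString(PyExc_TypeError, "expected a METH_O builtin function");
            return -1;
        }

        /* Discovery: done once, outside the loop. */
        PyCFunction meth = PyCFunction_GET_FUNCTION(callable);
        PyObject   *self = PyCFunction_GET_SELF(callable);

        for (Py_ssize_t i = 0; i < n; i++) {
            /* Use: one indirect call per element.  The boxing and unboxing
             * of the double is still what dominates; that is the overhead a
             * native-signature table would remove. */
            PyObject *arg = PyFloat_FromDouble(data[i]);
            if (arg == NULL)
                return -1;
            PyObject *res = meth(self, arg);
            Py_DECREF(arg);
            if (res == NULL)
                return -1;
            data[i] = PyFloat_AsDouble(res);
            Py_DECREF(res);
            if (data[i] == -1.0 && PyErr_Occurred())
                return -1;
        }
        return 0;
    }

The same hoisting applies to the type-lookup side: _PyType_Lookup(Py_TYPE(obj), name) can be called once before the loop (it returns a borrowed reference, served from the method cache), so only the call itself remains inside the loop.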