Robert Bradshaw wrote:
> On Wed, May 16, 2012 at 8:40 AM, Mark Shannon <mark at hotpy.org> wrote:
>> Dag Sverre Seljebotn wrote:
>>> On 05/16/2012 02:47 PM, Mark Shannon wrote:
>>>> Stefan Behnel wrote:
>>>>> Dag Sverre Seljebotn, 16.05.2012 12:48:
>>>>>> On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:
>>>>>>>> Agreed in general, but in this case, it's really not that easy. A C function call involves a certain overhead all by itself, so calling into the C-API multiple times may be substantially more costly than, say, calling through a function pointer once and then running over a returned C array comparing numbers. And definitely way more costly than running over an array that the type struct points to directly. We are not talking about hundreds of entries here, just a few. A linear scan in 64-bit steps over something like a hundred bytes in the L1 cache should hardly be measurable.
>>>>>>> I give up, then. I fail to understand the problem. Apparently, you want to do something with the value you get from this lookup operation, but that something won't involve function calls (or else the function call overhead for the lookup wouldn't be relevant).
>>>>>> In our specific case the value would be an offset added to the PyObject*, and there we would find a pointer to a C function (together with a 64-bit signature), and calling that C function (after checking the 64-bit signature) is our final objective.
>>>>>
>>>>> I think the use case hasn't been communicated all that clearly yet. Let's give it another try.
>>>>>
>>>>> Imagine we have two sides, one that provides a callable and the other side that wants to call it. Both sides are implemented in C, so the callee has a C signature and the caller has the arguments available as C data types.
>>>>> The signature may or may not match the argument types exactly (float vs. double, int vs. long, ...), because the caller and the callee know nothing about each other initially; they just happen to appear in the same program at runtime. All they know is that they could call each other through Python space, but that would require data conversion, tuple packing, calling, tuple unpacking, data unpacking, and then potentially the same thing on the way back. They want to avoid that overhead.
>>>>>
>>>>> Now, the caller needs to figure out if the callee has a compatible signature. The callee may provide more than one signature (i.e. more than one C call entry point), perhaps because it is implemented to deal with different input data types efficiently, or perhaps because it can efficiently convert them to its expected input. So, there is a signature on the caller side given by the argument types it holds, and a couple of signatures on the callee side that can accept different C data input. Then the caller needs to find out which signatures there are and match them against what it can efficiently call. It may even be a JIT compiler that can generate an efficient call signature on the fly, given a suitable signature on the callee side.
>>>>>
>>>>> An example for this is an algorithm that evaluates a user-provided function on a large NumPy array. The caller knows what array type it is operating on, and the user-provided function may be designed to efficiently operate on arrays of int, float and double entries.
>>>>
>>>> Given that use case, can I suggest the following:
>>>>
>>>> Separate the discovery of the function from its use. By this I mean first look up the function (outside of the loop), then use the function (inside the loop).
>>>
>>> We would obviously do that when we can.
>>> But Cython is a compiler/code translator, and we don't control use cases. You can easily make up use cases (= Cython code people write) where you can't easily separate the two.
>>>
>>> For instance, the Sage project has hundreds of thousands of lines of object-oriented Cython code (NOT just array-oriented, but also graphs and trees and such), which is all based on Cython's own fast vtable dispatches a la C++. They might want to clean up their code and use more generic callback objects in some places.
>>>
>>> Other users currently pass around C pointers for callback functions, and we'd like to tell them "pass around these nicer Python callables instead; honestly, the penalty is only 2 ns per call". (*Regardless* of how you use them, like making sure you use them in a loop where we can statically pull out the function pointer acquisition. Saying "this is only non-sluggish if you do x, y, z" puts users off.)
>>
>> Why not pass around a PyCFunction object, instead of a C function pointer? It contains two fields: the function pointer and the object (self), which is exactly what you want.
>>
>> Of course, the PyCFunction object only allows a limited range of function types, which is why I am suggesting a variant which supports a wider range of C function pointer types.
>>
>> Is a single extra indirection in obj->func(), rather than func(), really that inefficient? If you are passing around raw pointers, you have already given up on dynamic type checking.
>>
>>> I'm not asking you to consider the details of all that. Just to allow some kind of high-performance extensibility of PyTypeObject, so that we can *stop* bothering python-dev with specific requirements from our parallel universe of nearly-all-Cython-and-Fortran-and-C++ codebases :-)
>>
>> If I read it correctly, you have two problems you wish to solve:
>> 1. A fast callable that can be passed around (see above)
>> 2. Fast access to that callable from a type.
>>
>> The solution for 2. is the _PyType_Lookup() function. By the time you have fixed your proposed solution to properly handle subclassing, I doubt it will be any quicker than _PyType_Lookup().
>
> It is certainly (2) that we are most interested in solving here; (1) can be solved in a variety of ways. For this second point, we're looking for something that's faster than a dictionary lookup. (For example, a common use case is user-provided functions operating on C doubles, which can be quite fast.)

_PyType_Lookup() is fast; it doesn't perform any dictionary lookups if the (type, attribute) pair is in the cache.

> The PyTypeObject struct is in large part a list of methods that were deemed too common and time-critical to merit the dictionary lookup (and Python call) overhead. Unfortunately, it's not extensible. We figured it'd be useful to get feedback from the large Python community on how best to add extensibility, in particular with an eye for being future-proof and possibly an official part of the standard for some future version of Python.

I don't see any problem with making _PyType_Lookup() public. But others might disagree.

Cheers,
Mark.