On Tuesday 23 January 2007 01:12, Guillaume POIRIER wrote:
> > A natural solution for getting good scaler performance is to use JIT
> > style dynamic code generation. I spent full two days on the last weekend
> > and got some initial scaler implementation working (it is quite simple
> > and straightforward and uses less than 300 lines of code):
> > https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer
> >
> > Its API is quite similar to libswscale, but a bit simplified. You need to
> > initialize scaler context by providing source and destination resolution,
> > and also quality level setting. Code for scaling of a horizontal line of
> > pixels is dynamically generated on this stage. Once context is
> > initialized, it can be used to scale planar YUV image data and get
> > results in YUY2 format.
>
> I may sound like a rookie to ask this, but could you tell me what
> dynamic code generation precisely allows to do that can't be done with
> "straight code"?
> Also, why (optimized) dynamic code can be faster that "straight code"?

Here we need a pixel line scaler function that converts N pixels into M. There is one important difference between dynamically generated and static code. Static precompiled code does not know the N and M values beforehand, so it has to handle all the cases at run time by introducing extra logic, branching, or conditionally executed code. Any additional information you have can be used to get a faster implementation. An obvious example is the case N == M: a special unscaled variant is a lot faster than the universal one :)

In my tests from the previous post, scaling from 640 pixels to 400 was required. A universal function can't make any use of this information. But if we need, for example, a nearest neighbour scaler, generating code for such a line scaler is simple: we just take bytes from certain offsets in the source buffer and put them at certain offsets in the destination buffer.
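The difference can be sketched in portable C. This is only an illustration, not code from libswscale or from the Nokia 770 scaler; the function names and the 16.16 fixed-point stepping are assumptions. `scale_line_generic` works for any N and M but recomputes the source position for every output pixel, while `make_offsets` does that arithmetic once, at initialization, for one particular (N, M) pair. A code generator goes one step further and bakes each offset directly into the emitted instructions, which plain C cannot express.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Generic nearest-neighbour scaler: converts n source pixels into m
 * destination pixels. Because n and m are only known at run time, every
 * output pixel pays for a fixed-point position update and a shift. */
static void scale_line_generic(const uint8_t *src, int n, uint8_t *dst, int m)
{
    assert(n > 0 && n < 65536 && m > 0);
    uint32_t step = ((uint32_t)n << 16) / (uint32_t)m; /* 16.16 fixed point */
    uint32_t pos = 0;
    for (int i = 0; i < m; i++) {
        dst[i] = src[pos >> 16];
        pos += step;
    }
}

/* The source offsets depend only on n and m, so they can be computed
 * once at initialization, using the same arithmetic as above.
 * Returns NULL if the allocation fails; the caller frees the table. */
static uint16_t *make_offsets(int n, int m)
{
    assert(n > 0 && n < 65536 && m > 0);
    uint16_t *off = malloc((size_t)m * sizeof *off);
    if (off == NULL)
        return NULL;
    uint32_t step = ((uint32_t)n << 16) / (uint32_t)m;
    uint32_t pos = 0;
    for (int i = 0; i < m; i++) {
        off[i] = (uint16_t)(pos >> 16);
        pos += step;
    }
    return off;
}

/* Per-line work is now one load and one store per pixel, but the offsets
 * still come from a table in memory rather than from the instructions. */
static void scale_line_table(const uint8_t *src, uint8_t *dst,
                             const uint16_t *off, int m)
{
    for (int i = 0; i < m; i++)
        dst[i] = src[off[i]];
}
```

With N == M the step is exactly 1.0 in 16.16 fixed point, so the generic loop degenerates into a plain copy while still paying for the position arithmetic on every pixel, which is the overhead a special unscaled variant avoids.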
In the 640 to 400 case, the generated code is a straight sequence of instructions that reads 400 bytes and writes 400 bytes at predefined locations (no condition checks and no offset calculations are needed). That's why dynamically generated code is faster.

Of course, if you know the source and destination image widths at compile time, you can write a special optimized implementation. But you can't put all possible variants of this function into the executable. A dynamic code generator can be treated as a black box that provides a (somewhat) optimized function for each particular pair of N and M values at run time, whenever you need it.

> I have never written a single line of such kind of code, so I'm
> curious. Plus, modern CPUs (PPC, x86 at least) make it harder to
> program efficient dynamic code, so I heard.
> For instance, if I remember correctly, P4 flushes its trace cache
> whenever code cache is written.... pretty un-efficient, isn't it?

Code is written only when the line scaler function is generated (at initialization), so this does not matter much. When scaling is actually performed, the code is only executed, never modified. The only important requirement is that the generated scaler function fits in the instruction cache.
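To make the "straight sequence of instructions" concrete, here is a hypothetical generator that writes a fully unrolled nearest-neighbour line scaler for one fixed (N, M) pair. It emits C source text rather than machine code, so it only shows the shape of the generated function; a real run-time generator would write instructions into an executable buffer during initialization instead. The function name `emit_unrolled_scaler` and the output format are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical generator: writes an unrolled nearest-neighbour line scaler
 * for one fixed (n, m) pair as C source text. Every source offset is
 * evaluated here, at generation time, so the emitted function contains no
 * loop, no branches and no position arithmetic.
 * Returns 0 on success, -1 on a write error. */
static int emit_unrolled_scaler(FILE *out, int n, int m)
{
    assert(n > 0 && n < 65536 && m > 0);
    uint32_t step = ((uint32_t)n << 16) / (uint32_t)m; /* 16.16 fixed point */
    uint32_t pos = 0;

    if (fprintf(out, "void scale_%d_to_%d(const unsigned char *src, "
                     "unsigned char *dst)\n{\n", n, m) < 0)
        return -1;
    for (int i = 0; i < m; i++) {
        /* One constant-offset load and store per output pixel. */
        if (fprintf(out, "    dst[%d] = src[%u];\n", i,
                    (unsigned)(pos >> 16)) < 0)
            return -1;
        pos += step;
    }
    return fprintf(out, "}\n") < 0 ? -1 : 0;
}
```

Calling `emit_unrolled_scaler(stdout, 640, 400)` prints a function of 400 assignments starting with `dst[0] = src[0];`, `dst[1] = src[1];`, `dst[2] = src[3];`. Under a naive one-load, one-store-per-pixel scheme that is 800 instructions; with a fixed 32-bit instruction encoding such as ARM's, that comes to about 3.2 KB of code, which is the footprint that has to stay in the instruction cache.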