Hi

On Tue, Jan 02, 2007 at 04:23:00AM +0200, Siarhei Siamashka wrote:
> Hello all,
>
> I started optimizing 'dct_unquantize_h263_intra' for ARM (armv5te). Attached
> patch improves performance already ('dct_unquantize_h263_helper_armv5te' is
> twice faster than 'dct_unquantize_h263_helper_c' , also there is a visible
> improvement for overall video decoding performance). This code is a
> straightforward optimization of 'mpegvideo.c' (only assuming that result of
> multiplication does not overflow 16-bits). Right now it takes about 7 cycles
> to process each element. But I checked 'mpegvideo_mmx.c' and got some
> more optimization ideas.
>
> Is it safe to assume:
> 1. Result of 'level = level * qmul - qadd' will never overflow signed 16-bits?

yes

> 2. DCTELEM *block is always at least 8 bytes aligned?

yes, if not it's a bug

> 3. Processing extra elements after block[nCoeffs] is safe (up to but not
> including block[(nCoeffs + 7) / 8 * 8])?

block[0 .. 63] is always safe
nCoeffs <= 64

[...]

> +
> +#include "../dsputil.h"
> +#include "../mpegvideo.h"
> +#include "../avcodec.h"
> +
> +/**
> + * h263 dequantizer supplementary function, it is performance critical
> + * and needs to have optimized implementations for each architecture
> + */
> +static inline void dct_unquantize_h263_helper_c(DCTELEM *block, int qmul, int qadd, int count)
> +{
> +    int i, level;
> +    for (i = 0; i < count; i++) {
> +        level = block[i];
> +        if (level) {
> +            if (level < 0) {
> +                level = level * qmul - qadd;
> +            } else {
> +                level = level * qmul + qadd;
> +            }
> +            block[i] = level;
> +        }
> +    }
> +}

this looks like a duplicate of dct_unquantize_h263_inter_c() ?

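For reference, the arithmetic and the bounds discussed above can be written as a
small standalone sketch. This is not code from the patch or from mpegvideo.c: the
typedef of DCTELEM as int16_t and the names round_up_8 and unquantize_h263_ref
are assumptions made only for illustration. It shows that rounding a count of at
most 64 up to a multiple of 8 stays within block[0 .. 63], and that zero
coefficients are left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: DCTELEM is a signed 16-bit type, as the thread implies. */
typedef int16_t DCTELEM;

/* Hypothetical helper: round a coefficient count up to a multiple of 8.
 * For n <= 64 the result is also <= 64, so block[0 .. 63] covers it. */
static int round_up_8(int n)
{
    return (n + 7) / 8 * 8;
}

/* Hypothetical scalar reference: scale each nonzero level by qmul and
 * move it away from zero by qadd; zero levels are skipped. */
static void unquantize_h263_ref(DCTELEM *block, int qmul, int qadd, int count)
{
    int i;

    assert(count >= 0 && count <= 64);
    for (i = 0; i < count; i++) {
        int level = block[i];
        if (level) {
            level = level < 0 ? level * qmul - qadd
                              : level * qmul + qadd;
            /* Relies on the answer above: the result fits in 16 bits. */
            block[i] = (DCTELEM)level;
        }
    }
}
```

Because zero levels pass through unchanged, processing the padding up to the
rounded count only matters when those extra elements hold nonzero data.
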
> +
> +/* GCC 3.1 or higher is required to support symbolic names in assembly code */
> +#if (__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 1))
> +
> +/**
> + * Code optimized for armv5te, uses fast single cycle 16-bit dsp multiply
> + * instruction, is unrolled to process 4 elements per iteration and has
> + * code sheduled to avoid pipeline stalls. Should take 7 cycles
> + * per element on ARM926EJ-S (Nokia 770)
> + */
> +#define dct_unquantize_h263_helper_armv5te(__block, __qmul, __qadd, __count) \

things starting with __ are reserved in C, please don't use such names

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin