Tue Jan 2 03:23:00 CET 2007 · https://ffmpeg.org/pipermail/ffmpeg-devel/2007-January/026922.html

Hello all,

I started optimizing 'dct_unquantize_h263_intra' for ARM (armv5te). The attached
patch already improves performance ('dct_unquantize_h263_helper_armv5te' is
twice as fast as 'dct_unquantize_h263_helper_c', and there is also a visible
improvement in overall video decoding performance). This code is a
straightforward optimization of 'mpegvideo.c' (assuming only that the result of
the multiplication does not overflow 16 bits). Right now it takes about 7 cycles
to process each element. But I checked 'mpegvideo_mmx.c' and got some
more optimization ideas.

Is it safe to assume that:
1. The result of 'level = level * qmul - qadd' will never overflow signed 16 bits?
2. DCTELEM *block is always at least 8-byte aligned?
3. Processing extra elements after block[nCoeffs] is safe (up to but not
including block[(nCoeffs + 7) / 8 * 8])?
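To make the three assumptions concrete, here is a minimal C sketch of a scalar
reference loop with a check for each of them. It is not the code from the
attached patch. The helper names ('unquantize_ref', 'is_aligned8', 'padded_end')
are hypothetical, DCTELEM is assumed to be a signed 16-bit type, and the
sign-dependent use of qadd and the skipping of zero coefficients are
assumptions about the reference loop; the questions above quote only
'level = level * qmul - qadd'.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: coefficients are signed 16-bit values. */
typedef int16_t DCTELEM;

/*
 * Hypothetical scalar reference: dequantize block[first..n_coeffs] in place.
 * Returns 1 if every result fits in signed 16 bits (assumption 1), else 0.
 * Assumed behaviour: zero stays zero, qadd is subtracted for negative
 * levels and added for positive ones.
 */
static int unquantize_ref(DCTELEM *block, int first, int n_coeffs,
                          int qmul, int qadd)
{
    int fits = 1;
    for (int i = first; i <= n_coeffs; i++) {
        int level = block[i];
        if (level == 0)
            continue;
        level = level < 0 ? level * qmul - qadd : level * qmul + qadd;
        if (level < INT16_MIN || level > INT16_MAX)
            fits = 0;               /* result would not fit in 16 bits */
        block[i] = (DCTELEM)level;
    }
    return fits;
}

/* Assumption 2: is the pointer at least 8-byte aligned? */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7) == 0;
}

/*
 * Assumption 3: exclusive end index when elements are processed in
 * groups of 8, i.e. (nCoeffs + 7) / 8 * 8 from the question.
 */
static int padded_end(int n_coeffs)
{
    return (n_coeffs + 7) / 8 * 8;
}
```

A debug build could call 'assert(is_aligned8(block))' before the optimized
routine and compare its output against 'unquantize_ref' on the same input,
which would show whether the assumptions hold for real streams.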

If all of that is safe (and if I understand the mpegvideo_mmx.c code correctly,
it should be), it is still possible to squeeze out a bit more performance.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpegvideo_armv5te.diff
Type: text/x-diff
Size: 8368 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070102/a16a93a7/attachment.diff>

[Ffmpeg-devel] A question about 'dct_unquantize_h263_intra'