Hello all,

I started optimizing 'dct_unquantize_h263_intra' for ARM (armv5te). The attached patch already improves performance: 'dct_unquantize_h263_helper_armv5te' is twice as fast as 'dct_unquantize_h263_helper_c', and there is also a visible improvement in overall video decoding performance.

This code is a straightforward optimization of 'mpegvideo.c' (it only assumes that the result of the multiplication does not overflow 16 bits). Right now it takes about 7 cycles to process each element.

But I checked 'mpegvideo_mmx.c' and got some more optimization ideas. Is it safe to assume the following?

1. The result of 'level = level * qmul - qadd' will never overflow signed 16 bits.
2. DCTELEM *block is always at least 8-byte aligned.
3. Processing extra elements after block[nCoeffs] is safe (up to but not including block[(nCoeffs + 7) / 8 * 8]).

If all of that is safe (and if I understand the mpegvideo_mmx.c code correctly, it should be), it is still possible to squeeze out a bit more performance.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpegvideo_armv5te.diff
Type: text/x-diff
Size: 8368 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070102/a16a93a7/attachment.diff>
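For readers without the patch at hand, here is a minimal C sketch of the scalar dequantization step the three questions refer to. It is not the code from 'mpegvideo.c' or from the attached diff. The names 'unquantize_h263_sketch' and 'padded_count' are hypothetical, DCTELEM is assumed to be a 16-bit signed type, and the '+ qadd' branch for positive levels is an assumption (the post only quotes the '- qadd' form). The helper counts results that would not fit in signed 16 bits (question 1), and 'padded_count' computes the rounded-up bound from question 3.

```c
#include <stdint.h>

typedef int16_t DCTELEM; /* assumption: coefficients are signed 16-bit */

/* Hypothetical helper: dequantize block[first..last] in place.
 * Returns how many results fell outside the signed 16-bit range,
 * i.e. how often assumption 1 from the post would be violated. */
static int unquantize_h263_sketch(DCTELEM *block, int first, int last,
                                  int qmul, int qadd)
{
    int overflows = 0;
    for (int i = first; i <= last; i++) {
        int level = block[i];
        if (level == 0)
            continue; /* zero coefficients stay zero */
        /* assumed form: move the value away from zero by qadd */
        level = level < 0 ? level * qmul - qadd
                          : level * qmul + qadd;
        if (level < INT16_MIN || level > INT16_MAX)
            overflows++;
        block[i] = (DCTELEM)level; /* truncates if out of range */
    }
    return overflows;
}

/* Hypothetical helper: element count when processing in groups of 8,
 * i.e. the exclusive upper bound (nCoeffs + 7) / 8 * 8 from question 3. */
static int padded_count(int nCoeffs)
{
    return (nCoeffs + 7) / 8 * 8;
}
```

With qmul = 4 and qadd = 1, a level of 3 becomes 13 and a level of -2 becomes -9. A level of 2047 with qmul = 62 and qadd = 31 gives 126945, which does not fit in 16 bits, so whether assumption 1 holds depends on the range of levels the decoder can actually produce.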