Hello All,

First some background information and rationale. The Nokia 770 [1] graphics chip supports the packed YUV422 color format (IMGFMT_YUY2 in ffmpeg's classification) but does not support scaling, except for a pixel doubling feature which can scale the image exactly twice. So fullscreen video playback suffers a severe performance penalty whenever scaling is needed. I have also heard that the PXA270 in the latest Sharp Zaurus PDA [2] has no hardware scaling capabilities either, but does support YUV colorspace (exactly which formats are supported still needs to be clarified). So developing a fast ARM-optimized scaler for these and similar devices makes sense.

A natural way to get good scaler performance is JIT-style dynamic code generation. I spent two full days last weekend and got an initial scaler implementation working (it is quite simple and straightforward, less than 300 lines of code):

https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer

Its API is quite similar to libswscale, but a bit simplified. You initialize a scaler context by providing the source and destination resolutions and a quality level setting; the code for scaling a horizontal line of pixels is dynamically generated at this stage. Once the context is initialized, it can be used to scale planar YUV image data and get the result in YUY2 format.

The horizontal scaler works as follows: each pixel in the destination buffer is either a copy of some pixel in the source buffer or a 1:1 average of the two nearest source pixels. Scaling precision could be extended with 1:3 and 3:1 averaging proportions at almost no extra cost. Vertical scaling currently just maps some source buffer line to each destination buffer line, but it can probably be extended to support 1:1 averaging of two neighbouring source lines to produce a destination line. So depending on the quality setting, we get either a nearest-neighbour scaler or a kind of simplified, low-precision bilinear scaler.
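To make the idea a bit more concrete, here is a rough plain-C sketch of what the generated per-line code is equivalent to at the quality=2 setting described above. All names here (hscale_ctx, hscale_init, hscale_line, the blend table) are hypothetical and chosen only for illustration; they are not the actual libswscale_nokia770 API, and the real implementation emits straight-line ARM code at context-initialization time instead of walking a precomputed table.

/*
 * Hypothetical reference model of the horizontal scaler (illustration only).
 * At init time we precompute, for each destination pixel, a source index plus
 * a flag saying whether to copy that pixel or average it 1:1 with its right
 * neighbour.  The real JIT scaler hard-wires these decisions into generated
 * ARM code instead of using a table.
 */
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int src_w, dst_w;
    int *src_idx;    /* source pixel index for each destination pixel     */
    uint8_t *blend;  /* 1 = average with the next source pixel, 0 = copy  */
} hscale_ctx;

static hscale_ctx *hscale_init(int src_w, int dst_w, int quality)
{
    hscale_ctx *c = malloc(sizeof(*c));
    c->src_w   = src_w;
    c->dst_w   = dst_w;
    c->src_idx = malloc(dst_w * sizeof(int));
    c->blend   = malloc(dst_w);

    for (int x = 0; x < dst_w; x++) {
        /* 16.16 fixed-point source position of this destination pixel */
        int64_t pos = ((int64_t)x * src_w << 16) / dst_w;
        int idx  = (int)(pos >> 16);
        int frac = (int)(pos & 0xffff);
        c->src_idx[x] = idx;
        /* quality 1: pure nearest neighbour; quality 2: 1:1 averaging when
         * the position falls roughly between two source pixels */
        c->blend[x] = (quality >= 2 && frac >= 0x4000 && frac < 0xc000 &&
                       idx + 1 < src_w);
    }
    return c;
}

/* Scale one line of 8-bit samples (e.g. the Y plane). */
static void hscale_line(const hscale_ctx *c, const uint8_t *src, uint8_t *dst)
{
    for (int x = 0; x < c->dst_w; x++) {
        int i = c->src_idx[x];
        dst[x] = c->blend[x] ? (uint8_t)((src[i] + src[i + 1] + 1) >> 1)
                             : src[i];
    }
}

The U and V planes and the interleaving into YUY2 output are omitted here; the point is just the per-pixel copy-or-average decision that gets baked into the generated code.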
In order to estimate performance, I did some benchmarks with mplayer_1.0rc1-maemo.8 [3], which already has this JIT code in use.

# mplayer -nosound -benchmark -quiet -endpos 100 [scaler_settings] video.avi

*** -sws 4 ***
SwScaler: Nearest Neighbor / POINT scaler, from yuv420p to yuyv422 using C
SwScaler: using C scaler for horizontal scaling
SwScaler: using n-tap C scaler for vertical scaling (BGR)
SwScaler: 640x272 -> 400x170
BENCHMARKs: VC: 62.645s VO: 58.738s A: 0.000s Sys: 1.053s = 122.435s
BENCHMARK%: VC: 51.1654% VO: 47.9746% A: 0.0000% Sys: 0.8599% = 100.0000%

*** -sws 1 ***
SwScaler: BILINEAR scaler, from yuv420p to yuyv422 using C
SwScaler: using C scaler for horizontal scaling
SwScaler: using n-tap C scaler for vertical scaling (BGR)
SwScaler: 640x272 -> 400x170
BENCHMARKs: VC: 64.029s VO: 164.350s A: 0.000s Sys: 1.321s = 229.700s
BENCHMARK%: VC: 27.8750% VO: 71.5500% A: 0.0000% Sys: 0.5750% = 100.0000%

*** JIT scaler, quality = 1 (nearest neighbour) ***
[nokia770] Using ARM JIT scaler (quality=1) to scale 640x272 => 400x170
BENCHMARKs: VC: 63.033s VO: 5.585s A: 0.000s Sys: 0.940s = 69.559s
BENCHMARK%: VC: 90.6193% VO: 8.0295% A: 0.0000% Sys: 1.3512% = 100.0000%

*** JIT scaler, quality = 2 (pixel copy or 1:1 proportion averaging for horizontal scaling, nearest neighbour for vertical scaling) ***
[nokia770] Using ARM JIT scaler (quality=2) to scale 640x272 => 400x170
BENCHMARKs: VC: 62.893s VO: 7.551s A: 0.000s Sys: 1.000s = 71.444s
BENCHMARK%: VC: 88.0310% VO: 10.5686% A: 0.0000% Sys: 1.4004% = 100.0000%

So the performance improvement over the standard libswscale scalers (the first two runs) is really huge. The JIT scaler at quality 1 and the nearest-neighbour scaler from libswscale are direct competitors here, and the JIT implementation is 10x faster :)

With the JIT scaler at quality 2, I can see some 'sparkles' in the image on vertically panning scenes, but horizontal panning looks ok. So I expect good quality after improving vertical scaling by adding line averaging.

Now I wonder whether it would be a good idea to include this ARM JIT scaler in ffmpeg, and what the requirements for that would be. Of course I will clean up the code first and add more sanity checks and comments (most likely next weekend). But I'm more worried about integrating it into the libswscale code without turning it into a mess.

1. Is there any documentation about the internal libswscale structure, and are there any hacking guidelines?
2. I see that scalers in libswscale have to support slices. Is that the only extra requirement, or should I be aware of something else?
3. What would be the best mapping of the scaling methods used in this JIT scaler to libswscale scaling algorithms (nearest neighbour is clear, but I'm not sure about the rest)?

[1] http://en.wikipedia.org/wiki/Nokia_770
[2] http://en.wikipedia.org/wiki/Zaurus
[3] https://garage.maemo.org/projects/mplayer/