While it is convenient to be able to time a particular function with a given set of parameters, it is even better to be able to generate a plot of performance over a range of parameters. clFFT can generate performance plots with the help of Python scripts. The Python scripts are located at ./src/scripts/perf, but when the INSTALL target is built from the build environment, the scripts are copied into the ./bin/clFFT/develop/vs10x64/package directory along with the rest of the built binaries.
There are two primary Python scripts that users interact with directly: measurePerformance.py and plotPerformance.py.
The measurePerformance.py script is responsible for measuring and gathering performance data and recording it in a log file. It calls the clFFT client program in a loop, modifying program parameters in an organized fashion, and scrapes stdout for performance information. It provides an interface that simplifies specifying test ranges and strides. Extensive help information is available through the --help parameter:
C:\clFFT\src\scripts\perf>measurePerformance.py -h
usage: measurePerformance.py [-h] [--device DEVICE] [-b BATCHSIZE]
                             [-a CONSTPROBSIZE] [-x LENGTHX] [-y LENGTHY]
                             [-z LENGTHZ] [--problemsize PROBLEMSIZE]
                             [-i INPUTLAYOUT] [-o OUTPUTLAYOUT] [-p PLACENESS]
                             [-r PRECISION] [--ldscomplex]
                             [--ldsfraction LDSFRACTION]
                             [--cachesize CACHESIZE] [--xfactor XFACTOR]
                             [--library {clfft}] [--label LABEL]
                             [--createini CREATEINIFILENAME]
                             [--ini INIFILENAME]
                             [--tablefile TABLEOUTPUTFILENAME]

Measure performance of the clFFT library

optional arguments:
  -h, --help            show this help message and exit
  --device DEVICE       device(s) to run on; may be a comma-delimited list.
                        choices are ['gpu', 'cpu']. (default gpu)
  -b BATCHSIZE, --batchsize BATCHSIZE
                        number of FFTs to perform with one invocation of the
                        client. the special value 'max' may be used to adjust
                        the batch size on a per-transform basis to the maximum
                        problem size possible on the device. may be a range or
                        a comma-delimited list. if a range is entered, you may
                        follow it with ':X', where X is the stepping of the
                        range (if omitted, it defaults to a stepping of 1).
                        e.g., 1-15 or 12,18 or 7,10-30:10,1050-1054. the
                        special value 'pow10' expands to
                        '1-9,10-90:10,100-900:100,1000-9000:1000,10000-90000:10000,100000-900000:100000,1000000-9000000:1000000'.
                        Note that 'max' and 'pow10' may not be used in a list;
                        they must be used by themselves; max may only be used
                        with --library clfft. (default 1)
  -a CONSTPROBSIZE, --adaptivemax CONSTPROBSIZE
                        Max problem size that you want to maintain across the
                        invocations of client with different lengths. This is
                        adaptive and adjusts itself automtically.
  -x LENGTHX, --lengthx LENGTHX
                        length(s) of x to test; must be factors of 1, 2, 3, or
                        5 with clFft; may be a range or a comma-delimited
                        list. e.g., 16-128 or 1200 or 16,2048-32768 (default
                        1)
  -y LENGTHY, --lengthy LENGTHY
                        length(s) of y to test; must be factors of 1, 2, 3, or
                        5 with clFft; may be a range or a comma-delimited
                        list. e.g., 16-128 or 1200 or 16,32768 (default 1)
  -z LENGTHZ, --lengthz LENGTHZ
                        length(s) of z to test; must be factors of 1, 2, 3, or
                        5 with clFft; may be a range or a comma-delimited
                        list. e.g., 16-128 or 1200 or 16,32768 (default 1)
  --problemsize PROBLEMSIZE
                        additional problems of a set size. may be used in
                        addition to lengthx/y/z. each indicated problem size
                        will be added to the list of FFTs to perform. should
                        be entered in AxBxC:D format. A, B, and C indicate the
                        sizes of the X, Y, and Z dimensions (respectively). D
                        is the batch size. All values except the length of X
                        are optional. may enter multiple in a comma-delimited
                        list. e.g., 2x2x2:32768 or 256x256:100,512x512:256
  -i INPUTLAYOUT, --inputlayout INPUTLAYOUT
                        may enter multiple in a comma-delimited list. choices
                        are ['cp', 'ci']. ci = complex interleaved, cp =
                        complex planar (default ci)
  -o OUTPUTLAYOUT, --outputlayout OUTPUTLAYOUT
                        may enter multiple in a comma-delimited list. choices
                        are ['cp', 'ci']. ci = complex interleaved, cp =
                        complex planar (default ci)
  -p PLACENESS, --placeness PLACENESS
                        may enter multiple in a comma-delimited list. choices
                        are ['in', 'out']. in = in place, out = out of place
                        (default in)
  -r PRECISION, --precision PRECISION
                        may enter multiple in a comma-delimited list. choices
                        are ['single', 'double']. (default single)
  --ldscomplex          turn on complex LDS (default off)
  --ldsfraction LDSFRACTION
                        fraction of the LDS to use; should be 0 or an integer
                        2-8. library automatically chooses the value on 0. may
                        be a range or a comma-delimited list. (default 0)
  --cachesize CACHESIZE
                        size of the cache; should be 0 or a positive integer
                        between one and two times the problem size. library
                        automatically chooses the value on a 0. may be a range
                        or a comma-delimited list. (default 0)
  --xfactor XFACTOR     size of the X dimension to use when dividing up large
                        problems; should be 0 or a power of 2. library
                        automatically chooses the value on a 0. may be a range
                        or a comma-delimited list. (default 0)
  --library {clfft}     indicates the library to use for testing on this run
  --label LABEL         a label to be associated with all transforms performed
                        in this run. if LABEL includes any spaces, it must be
                        in "double quotes". note that the label is not saved
                        to an .ini file. e.g., --label cayman may indicate
                        that a test was performed on a cayman card or --label
                        "Windows 32" may indicate that the test was performed
                        on Windows 32
  --createini CREATEINIFILENAME
                        create an .ini file with the given name that saves the
                        other parameters given at the command line, then quit.
                        e.g., 'performance.py -x 2048 --createini
                        my_favorite_setup.ini' will create an .ini file that
                        will save the configuration for a 2048-datapoint 1D
                        FFT.
  --ini INIFILENAME     use the parameters in the named .ini file instead of
                        the command line parameters.
  --tablefile TABLEOUTPUTFILENAME
                        save the results to a plaintext table with the file
                        name indicated. this can be used with
                        plotPerformance.py to generate graphs of the data
                        (default: table prints to screen)
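The range syntax described in the help text (comma-delimited items, optional "start-end" ranges, optional ":X" stepping) can be illustrated with a small sketch. The function name expand_ranges is hypothetical and is not taken from measurePerformance.py; the sketch also assumes that ranges include both endpoints, which is what the 'pow10' expansion in the help text suggests. The script's real parser may differ.

```python
def expand_ranges(spec):
    """Expand a spec such as '7,10-30:10,1050-1054' into a list of ints.

    Hypothetical helper for illustration only; assumes inclusive ranges.
    """
    values = []
    for item in spec.split(","):
        item = item.strip()
        # An optional ':X' suffix gives the stepping; it defaults to 1.
        body, _, step_text = item.partition(":")
        step = int(step_text) if step_text else 1
        if "-" in body:
            start_text, end_text = body.split("-", 1)
            start, end = int(start_text), int(end_text)
            # range() excludes its stop value, so add 1 to include 'end'.
            values.extend(range(start, end + 1, step))
        else:
            values.append(int(body))
    return values


print(expand_ranges("7,10-30:10,1050-1054"))
# prints [7, 10, 20, 30, 1050, 1051, 1052, 1053, 1054]
```

The special values 'max' and 'pow10' are deliberately not handled here, since the help text says they must be used by themselves rather than in a list.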
The example below uses this script to gather performance numbers for a few sizes: 4, 16, 64, 256, and 1024.
C:\clFFT\src\scripts\perf>measurePerformance.py -x 4,16,64,256,1024 -b max
A subdirectory or file perfLog already exists.
=========================MEASURE PERFORMANCE START===========================
Process id of Measure Performance:14592
Executing measure performance for label: None
Executing for label: None
table header---->lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
Total combinations = 5
preparing command: 1
Executing Command: ['Client.exe', '--gpu', '-x', '4', '-y', '1', '-z', '1', '--batchSize', '1048576', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 0 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003DD08C0
OutEvents: 000000000480F390
Length: (4)
Batch: 1048576
Input Stride: (1)
Output Stride: (1)
Global Work: (2097152)
Gflops: 83.3251
Time (ns): 503,366
stderr:
Execution Successfull---------------
preparing command: 2
Executing Command: ['Client.exe', '--gpu', '-x', '16', '-y', '1', '-z', '1', '--batchSize', '262144', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003DD0940
OutEvents: 000000000627B6B0
Length: (16)
Batch: 262144
Input Stride: (1)
Output Stride: (1)
Global Work: (1048576)
Gflops: 174.583
Time (ns): 480,493
stderr:
Execution Successfull---------------
preparing command: 3
Executing Command: ['Client.exe', '--gpu', '-x', '64', '-y', '1', '-z', '1', '--batchSize', '65536', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003DDCA00
OutEvents: 0000000004DBFE50
Length: (64)
Batch: 65536
Input Stride: (1)
Output Stride: (1)
Global Work: (1048576)
Gflops: 235.951
Time (ns): 533,284
stderr:
Execution Successfull---------------
preparing command: 4
Executing Command: ['Client.exe', '--gpu', '-x', '256', '-y', '1', '-z', '1', '--batchSize', '16384', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003EDC8D0
OutEvents: 0000000004C18E30
Length: (256)
Batch: 16384
Input Stride: (1)
Output Stride: (1)
Global Work: (1048576)
Gflops: 343.413
Time (ns): 488,543
stderr:
Execution Successfull---------------
preparing command: 5
Executing Command: ['Client.exe', '--gpu', '-x', '1024', '-y', '1', '-z', '1', '--batchSize', '4096', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 0 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003C508C0
OutEvents: 000000000621C200
Length: (1024)
Batch: 4096
Input Stride: (1)
Output Stride: (1)
Global Work: (524288)
Gflops: 420.946
Time (ns): 498,200
stderr:
Execution Successfull---------------
=========================MEASURE PERFORMANCE ENDS===========================
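The "scrapes stdout" step can be pictured as a regular-expression search over the client's output. The sketch below is not the code in measurePerformance.py; the names GFLOPS_PATTERN and scrape_gflops are hypothetical, and the only assumption about the client is that it prints a "Gflops: <number>" field as in the output of the run above.

```python
import re

# A fragment of client output copied from the fifth command of the run above.
SAMPLE_STDOUT = """\
Length: (1024)
Batch: 4096
Gflops: 420.946
Time (ns): 498,200
"""

# Hypothetical pattern: matches the 'Gflops:' field printed by the client.
GFLOPS_PATTERN = re.compile(r"Gflops:\s*([0-9]*\.?[0-9]+)")


def scrape_gflops(stdout_text):
    """Return the Gflops value found in the text, or None if it is absent."""
    match = GFLOPS_PATTERN.search(stdout_text)
    return float(match.group(1)) if match else None


print(scrape_gflops(SAMPLE_STDOUT))
# prints 420.946
```

Returning None when the field is missing lets a caller tell a failed run apart from a run that reported a number.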
This generates a log file in the current directory that records the parameters tested together with the measured performance numbers:
C:\clFFT\src\scripts\perf>type results2013-07-23T16.01.52.791000.txt
lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
4,1,1,1048576,gpu,ci,ci,in,single,None,83.3251
16,1,1,262144,gpu,ci,ci,in,single,None,174.583
64,1,1,65536,gpu,ci,ci,in,single,None,235.951
256,1,1,16384,gpu,ci,ci,in,single,None,343.413
1024,1,1,4096,gpu,ci,ci,in,single,None,420.946
This log file is then fed into the plotPerformance.py script, which consumes the records and plots the results in a graph.
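Because the log is a comma-separated table with a header row, it can also be read with Python's standard csv module for ad hoc analysis. The following is a minimal sketch, not the parser used by plotPerformance.py; the function name read_table is hypothetical, and the table text is the log from the run above embedded as a string so the example is self-contained.

```python
import csv
import io

# The results table from the run above, embedded for a self-contained example.
TABLE_TEXT = """\
lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
4,1,1,1048576,gpu,ci,ci,in,single,None,83.3251
16,1,1,262144,gpu,ci,ci,in,single,None,174.583
64,1,1,65536,gpu,ci,ci,in,single,None,235.951
256,1,1,16384,gpu,ci,ci,in,single,None,343.413
1024,1,1,4096,gpu,ci,ci,in,single,None,420.946
"""


def read_table(handle):
    """Read a results table and convert the numeric columns."""
    rows = []
    for record in csv.DictReader(handle):
        for name in ("lengthx", "lengthy", "lengthz", "batch"):
            record[name] = int(record[name])
        record["GFLOPS"] = float(record["GFLOPS"])
        rows.append(record)
    return rows


rows = read_table(io.StringIO(TABLE_TEXT))
best = max(rows, key=lambda row: row["GFLOPS"])
print(best["lengthx"], best["GFLOPS"])
# prints 1024 420.946
```

To read a log from disk instead, pass a file object opened on the results file in place of the io.StringIO wrapper.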
While the log file generated by measurePerformance.py is sufficient for gathering performance data, plots make it much easier to compare and contrast different sets of data. This is the purpose of plotPerformance.py: this Python script uses the freely available matplotlib library to either open an interactive graph window or write an image file straight to disk. Extensive help information is available through the --help parameter:
C:\clFFT\src\scripts\perf>plotPerformance.py -h
usage: plotPerformance.py [-h] -d DATAFILE -x {x,y,z,batchsize,problemsize}
                          [-y {gflops}] [--plot {device,precision,label}]
                          [--title GRAPHTITLE] [--x_axis_label XAXISLABEL]
                          [--x_axis_scale {linear,log2,log10}]
                          [--y_axis_label YAXISLABEL]
                          [--outputfile OUTPUTFILENAME]

Plot performance of the clFFT library. plotPerformance.py reads in data tables
from measurePerformance.py and plots their values

optional arguments:
  -h, --help            show this help message and exit
  -d DATAFILE, --datafile DATAFILE
                        indicate a file to use as input. must be in the format
                        output by measurePerformance.py. may be used multiple
                        times to indicate multiple input files. e.g., -d
                        cypressOutput.txt -d caymanOutput.txt
  -x {x,y,z,batchsize,problemsize}, --x_axis {x,y,z,batchsize,problemsize}
                        indicate which value will be represented on the x
                        axis. problemsize is defined as x*y*z*batchsize
  -y {gflops}, --y_axis {gflops}
                        indicate which value will be represented on the y axis
  --plot {device,precision,label}
                        indicate which of ['device', 'precision', 'label']
                        should be used to differentiate multiple plots. this
                        will be chosen automatically if not specified
  --title GRAPHTITLE    the desired title for the graph generated by this
                        execution. if GRAPHTITLE contains any spaces, it must
                        be entered in "double quotes". if this option is not
                        specified, the title will be autogenerated
  --x_axis_label XAXISLABEL
                        the desired label for the graph's x-axis. if
                        XAXISLABEL contains any spaces, it must be entered in
                        "double quotes". if this option is not specified, the
                        x-axis label will be autogenerated
  --x_axis_scale {linear,log2,log10}
                        the desired scale for the graph's x-axis. if nothing
                        is specified, it will be selected automatically
  --y_axis_label YAXISLABEL
                        the desired label for the graph's y-axis. if
                        YAXISLABEL contains any spaces, it must be entered in
                        "double quotes". if this option is not specified, the
                        y-axis label will be autogenerated
  --outputfile OUTPUTFILENAME
                        name of the file to output graphs. Supported formats:
                        emf, eps, pdf, png, ps, raw, rgba, svg, svgz.
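The help text defines problemsize as x*y*z*batchsize and says that --plot selects the column used to tell multiple plots apart. A rough sketch of that bookkeeping follows. It is an illustration only, not plotPerformance.py's implementation: the function names problem_size and build_series are hypothetical, and the two sample rows are taken from the results table shown earlier.

```python
from collections import defaultdict


def problem_size(row):
    """problemsize as defined in the help text: x*y*z*batchsize."""
    return row["lengthx"] * row["lengthy"] * row["lengthz"] * row["batch"]


def build_series(rows, x_axis="x", plot_by="precision"):
    """Group rows into one list of (x, GFLOPS) points per plot_by value.

    Hypothetical helper; x_axis accepts the same names as the -x option.
    """
    x_getters = {
        "x": lambda row: row["lengthx"],
        "y": lambda row: row["lengthy"],
        "z": lambda row: row["lengthz"],
        "batchsize": lambda row: row["batch"],
        "problemsize": problem_size,
    }
    get_x = x_getters[x_axis]
    series = defaultdict(list)
    for row in rows:
        series[row[plot_by]].append((get_x(row), row["GFLOPS"]))
    for points in series.values():
        points.sort()  # order each line by its x value
    return dict(series)


# Two rows from the results table shown earlier, deliberately out of order.
rows = [
    {"lengthx": 16, "lengthy": 1, "lengthz": 1, "batch": 262144,
     "device": "gpu", "precision": "single", "label": "None", "GFLOPS": 174.583},
    {"lengthx": 4, "lengthy": 1, "lengthz": 1, "batch": 1048576,
     "device": "gpu", "precision": "single", "label": "None", "GFLOPS": 83.3251},
]

print(build_series(rows, x_axis="x", plot_by="precision"))
# prints {'single': [(4, 83.3251), (16, 174.583)]}
print(problem_size(rows[0]))
# prints 4194304
```

Each key of the returned dictionary would correspond to one line on the graph, with its points already sorted along the chosen x axis.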
Once the performance of a particular run has been saved to a log file, you can instruct plotPerformance.py to parse the log file and create a line graph from that data. The following command graphs the performance over the data points measured above, using the x length as the x axis.
C:\clFFT\src\scripts\perf>plotPerformance.py -x x -d results2013-07-23T16.01.52.791000.txt