On 6/10/2016 11:07 AM, Victor Stinner wrote:
> I started to work on visualisation. IMHO it helps to understand the
> problem.
>
> Let's create a large dataset: 500 samples (100 processes x 5 samples):

As I finished my response to Steven, I was thinking you should do
something like this to get real data.

> ---
> $ python3 telco.py --json-file=telco.json -p 100 -n 5
> ---
>
> The attached plot.py script creates a histogram:
> ---
> avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
>
> 26.1 ms:   1 #
> 26.2 ms:  12 #####
> 26.3 ms:  34 ############
> 26.4 ms:  44 ################
> 26.5 ms: 109 ######################################
> 26.6 ms: 117 ########################################
> 26.7 ms:  86 ##############################
> 26.8 ms:  50 ##################
> 26.9 ms:  32 ###########
> 27.0 ms:  10 ####
> 27.1 ms:   3 ##
> 27.2 ms:   1 #
> 27.3 ms:   1 #
>
> minimum 26.1 ms: 0.2% (1) of 500 samples
> ---
>
> Replace "if 1" with "if 0" to produce a graphical view, or just view
> the attached distribution.png, the numpy+scipy histogram.
>
> The distribution looks like a Gaussian curve:
> https://en.wikipedia.org/wiki/Gaussian_function

I am not too surprised. If there are several somewhat independent
sources of slowdown, their sum would tend to be normal. I am also not
surprised that there is a bit of skewness, but probably not enough to
worry about.

> The interesting thing is that only 1 sample out of 500 is in the
> minimum bucket (26.1 ms). If you say that the performance is 26.1 ms,
> only 0.2% of your users will be able to reproduce this timing.
>
> The average and std dev are 26.7 ms +- 0.2 ms, so numbers from 26.5 ms
> to 26.9 ms: we got 109+117+86+50+32 samples in this range, which gives
> us 394/500 = 79%.
>
> IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less of a lie than
> 26.1 ms (0.2%).

--
Terry Jan Reedy
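The 394/500 = 79% figure can be reproduced directly from the raw
samples with a short standard-library script. The following is only a
minimal sketch, not Victor's plot.py, and it assumes telco.json holds a
flat list of timings in seconds; the JSON written by the perf module
actually nests samples inside per-process runs, so a real script would
have to unpack that structure first.

---
import json
import statistics
from collections import Counter

# Assumption: telco.json is a flat JSON list of timings in seconds.
with open("telco.json") as f:
    samples = json.load(f)

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)

# 0.1 ms buckets; the widest bucket gets a 40-character bar,
# roughly matching the text histogram quoted above.
buckets = Counter(round(s * 1e3, 1) for s in samples)
widest = max(buckets.values())
for ms in sorted(buckets):
    count = buckets[ms]
    print("%.1f ms: %3d %s" % (ms, count, "#" * max(1, count * 40 // widest)))

# Count how many samples fall within one standard deviation of the mean
# (the "avg +- std dev covers 79% of samples" claim).
inside = sum(1 for s in samples if abs(s - mean) <= stdev)
print("avg: %.1f ms +- %.1f ms" % (mean * 1e3, stdev * 1e3))
print("%d of %d samples (%.0f%%) within avg +- std dev"
      % (inside, len(samples), 100.0 * inside / len(samples)))
---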