In this tutorial, we introduce how to quantize a pre-trained YOLO11n model using ESP-PPQ and deploy the quantized model using ESP-DL.
Preparation

Model quantization

Pre-trained Model

You can download the pre-trained YOLO11n model from the Ultralytics release.
Currently, ESP-PPQ supports ONNX, PyTorch, and TensorFlow models. During the quantization process, PyTorch and TensorFlow models are first converted to ONNX models, so the pre-trained YOLO11n model needs to be converted to an ONNX model.
Specifically, refer to the script export_onnx.py to convert the pre-trained YOLO11n model to an ONNX model; a minimal export sketch is also shown after the list below.
In the script, we have overridden the forward method of the Detect class, which offers the following advantages:
Faster inference. Compared to the original YOLO11n model, the operations related to decoding bounding boxes in the Detect head are moved from the inference pass to the post-processing phase, resulting in a significant reduction in inference latency. On the one hand, operations such as Conv, Transpose, Slice, Split, and Concat are time-consuming when executed during the inference pass. On the other hand, the inference outputs are first filtered with a score threshold before the boxes are decoded in the post-processing pass, which significantly reduces the number of calculations and thereby accelerates overall inference.
Lower quantization error. The Concat and Add operators adopt joint quantization in ESP-PPQ. To reduce quantization error, the box and score tensors are output by separate branches rather than concatenated, due to the significant difference in their ranges. Similarly, since the ranges of the two inputs of Add and Sub differ significantly, these calculations are performed in the post-processing phase to avoid quantization error.
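For reference, a minimal export sketch using the Ultralytics Python API is shown below. It is only a stand-in for export_onnx.py, which additionally overrides Detect.forward as described above, so use that script for the actual conversion; the imgsz and opset values here are illustrative assumptions.

from ultralytics import YOLO  # pip install ultralytics

# Load the pre-trained weights downloaded from the Ultralytics release.
model = YOLO("yolo11n.pt")

# Export to ONNX. export_onnx.py performs this step with the modified
# Detect.forward so that box decoding stays out of the inference graph.
model.export(format="onnx", imgsz=640, opset=13)  # writes yolo11n.onnx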
Calibration Dataset

The calibration dataset needs to match the input format of the model, and it should cover all possible input scenarios so that the model can be quantized well. The calibration dataset used in this example is calib_yolo11n.
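A minimal sketch of building a calibration dataloader is shown below, assuming 640x640 RGB input scaled to [0, 1]; the directory name calib_yolo11n follows the example above, and the plain resize is a simplification of the letterbox preprocessing YOLO models normally use.

import glob

import numpy as np
import torch
from PIL import Image

def load_calib_tensor(path, size=640):
    # Preprocess one image the same way the deployed model expects:
    # resize to the network input size, scale to [0, 1], HWC -> CHW.
    img = Image.open(path).convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(x).permute(2, 0, 1)  # shape (3, size, size)

calib_tensors = [load_calib_tensor(p) for p in sorted(glob.glob("calib_yolo11n/*.jpg"))]
calib_dataloader = torch.utils.data.DataLoader(calib_tensors, batch_size=32)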
8-bit default configuration quantization

Quantization settings
target="esp32p4" num_of_bits=8 batch_size=32 quant_setting = QuantizationSettingFactory.espdl_setting() # default setting
Quantization results
Layer | NOISE:SIGNAL POWER RATIO
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | 36.008%
/model.10/m/m.0/attn/proj/conv/Conv: | 28.705%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | 22.865%
/model.23/cv2.2/cv2.2.0/conv/Conv: | 21.718%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | 21.624%
/model.23/cv2.2/cv2.2.1/conv/Conv: | 21.392%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | 21.224%
/model.22/m.0/cv2/conv/Conv: | 19.763%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | 19.436%
/model.22/m.0/cv3/conv/Conv: | 19.378%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | 18.913%
/model.22/m.0/m/m.1/cv2/conv/Conv: | 18.645%
/model.22/cv2/conv/Conv: | 18.628%
/model.23/cv2.1/cv2.1.1/conv/Conv: | 17.980%
/model.8/m.0/cv2/conv/Conv: | 16.247%
/model.23/cv2.0/cv2.0.1/conv/Conv: | 15.602%
/model.10/m/m.0/attn/qkv/conv/Conv: | 14.666%
/model.10/m/m.0/attn/pe/conv/Conv: | 14.556%
/model.23/cv2.1/cv2.1.0/conv/Conv: | 14.302%
/model.22/cv1/conv/Conv: | 13.921%
/model.10/m/m.0/attn/MatMul_1: | 13.905%
/model.10/cv1/conv/Conv: | 13.494%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | 11.800%
/model.19/m.0/cv2/conv/Conv: | 11.515%
/model.22/m.0/m/m.0/cv2/conv/Conv: | 11.286%
/model.20/conv/Conv: | 10.930%
/model.13/m.0/cv2/conv/Conv: | 10.882%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | 10.692%
/model.23/cv2.2/cv2.2.2/Conv: | 10.113%
/model.10/cv2/conv/Conv: | 9.720%
/model.8/cv2/conv/Conv: | 9.598%
/model.8/m.0/cv1/conv/Conv: | 9.470%
/model.19/cv2/conv/Conv: | 9.314%
/model.22/m.0/m/m.0/cv1/conv/Conv: | 9.068%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | 9.065%
/model.8/cv1/conv/Conv: | 9.051%
/model.8/m.0/cv3/conv/Conv: | 9.044%
/model.6/m.0/cv2/conv/Conv: | 8.811%
/model.22/m.0/m/m.1/cv1/conv/Conv: | 8.781%
/model.13/cv2/conv/Conv: | 8.687%
/model.8/m.0/m/m.0/cv1/conv/Conv: | 8.503%
/model.8/m.0/m/m.0/cv2/conv/Conv: | 8.470%
/model.19/cv1/conv/Conv: | 8.199%
/model.10/m/m.0/attn/MatMul: | 8.117%
/model.8/m.0/m/m.1/cv1/conv/Conv: | 7.964%
/model.13/cv1/conv/Conv: | 7.734%
/model.19/m.0/cv1/conv/Conv: | 7.661%
/model.22/m.0/cv1/conv/Conv: | 7.490%
/model.13/m.0/cv1/conv/Conv: | 7.162%
/model.8/m.0/m/m.1/cv2/conv/Conv: | 7.145%
/model.23/cv2.0/cv2.0.0/conv/Conv: | 7.041%
/model.23/cv2.1/cv2.1.2/Conv: | 6.917%
/model.23/cv2.0/cv2.0.2/Conv: | 6.778%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | 6.641%
/model.17/conv/Conv: | 6.125%
/model.16/m.0/cv2/conv/Conv: | 5.937%
/model.6/cv2/conv/Conv: | 5.838%
/model.6/m.0/cv3/conv/Conv: | 5.832%
/model.6/cv1/conv/Conv: | 5.688%
/model.7/conv/Conv: | 5.612%
/model.9/cv2/conv/Conv: | 5.367%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | 5.158%
/model.6/m.0/m/m.0/cv1/conv/Conv: | 5.143%
/model.16/m.0/cv1/conv/Conv: | 5.137%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | 5.087%
/model.16/cv2/conv/Conv: | 4.989%
/model.2/cv2/conv/Conv: | 4.547%
/model.6/m.0/m/m.0/cv2/conv/Conv: | 4.441%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | 4.343%
/model.3/conv/Conv: | 4.304%
/model.6/m.0/m/m.1/cv1/conv/Conv: | 4.006%
/model.5/conv/Conv: | 3.932%
/model.6/m.0/cv1/conv/Conv: | 3.837%
/model.4/cv1/conv/Conv: | 3.687%
/model.2/cv1/conv/Conv: | 3.565%
/model.4/cv2/conv/Conv: | 3.559%
/model.16/cv1/conv/Conv: | 3.107%
/model.2/m.0/cv2/conv/Conv: | 2.882%
/model.6/m.0/m/m.1/cv2/conv/Conv: | 2.758%
/model.4/m.0/cv1/conv/Conv: | 2.564%
/model.9/cv1/conv/Conv: | 2.017%
/model.4/m.0/cv2/conv/Conv: | 1.785%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | 1.327%
/model.1/conv/Conv: | 1.313%
/model.23/cv3.2/cv3.2.2/Conv: | 1.155%
/model.2/m.0/cv1/conv/Conv: | 0.727%
/model.23/cv3.1/cv3.1.2/Conv: | 0.493%
/model.23/cv3.0/cv3.0.2/Conv: | 0.282%
/model.0/conv/Conv: | 0.159%

Analysing Layerwise quantization error: 100%|██████████| 89/89 [03:39<00:00, 2.46s/it]

Layer | NOISE:SIGNAL POWER RATIO
/model.1/conv/Conv: | 0.384%
/model.22/cv1/conv/Conv: | 0.247%
/model.4/cv2/conv/Conv: | 0.233%
/model.2/cv2/conv/Conv: | 0.201%
/model.0/conv/Conv: | 0.192%
/model.9/cv2/conv/Conv: | 0.156%
/model.10/cv1/conv/Conv: | 0.132%
/model.3/conv/Conv: | 0.108%
/model.4/cv1/conv/Conv: | 0.074%
/model.16/cv1/conv/Conv: | 0.066%
/model.2/cv1/conv/Conv: | 0.060%
/model.23/cv2.0/cv2.0.0/conv/Conv: | 0.052%
/model.2/m.0/cv1/conv/Conv: | 0.044%
/model.6/cv1/conv/Conv: | 0.033%
/model.10/m/m.0/attn/pe/conv/Conv: | 0.029%
/model.2/m.0/cv2/conv/Conv: | 0.028%
/model.22/m.0/m/m.0/cv1/conv/Conv: | 0.023%
/model.16/cv2/conv/Conv: | 0.021%
/model.16/m.0/cv2/conv/Conv: | 0.020%
/model.19/m.0/cv1/conv/Conv: | 0.020%
/model.4/m.0/cv1/conv/Conv: | 0.018%
/model.19/cv2/conv/Conv: | 0.017%
/model.4/m.0/cv2/conv/Conv: | 0.016%
/model.10/m/m.0/attn/qkv/conv/Conv: | 0.016%
/model.19/cv1/conv/Conv: | 0.015%
/model.13/cv2/conv/Conv: | 0.015%
/model.8/cv1/conv/Conv: | 0.013%
/model.23/cv2.1/cv2.1.0/conv/Conv: | 0.013%
/model.23/cv2.2/cv2.2.1/conv/Conv: | 0.012%
/model.13/cv1/conv/Conv: | 0.012%
/model.10/cv2/conv/Conv: | 0.011%
/model.13/m.0/cv1/conv/Conv: | 0.011%
/model.6/cv2/conv/Conv: | 0.011%
/model.13/m.0/cv2/conv/Conv: | 0.010%
/model.5/conv/Conv: | 0.010%
/model.19/m.0/cv2/conv/Conv: | 0.009%
/model.6/m.0/m/m.1/cv1/conv/Conv: | 0.009%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | 0.008%
/model.23/cv2.2/cv2.2.0/conv/Conv: | 0.008%
/model.23/cv2.1/cv2.1.1/conv/Conv: | 0.008%
/model.9/cv1/conv/Conv: | 0.008%
/model.23/cv2.0/cv2.0.1/conv/Conv: | 0.007%
/model.16/m.0/cv1/conv/Conv: | 0.007%
/model.17/conv/Conv: | 0.007%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | 0.007%
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | 0.007%
/model.23/cv2.0/cv2.0.2/Conv: | 0.006%
/model.8/m.0/cv1/conv/Conv: | 0.006%
/model.23/cv2.2/cv2.2.2/Conv: | 0.005%
/model.23/cv2.1/cv2.1.2/Conv: | 0.005%
/model.22/m.0/cv3/conv/Conv: | 0.005%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | 0.005%
/model.7/conv/Conv: | 0.005%
/model.8/cv2/conv/Conv: | 0.004%
/model.22/cv2/conv/Conv: | 0.004%
/model.6/m.0/cv3/conv/Conv: | 0.004%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | 0.004%
/model.8/m.0/m/m.1/cv2/conv/Conv: | 0.004%
/model.22/m.0/m/m.1/cv1/conv/Conv: | 0.004%
/model.8/m.0/m/m.1/cv1/conv/Conv: | 0.004%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | 0.003%
/model.10/m/m.0/attn/proj/conv/Conv: | 0.003%
/model.22/m.0/m/m.0/cv2/conv/Conv: | 0.003%
/model.22/m.0/cv1/conv/Conv: | 0.003%
/model.8/m.0/cv3/conv/Conv: | 0.003%
/model.6/m.0/m/m.0/cv1/conv/Conv: | 0.003%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | 0.003%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | 0.002%
/model.6/m.0/m/m.1/cv2/conv/Conv: | 0.002%
/model.8/m.0/m/m.0/cv2/conv/Conv: | 0.002%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | 0.002%
/model.10/m/m.0/attn/MatMul_1: | 0.002%
/model.22/m.0/m/m.1/cv2/conv/Conv: | 0.001%
/model.6/m.0/m/m.0/cv2/conv/Conv: | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | 0.001%
/model.8/m.0/m/m.0/cv1/conv/Conv: | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | 0.001%
/model.6/m.0/cv1/conv/Conv: | 0.001%
/model.23/cv3.2/cv3.2.2/Conv: | 0.001%
/model.20/conv/Conv: | 0.001%
/model.23/cv3.1/cv3.1.2/Conv: | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | 0.001%
/model.6/m.0/cv2/conv/Conv: | 0.001%
/model.23/cv3.0/cv3.0.2/Conv: | 0.000%
/model.10/m/m.0/attn/MatMul: | 0.000%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | 0.000%
/model.8/m.0/cv2/conv/Conv: | 0.000%
/model.22/m.0/cv2/conv/Conv: | 0.000%
Quantization error analysis
With the same inputs, the mAP50:95 on COCO val2017 after quantization is only 30.7%, which is lower than that of the float model. The accuracy loss is analyzed below:
Graphwise Error
The output layers of the model are /model.23/cv3.2/cv3.2.2/Conv, /model.23/cv2.2/cv2.2.2/Conv, /model.23/cv3.1/cv3.1.2/Conv, /model.23/cv2.1/cv2.1.2/Conv, /model.23/cv3.0/cv3.0.2/Conv, and /model.23/cv2.0/cv2.0.2/Conv. The cumulative errors for these layers are 1.155%, 10.113%, 0.493%, 6.917%, 0.282%, and 6.778%, respectively. Generally, if the cumulative error of an output layer is less than 10%, the accuracy loss of the quantized model is minimal.
Layerwise Error
Observing the layerwise errors, we find that the errors for all layers are below 1%, indicating that the quantization error of each individual layer is small.
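The two reports above can also be generated on demand. The sketch below assumes ESP-PPQ keeps upstream PPQ's analysis helpers under the esp_ppq namespace; names and signatures may vary between versions, so check your installation.

# Assumption: ESP-PPQ mirrors ppq.quantization.analyse from upstream PPQ.
from esp_ppq.quantization.analyse import (
    graphwise_error_analyse,   # cumulative error of the quantized graph vs. the float graph
    layerwise_error_analyse,   # isolated error, with each layer quantized alone
)

graphwise_error_analyse(graph=quant_ppq_graph, running_device="cpu",
                        dataloader=calib_dataloader, steps=8)
layerwise_error_analyse(graph=quant_ppq_graph, running_device="cpu",
                        dataloader=calib_dataloader, steps=8)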
We noticed that although the layerwise errors for all layers are small, the cumulative errors in some layers are relatively large. This may be related to the complex CSP structure in the YOLO11n model, where the inputs to the Concat or Add layers may have different distributions or scales. We can quantize certain layers using int16 and optimize the quantization with the horizontal layer split pass. For more details, please refer to the mixed-precision + horizontal layer split pass quantization test below.
Mixed-precision + horizontal layer split pass quantization

Splitting convolution layers or GEMM layers can reduce quantization error for better performance.
Quantization settings
from esp_ppq.api import get_target_platform

target = "esp32p4"
num_of_bits = 8
batch_size = 32

# Quantize the following layers with 16 bits
quant_setting = QuantizationSettingFactory.espdl_setting()
quant_setting.dispatching_table.append("/model.2/cv2/conv/Conv", get_target_platform(target, 16))
quant_setting.dispatching_table.append("/model.3/conv/Conv", get_target_platform(target, 16))
quant_setting.dispatching_table.append("/model.4/cv2/conv/Conv", get_target_platform(target, 16))

# Horizontal Layer Split Pass
quant_setting.weight_split = True
quant_setting.weight_split_setting.method = 'balance'
quant_setting.weight_split_setting.value_threshold = 1.5
quant_setting.weight_split_setting.interested_layers = ['/model.0/conv/Conv', '/model.1/conv/Conv']
Quantization results
Layer | NOISE:SIGNAL POWER RATIO
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | 24.835%
/model.10/m/m.0/attn/proj/conv/Conv: | 18.632%
/model.23/cv2.2/cv2.2.1/conv/Conv: | 17.908%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | 16.922%
/model.23/cv2.2/cv2.2.0/conv/Conv: | 16.754%
/model.22/m.0/cv3/conv/Conv: | 15.404%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | 15.042%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | 14.948%
/model.22/m.0/m/m.1/cv2/conv/Conv: | 14.702%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | 13.683%
/model.22/cv2/conv/Conv: | 13.654%
/model.22/m.0/cv2/conv/Conv: | 13.514%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | 12.885%
/model.23/cv2.1/cv2.1.1/conv/Conv: | 10.865%
/model.23/cv2.0/cv2.0.1/conv/Conv: | 9.875%
/model.23/cv2.1/cv2.1.0/conv/Conv: | 9.658%
/model.22/cv1/conv/Conv: | 8.917%
/model.10/m/m.0/attn/MatMul_1: | 8.368%
/model.23/cv2.2/cv2.2.2/Conv: | 8.156%
/model.22/m.0/m/m.0/cv2/conv/Conv: | 8.056%
/model.10/m/m.0/attn/qkv/conv/Conv: | 7.948%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | 7.824%
/model.13/m.0/cv2/conv/Conv: | 7.504%
/model.19/m.0/cv2/conv/Conv: | 7.290%
/model.20/conv/Conv: | 6.986%
/model.10/m/m.0/attn/pe/conv/Conv: | 6.926%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | 6.771%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | 6.756%
/model.22/m.0/m/m.1/cv1/conv/Conv: | 6.465%
/model.22/m.0/m/m.0/cv1/conv/Conv: | 6.274%
/model.19/cv2/conv/Conv: | 6.116%
/model.10/cv1/conv/Conv: | 5.868%
/model.13/cv2/conv/Conv: | 5.815%
/model.10/cv2/conv/Conv: | 5.664%
/model.19/cv1/conv/Conv: | 5.178%
/model.8/m.0/cv2/conv/Conv: | 4.970%
/model.19/m.0/cv1/conv/Conv: | 4.919%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | 4.864%
/model.22/m.0/cv1/conv/Conv: | 4.844%
/model.10/m/m.0/attn/MatMul: | 4.650%
/model.13/cv1/conv/Conv: | 4.564%
/model.23/cv2.0/cv2.0.0/conv/Conv: | 4.389%
/model.13/m.0/cv1/conv/Conv: | 4.243%
/model.23/cv2.0/cv2.0.2/Conv: | 4.232%
/model.23/cv2.1/cv2.1.2/Conv: | 4.222%
/model.6/m.0/cv2/conv/Conv: | 4.023%
/model.17/conv/Conv: | 3.754%
/model.16/m.0/cv2/conv/Conv: | 3.511%
/model.8/m.0/cv1/conv/Conv: | 3.277%
/model.16/m.0/cv1/conv/Conv: | 3.158%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | 3.155%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | 3.152%
/model.8/cv2/conv/Conv: | 3.119%
/model.8/m.0/m/m.1/cv1/conv/Conv: | 3.106%
/model.8/m.0/cv3/conv/Conv: | 3.083%
/model.6/m.0/cv3/conv/Conv: | 3.068%
/model.8/cv1/conv/Conv: | 3.035%
/model.16/cv2/conv/Conv: | 3.002%
/model.2/cv2/conv/Conv: | 2.992%
/model.8/m.0/m/m.0/cv2/conv/Conv: | 2.971%
/model.6/cv1/conv/Conv: | 2.819%
/model.8/m.0/m/m.0/cv1/conv/Conv: | 2.809%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | 2.760%
/model.2/cv1/conv/Conv: | 2.683%
/model.6/cv2/conv/Conv: | 2.630%
/model.8/m.0/m/m.1/cv2/conv/Conv: | 2.615%
/model.9/cv2/conv/Conv: | 2.540%
/model.3/conv/Conv: | 2.503%
/model.2/m.0/cv2/conv/Conv: | 2.474%
/model.6/m.0/m/m.0/cv1/conv/Conv: | 2.273%
/model.6/m.0/m/m.0/cv2/conv/Conv: | 2.246%
/model.4/cv2/conv/Conv: | 2.141%
/model.7/conv/Conv: | 2.120%
/model.6/m.0/m/m.1/cv1/conv/Conv: | 2.069%
/model.5/conv/Conv: | 2.015%
/model.16/cv1/conv/Conv: | 1.894%
/model.4/cv1/conv/Conv: | 1.793%
/model.4/m.0/cv1/conv/Conv: | 1.776%
/model.6/m.0/cv1/conv/Conv: | 1.731%
/model.6/m.0/m/m.1/cv2/conv/Conv: | 1.550%
/model.4/m.0/cv2/conv/Conv: | 1.257%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | 0.886%
/model.1/conv/Conv: | 0.775%
/model.23/cv3.2/cv3.2.2/Conv: | 0.771%
PPQ_Operation_2: | 0.696%
/model.9/cv1/conv/Conv: | 0.695%
/model.2/m.0/cv1/conv/Conv: | 0.534%
/model.23/cv3.1/cv3.1.2/Conv: | 0.339%
/model.23/cv3.0/cv3.0.2/Conv: | 0.190%
PPQ_Operation_0: | 0.110%
/model.0/conv/Conv: | 0.099%

Analysing Layerwise quantization error: 100%|██████████| 91/91 [04:13<00:00, 2.79s/it]

Layer | NOISE:SIGNAL POWER RATIO
/model.22/cv1/conv/Conv: | 0.244%
/model.9/cv2/conv/Conv: | 0.156%
/model.10/cv1/conv/Conv: | 0.132%
/model.1/conv/Conv: | 0.077%
/model.4/cv1/conv/Conv: | 0.074%
/model.16/cv1/conv/Conv: | 0.066%
/model.0/conv/Conv: | 0.061%
/model.2/cv1/conv/Conv: | 0.060%
/model.23/cv2.0/cv2.0.0/conv/Conv: | 0.052%
PPQ_Operation_0: | 0.047%
/model.2/m.0/cv1/conv/Conv: | 0.045%
/model.10/m/m.0/attn/pe/conv/Conv: | 0.029%
/model.2/m.0/cv2/conv/Conv: | 0.029%
/model.10/m/m.0/attn/MatMul: | 0.025%
/model.6/cv1/conv/Conv: | 0.025%
/model.22/m.0/m/m.0/cv1/conv/Conv: | 0.023%
/model.16/cv2/conv/Conv: | 0.021%
/model.16/m.0/cv2/conv/Conv: | 0.020%
/model.19/m.0/cv1/conv/Conv: | 0.020%
/model.4/m.0/cv1/conv/Conv: | 0.018%
/model.19/cv2/conv/Conv: | 0.017%
/model.4/m.0/cv2/conv/Conv: | 0.016%
/model.10/m/m.0/attn/qkv/conv/Conv: | 0.016%
/model.19/cv1/conv/Conv: | 0.015%
/model.13/cv2/conv/Conv: | 0.015%
/model.23/cv2.1/cv2.1.0/conv/Conv: | 0.013%
/model.23/cv2.2/cv2.2.1/conv/Conv: | 0.012%
/model.13/cv1/conv/Conv: | 0.012%
/model.6/cv2/conv/Conv: | 0.011%
/model.13/m.0/cv1/conv/Conv: | 0.011%
/model.8/cv1/conv/Conv: | 0.010%
/model.13/m.0/cv2/conv/Conv: | 0.010%
/model.5/conv/Conv: | 0.010%
/model.6/m.0/m/m.1/cv1/conv/Conv: | 0.009%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | 0.008%
/model.23/cv2.2/cv2.2.0/conv/Conv: | 0.008%
/model.23/cv2.1/cv2.1.1/conv/Conv: | 0.008%
/model.19/m.0/cv2/conv/Conv: | 0.008%
/model.8/cv2/conv/Conv: | 0.008%
/model.9/cv1/conv/Conv: | 0.008%
/model.23/cv2.0/cv2.0.1/conv/Conv: | 0.007%
/model.16/m.0/cv1/conv/Conv: | 0.007%
/model.17/conv/Conv: | 0.007%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | 0.007%
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | 0.007%
/model.22/m.0/cv1/conv/Conv: | 0.006%
/model.10/cv2/conv/Conv: | 0.006%
/model.23/cv2.0/cv2.0.2/Conv: | 0.006%
/model.23/cv2.2/cv2.2.2/Conv: | 0.005%
/model.23/cv2.1/cv2.1.2/Conv: | 0.005%
/model.22/m.0/cv3/conv/Conv: | 0.005%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | 0.005%
/model.22/cv2/conv/Conv: | 0.005%
/model.7/conv/Conv: | 0.004%
/model.6/m.0/cv3/conv/Conv: | 0.004%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | 0.004%
/model.8/m.0/m/m.1/cv2/conv/Conv: | 0.004%
/model.22/m.0/m/m.1/cv1/conv/Conv: | 0.004%
/model.8/m.0/m/m.1/cv1/conv/Conv: | 0.004%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | 0.003%
/model.8/m.0/cv1/conv/Conv: | 0.003%
/model.10/m/m.0/attn/proj/conv/Conv: | 0.003%
/model.22/m.0/m/m.0/cv2/conv/Conv: | 0.003%
PPQ_Operation_2: | 0.003%
/model.8/m.0/cv3/conv/Conv: | 0.003%
/model.6/m.0/m/m.0/cv1/conv/Conv: | 0.003%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | 0.002%
/model.6/m.0/m/m.1/cv2/conv/Conv: | 0.002%
/model.8/m.0/m/m.0/cv2/conv/Conv: | 0.002%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | 0.002%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | 0.002%
/model.10/m/m.0/attn/MatMul_1: | 0.002%
/model.22/m.0/m/m.1/cv2/conv/Conv: | 0.001%
/model.6/m.0/m/m.0/cv2/conv/Conv: | 0.001%
/model.8/m.0/m/m.0/cv1/conv/Conv: | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | 0.001%
/model.2/cv2/conv/Conv: | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | 0.001%
/model.6/m.0/cv1/conv/Conv: | 0.001%
/model.23/cv3.2/cv3.2.2/Conv: | 0.001%
/model.20/conv/Conv: | 0.001%
/model.23/cv3.1/cv3.1.2/Conv: | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | 0.001%
/model.6/m.0/cv2/conv/Conv: | 0.001%
/model.23/cv3.0/cv3.0.2/Conv: | 0.000%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | 0.000%
/model.8/m.0/cv2/conv/Conv: | 0.000%
/model.22/m.0/cv2/conv/Conv: | 0.000%
/model.3/conv/Conv: | 0.000%
/model.4/cv2/conv/Conv: | 0.000%
Quantization error analysis
After using 16-bit quantization on the layers with higher layerwise error and employing the horizontal layer split pass, the quantized model's mAP50:95 on COCO val2017 improves to 33.4% with the same inputs. Additionally, a noticeable decrease in the cumulative errors of the output layers can be observed.
The graphwise errors for the output layers of the model, /model.23/cv3.2/cv3.2.2/Conv, /model.23/cv2.2/cv2.2.2/Conv, /model.23/cv3.1/cv3.1.2/Conv, /model.23/cv2.1/cv2.1.2/Conv, /model.23/cv3.0/cv3.0.2/Conv, and /model.23/cv2.0/cv2.0.2/Conv, are 0.771%, 8.156%, 0.339%, 4.222%, 0.190%, and 4.232%, respectively.
Quantization-Aware Training

To further improve the accuracy of the quantized model, we adopt the quantization-aware training (QAT) strategy. Here, QAT is performed on top of 8-bit quantization.
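Conceptually, QAT inserts fake quantization into the forward pass and trains through the rounding with a straight-through estimator (STE), so the weights learn to compensate for quantization error. The sketch below is a plain-PyTorch illustration of that idea only; it is not the ESP-PPQ QAT implementation.

import torch

class FakeQuant(torch.autograd.Function):
    # Symmetric 8-bit fake quantization: forward rounds onto the integer
    # grid, backward passes the gradient straight through (STE).
    @staticmethod
    def forward(ctx, x, scale):
        return torch.clamp(torch.round(x / scale), -128, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity gradient w.r.t. x, none for scale

# During QAT, weights (and activations) pass through FakeQuant inside the
# training loop, so gradients flow despite the non-differentiable rounding:
w = torch.randn(8, requires_grad=True)
scale = w.detach().abs().max() / 127
loss = FakeQuant.apply(w, scale).sum()
loss.backward()  # w.grad is populated via the straight-through estimator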
Quantization settings
Quantization results
Layer | NOISE:SIGNAL POWER RATIO
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | 29.837%
/model.10/m/m.0/attn/proj/conv/Conv: | 23.397%
/model.10/m/m.0/attn/pe/conv/Conv: | 15.253%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | 14.819%
/model.10/m/m.0/attn/MatMul_1: | 14.725%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | 14.315%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | 14.212%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | 14.187%
/model.10/m/m.0/attn/qkv/conv/Conv: | 13.797%
/model.23/cv2.2/cv2.2.0/conv/Conv: | 13.721%
/model.22/m.0/cv2/conv/Conv: | 13.540%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | 13.408%
/model.8/m.0/cv2/conv/Conv: | 12.809%
/model.22/m.0/cv3/conv/Conv: | 12.623%
/model.23/cv2.1/cv2.1.1/conv/Conv: | 12.472%
/model.23/cv2.1/cv2.1.0/conv/Conv: | 12.177%
/model.22/m.0/m/m.1/cv2/conv/Conv: | 11.719%
/model.23/cv2.2/cv2.2.1/conv/Conv: | 11.711%
/model.10/cv1/conv/Conv: | 11.589%
/model.22/cv2/conv/Conv: | 11.551%
/model.23/cv2.0/cv2.0.1/conv/Conv: | 11.505%
/model.10/m/m.0/attn/MatMul: | 11.346%
/model.22/cv1/conv/Conv: | 10.201%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | 9.710%
/model.13/m.0/cv2/conv/Conv: | 9.538%
/model.20/conv/Conv: | 8.870%
/model.19/m.0/cv2/conv/Conv: | 8.713%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | 8.157%
/model.22/m.0/m/m.0/cv2/conv/Conv: | 8.005%
/model.8/cv2/conv/Conv: | 7.952%
/model.8/m.0/cv1/conv/Conv: | 7.697%
/model.13/cv2/conv/Conv: | 7.557%
/model.19/cv2/conv/Conv: | 7.443%
/model.10/cv2/conv/Conv: | 7.403%
/model.6/m.0/cv2/conv/Conv: | 7.099%
/model.8/cv1/conv/Conv: | 6.996%
/model.19/cv1/conv/Conv: | 6.912%
/model.8/m.0/m/m.0/cv1/conv/Conv: | 6.908%
/model.8/m.0/cv3/conv/Conv: | 6.755%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | 6.746%
/model.8/m.0/m/m.0/cv2/conv/Conv: | 6.743%
/model.8/m.0/m/m.1/cv1/conv/Conv: | 6.638%
/model.13/cv1/conv/Conv: | 6.361%
/model.2/m.0/cv2/conv/Conv: | 6.274%
/model.13/m.0/cv1/conv/Conv: | 6.261%
/model.19/m.0/cv1/conv/Conv: | 6.191%
/model.22/m.0/m/m.0/cv1/conv/Conv: | 6.036%
/model.23/cv2.2/cv2.2.2/Conv: | 5.999%
/model.22/m.0/m/m.1/cv1/conv/Conv: | 5.899%
/model.23/cv2.0/cv2.0.0/conv/Conv: | 5.618%
/model.8/m.0/m/m.1/cv2/conv/Conv: | 5.560%
/model.22/m.0/cv1/conv/Conv: | 5.336%
/model.16/m.0/cv2/conv/Conv: | 5.316%
/model.17/conv/Conv: | 5.113%
/model.6/m.0/cv3/conv/Conv: | 5.103%
/model.16/m.0/cv1/conv/Conv: | 5.101%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | 5.052%
/model.2/cv2/conv/Conv: | 5.003%
/model.6/cv2/conv/Conv: | 4.968%
/model.6/cv1/conv/Conv: | 4.792%
/model.23/cv2.1/cv2.1.2/Conv: | 4.543%
/model.7/conv/Conv: | 4.520%
/model.3/conv/Conv: | 4.362%
/model.16/cv2/conv/Conv: | 4.028%
/model.23/cv2.0/cv2.0.2/Conv: | 4.001%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | 3.954%
/model.9/cv2/conv/Conv: | 3.901%
/model.6/m.0/m/m.0/cv1/conv/Conv: | 3.891%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | 3.791%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | 3.711%
/model.4/cv1/conv/Conv: | 3.673%
/model.6/m.0/m/m.0/cv2/conv/Conv: | 3.620%
/model.6/m.0/m/m.1/cv1/conv/Conv: | 3.513%
/model.4/cv2/conv/Conv: | 3.421%
/model.5/conv/Conv: | 3.320%
/model.6/m.0/cv1/conv/Conv: | 3.073%
/model.2/cv1/conv/Conv: | 3.021%
/model.16/cv1/conv/Conv: | 2.764%
/model.6/m.0/m/m.1/cv2/conv/Conv: | 2.454%
/model.4/m.0/cv1/conv/Conv: | 2.408%
/model.4/m.0/cv2/conv/Conv: | 1.689%
/model.2/m.0/cv1/conv/Conv: | 1.602%
/model.9/cv1/conv/Conv: | 1.568%
/model.1/conv/Conv: | 1.205%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | 1.091%
/model.23/cv3.2/cv3.2.2/Conv: | 0.746%
/model.23/cv3.1/cv3.1.2/Conv: | 0.480%
/model.23/cv3.0/cv3.0.2/Conv: | 0.386%
/model.0/conv/Conv: | 0.163%

Analysing Layerwise quantization error: 100%|██████████| 89/89 [04:01<00:00, 2.72s/it]

Layer | NOISE:SIGNAL POWER RATIO
/model.2/cv2/conv/Conv: | 0.935%
/model.9/cv2/conv/Conv: | 0.826%
/model.2/m.0/cv1/conv/Conv: | 0.698%
/model.3/conv/Conv: | 0.611%
/model.4/cv2/conv/Conv: | 0.491%
/model.10/cv2/conv/Conv: | 0.408%
/model.23/cv2.2/cv2.2.2/Conv: | 0.283%
/model.2/cv1/conv/Conv: | 0.261%
/model.4/cv1/conv/Conv: | 0.249%
/model.1/conv/Conv: | 0.217%
/model.22/cv1/conv/Conv: | 0.201%
/model.10/cv1/conv/Conv: | 0.143%
/model.5/conv/Conv: | 0.136%
/model.16/cv1/conv/Conv: | 0.128%
/model.10/m/m.0/attn/pe/conv/Conv: | 0.120%
/model.0/conv/Conv: | 0.118%
/model.16/m.0/cv1/conv/Conv: | 0.105%
/model.16/cv2/conv/Conv: | 0.094%
/model.16/m.0/cv2/conv/Conv: | 0.092%
/model.23/cv2.0/cv2.0.0/conv/Conv: | 0.089%
/model.4/m.0/cv1/conv/Conv: | 0.071%
/model.22/m.0/cv1/conv/Conv: | 0.067%
/model.19/cv2/conv/Conv: | 0.063%
/model.6/cv2/conv/Conv: | 0.061%
/model.4/m.0/cv2/conv/Conv: | 0.059%
/model.17/conv/Conv: | 0.054%
/model.13/cv2/conv/Conv: | 0.053%
/model.8/m.0/cv3/conv/Conv: | 0.051%
/model.6/cv1/conv/Conv: | 0.047%
/model.23/cv2.2/cv2.2.0/conv/Conv: | 0.042%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | 0.041%
/model.13/cv1/conv/Conv: | 0.040%
/model.7/conv/Conv: | 0.038%
/model.10/m/m.0/attn/qkv/conv/Conv: | 0.038%
/model.13/m.0/cv1/conv/Conv: | 0.033%
/model.23/cv2.1/cv2.1.0/conv/Conv: | 0.031%
/model.6/m.0/m/m.1/cv1/conv/Conv: | 0.028%
/model.19/m.0/cv2/conv/Conv: | 0.027%
/model.8/m.0/m/m.1/cv1/conv/Conv: | 0.026%
/model.2/m.0/cv2/conv/Conv: | 0.026%
/model.19/m.0/cv1/conv/Conv: | 0.022%
/model.6/m.0/cv3/conv/Conv: | 0.021%
/model.19/cv1/conv/Conv: | 0.021%
/model.9/cv1/conv/Conv: | 0.016%
/model.22/m.0/m/m.1/cv1/conv/Conv: | 0.016%
/model.13/m.0/cv2/conv/Conv: | 0.015%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | 0.015%
/model.22/m.0/m/m.0/cv1/conv/Conv: | 0.014%
/model.8/cv1/conv/Conv: | 0.013%
/model.23/cv2.0/cv2.0.2/Conv: | 0.013%
/model.23/cv2.2/cv2.2.1/conv/Conv: | 0.012%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | 0.011%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | 0.011%
/model.8/cv2/conv/Conv: | 0.011%
/model.23/cv2.1/cv2.1.2/Conv: | 0.010%
/model.22/m.0/cv3/conv/Conv: | 0.010%
/model.23/cv2.1/cv2.1.1/conv/Conv: | 0.008%
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | 0.008%
/model.23/cv2.0/cv2.0.1/conv/Conv: | 0.007%
/model.10/m/m.0/attn/proj/conv/Conv: | 0.007%
/model.8/m.0/cv1/conv/Conv: | 0.007%
/model.22/m.0/m/m.0/cv2/conv/Conv: | 0.006%
/model.8/m.0/m/m.1/cv2/conv/Conv: | 0.005%
/model.22/cv2/conv/Conv: | 0.005%
/model.20/conv/Conv: | 0.005%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | 0.005%
/model.6/m.0/m/m.0/cv1/conv/Conv: | 0.005%
/model.8/m.0/m/m.0/cv1/conv/Conv: | 0.004%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | 0.003%
/model.8/m.0/m/m.0/cv2/conv/Conv: | 0.003%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | 0.003%
/model.6/m.0/cv1/conv/Conv: | 0.003%
/model.23/cv3.2/cv3.2.2/Conv: | 0.003%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | 0.003%
/model.6/m.0/m/m.1/cv2/conv/Conv: | 0.003%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | 0.002%
/model.22/m.0/m/m.1/cv2/conv/Conv: | 0.002%
/model.6/m.0/m/m.0/cv2/conv/Conv: | 0.002%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | 0.002%
/model.10/m/m.0/attn/MatMul_1: | 0.002%
/model.23/cv3.0/cv3.0.2/Conv: | 0.001%
/model.23/cv3.1/cv3.1.2/Conv: | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | 0.001%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | 0.001%
/model.6/m.0/cv2/conv/Conv: | 0.000%
/model.10/m/m.0/attn/MatMul: | 0.000%
/model.8/m.0/cv2/conv/Conv: | 0.000%
/model.22/m.0/cv2/conv/Conv: | 0.000%
Quantization error analysis
After applying QAT on top of the 8-bit quantization, the quantized model's mAP50:95 on COCO val2017 improves to 36.0% with the same inputs, while the cumulative errors of the output layers are significantly reduced. Compared to the other two quantization methods, the 8-bit QAT quantized model achieves the highest quantization accuracy with the lowest inference latency.
The graphwise errors for the output layers of the model, /model.23/cv3.2/cv3.2.2/Conv, /model.23/cv2.2/cv2.2.2/Conv, /model.23/cv3.1/cv3.1.2/Conv, /model.23/cv2.1/cv2.1.2/Conv, /model.23/cv3.0/cv3.0.2/Conv, and /model.23/cv2.0/cv2.0.2/Conv, are 0.746%, 5.999%, 0.480%, 4.543%, 0.386%, and 4.001%, respectively.
Note
If inference speed is a higher priority and a certain degree of accuracy loss is acceptable, you may consider quantizing the YOLO11n model with an input size of 320x320. The inference speed for different input resolutions can be found in README.md.
Model deployment

Object detection base class

Pre-process

The ImagePreprocessor class contains the common pre-process pipeline: color conversion, crop, resize, normalization, and quantization.
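For host-side parity checks, the same pipeline can be mirrored in Python. The sketch below is an approximation of those steps, not the ESP-DL implementation; the input is assumed to be already color-converted to RGB, and the power-of-two quantization exponent is a placeholder that must be read from the exported model.

import numpy as np

def preprocess(img_rgb, dst_h=640, dst_w=640, exponent=-7):
    # Mirror of the ImagePreprocessor steps: center-crop to the target
    # aspect ratio, resize, normalize to [0, 1], then quantize to int8
    # with a power-of-two scale (2**exponent), as ESP-DL expects.
    h, w, _ = img_rgb.shape
    scale = min(h / dst_h, w / dst_w)
    ch, cw = int(dst_h * scale), int(dst_w * scale)
    top, left = (h - ch) // 2, (w - cw) // 2
    img = img_rgb[top:top + ch, left:left + cw]
    # Nearest-neighbour resize via index sampling keeps the sketch dependency-free.
    ys = np.arange(dst_h) * ch // dst_h
    xs = np.arange(dst_w) * cw // dst_w
    img = img[ys][:, xs].astype(np.float32) / 255.0
    return np.clip(np.round(img / 2.0 ** exponent), -128, 127).astype(np.int8)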