How to quantize model


ESP-DL requires models to be deployed in its proprietary .espdl format, a quantized model format that supports 8-bit and 16-bit quantization. In this tutorial, we take quantize_sin_model as an example to show how to use ESP-PPQ to quantize a model and export it as .espdl. The quantization method used is Post Training Quantization (PTQ).

Preparation

Install ESP-PPQ

Pre-trained model

Run sin_model.py. This script trains a simple PyTorch model to fit the sin function on the range [0, 2π]. After training, the corresponding .pth weights are saved and the ONNX model is exported.
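The script itself defines the details; as a minimal, non-authoritative sketch of what sin_model.py might contain (layer sizes, optimizer, and training schedule here are assumptions, not the script's exact values):

import math
import torch
import torch.nn as nn

class SinModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

model = SinModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(2000):                      # fit y = sin(x) on [0, 2*pi]
    x = torch.rand(256, 1) * 2 * math.pi
    loss = nn.functional.mse_loss(model(x), torch.sin(x))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "sin_model.pth")               # .pth weights
torch.onnx.export(model, torch.rand(1, 1), "sin_model.onnx")  # ONNX export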

Note

ESP-PPQ provides two interfaces, espdl_quantize_onnx and espdl_quantize_torch, to support ONNX and PyTorch models. Models from other deep learning frameworks, such as TensorFlow, PaddlePaddle, etc., need to be converted to ONNX first.

Quantize and export .espdl

Refer to quantize_torch_model.py and quantize_onnx_model.py to learn how to use the espdl_quantize_torch and espdl_quantize_onnx interfaces to quantize the model and export it as .espdl.
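As a rough illustration only, reusing the model from the sketch above, a call to espdl_quantize_torch might look like the following. Only the input_shape, inputs, and export_test_values parameters are named in this tutorial; the other parameter names and the import path are assumptions to verify against quantize_torch_model.py:

import math
import torch
from torch.utils.data import DataLoader
from ppq.api import espdl_quantize_torch   # assumed import path

# Hypothetical calibration set: random points in [0, 2*pi].
calib_data = [torch.rand(1) * 2 * math.pi for _ in range(32)]

quanted_graph = espdl_quantize_torch(
    model=model,                # the trained PyTorch model from above
    espdl_export_file="sin_model.espdl",
    calib_dataloader=DataLoader(calib_data, batch_size=1),
    calib_steps=32,
    input_shape=[1, 1],         # a random test input of this shape is used
    target="esp32p4",           # .espdl files of different chips cannot be mixed
    num_of_bits=8,              # 8-bit quantization (16-bit is also supported)
    device="cpu",
)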

After executing the script, three files will be exported:

  1. .espdl: the quantized model binary used for deployment on the chip.

  2. .info: a text file containing the model structure and quantized parameters; the test input/output values are also stored here.

  3. .json: the quantization configuration file.

Note

  1. The .espdl models of different platforms cannot be mixed, otherwise the inference results will be inaccurate.

  2. The quantization strategy currently used by ESP-DL is symmetric quantization with power-of-two scaling (see the sketch below).
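To make note 2 concrete, the following sketch shows symmetric quantization with a power-of-two scale: values map to signed integers with a scale of 2**exponent, so dequantization is a pure bit shift. This illustrates the idea only and is not ESP-PPQ's implementation:

import math
import numpy as np

def quantize_pot(x: np.ndarray, num_bits: int = 8):
    # Symmetric quantization with a power-of-two scale (illustrative only).
    q_max = 2 ** (num_bits - 1) - 1                        # 127 for 8-bit
    exponent = math.ceil(math.log2(np.abs(x).max() / q_max))
    scale = 2.0 ** exponent                                # scale is a power of two
    q = np.clip(np.round(x / scale), -q_max - 1, q_max).astype(np.int8)
    return q, exponent

x = np.linspace(-1.0, 1.0, 5)
q, exp = quantize_pot(x)                   # q = [-64 -32 0 32 64], exp = -6
x_hat = q.astype(np.float32) * 2.0 ** exp  # dequantize: shift by the exponent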

Add test input/output

To verify that the inference results of the model on the board are correct, you first need to record a set of test inputs/outputs on the PC. By enabling the export_test_values option in the API, a set of test inputs/outputs is saved in the .espdl model. One of the input_shape and inputs parameters must be specified: input_shape generates a random test input, while inputs lets you supply a specific one. The test input/output values can be viewed in the .info file; search for test inputs value and test outputs value to find them.
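For example, reusing the names from the earlier sketch, a call that records a specific test input instead of a random one could look like this (again, parameter names other than inputs and export_test_values are assumptions):

# Pass `inputs` instead of `input_shape` to record a specific test input.
test_input = torch.rand(1, 1) * 2 * math.pi

quanted_graph = espdl_quantize_torch(
    model=model,
    espdl_export_file="sin_model.espdl",
    calib_dataloader=DataLoader(calib_data, batch_size=1),
    calib_steps=32,
    inputs=[test_input],        # specific test input (instead of input_shape)
    target="esp32p4",
    num_of_bits=8,
    export_test_values=True,    # save the test input/output into the .espdl
    device="cpu",
)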

Quantized model inference & accuracy evaluation

The espdl_quantize_onnx and espdl_quantize_torch APIs return a BaseGraph. Use the BaseGraph to build the corresponding TorchExecutor and run inference with the quantized model on the PC side.

from ppq.executor import TorchExecutor  # assumed import path

executor = TorchExecutor(graph=quanted_graph, device=device)  # BaseGraph from the quantize API
output = executor(input)

The output obtained from quantized-model inference can be used to compute various accuracy metrics. Since the board-side ESP-DL inference results are aligned with ESP-PPQ, these metrics can be used directly to evaluate the accuracy of the quantized model.
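For the sin model, one could sweep the input range and compute a simple error metric on the PC. This sketch reuses executor from above; that the executor returns a list of output tensors is an assumption to verify against the ESP-PPQ examples:

xs = torch.linspace(0, 2 * math.pi, 100)
preds = torch.stack([executor(x.reshape(1, 1))[0].reshape(()) for x in xs])
mse = torch.mean((preds - torch.sin(xs)) ** 2)   # accuracy of the quantized fit
print(f"quantized-model MSE vs sin(x): {mse.item():.6f}")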

Note

  1. Currently, ESP-DL only supports a batch_size of 1; multi-batch and dynamic batch are not supported.

  2. The test input/output and the quantized model weights in the .info file are all 16-byte aligned; lengths that fall short of a 16-byte boundary are zero-padded.

Advanced Quantization Methods

If you want to further improve the performance of the quantized model, try the following advanced quantization methods:

Post Training Quantization (PTQ)

Quantization Aware Training (QAT)
