PP-DocTranslation is an intelligent document translation solution provided by PaddlePaddle. It combines advanced general layout analysis with large language model (LLM) capabilities to deliver efficient, intelligent document translation services. The solution accurately identifies and extracts the elements of a document, including text blocks, headings, paragraphs, images, tables, and other complex layout structures, and performs high-quality multilingual translation on top of that analysis. PP-DocTranslation supports translation between many mainstream languages and is particularly strong at handling documents with complex layouts and strong contextual dependencies, aiming to deliver precise, natural, fluent, and professional results. The pipeline also provides flexible serving options and can be invoked from multiple programming languages on a variety of hardware. In addition, it supports secondary development: you can train and fine-tune models on your own datasets based on this pipeline, and the resulting models can be integrated seamlessly.
The PP-DocTranslation pipeline uses PP-StructureV3 as a sub-pipeline and therefore offers all of its capabilities. For more information on the features and usage of PP-StructureV3, refer to the PP-StructureV3 pipeline documentation.
In this pipeline, you can select the model to use based on the benchmark data below.
👉 Model List Details

Benchmark tables are available for the following modules; each table lists model download links, the metric shown, and GPU inference time (ms):

- Document Image Orientation Classification Module: Top-1 Acc (%)
- Text Image Unwarping Module
- Layout Detection Module: mAP(0.5) (%)
- Table Structure Recognition Module: Accuracy (%)
- Table Classification Module: Top-1 Acc (%)
- Table Cell Detection Module: mAP (%)
- Text Detection Module: Detection Hmean (%)
- Text Recognition Module (Chinese recognition models): Avg recognition accuracy (%)
- Text Line Orientation Classification Module (optional): Top-1 Acc (%)
- Formula Recognition Module: Avg-BLEU (%)
- Seal Text Recognition Module: Detection Hmean (%)

Before using the PP-DocTranslation pipeline locally, please make sure you have installed the wheel package according to the Installation Tutorial.
Please note: If you encounter issues such as the program becoming unresponsive, unexpected program termination, running out of memory resources, or extremely slow inference during execution, please try adjusting the configuration according to the documentation, such as disabling unnecessary features or using lighter-weight models.
Before use, you need to prepare an API key for a large language model. Both the Baidu Cloud Qianfan platform and locally deployed LLM services that conform to the OpenAI interface standard are supported.
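If you plan to use a locally deployed, OpenAI-compatible LLM service rather than Qianfan, it can be configured through the same `chat_bot_config` dictionary used later in Section 2.2. The sketch below is a minimal example; the `model_name`, `base_url`, and `api_key` values are placeholders that must be replaced with whatever your own service exposes:

```python
# Minimal sketch: chat_bot_config for a locally deployed, OpenAI-compatible LLM service.
# model_name, base_url, and api_key are placeholders, not values shipped with PaddleOCR.
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "my-local-llm",            # model name served by your local endpoint (placeholder)
    "base_url": "http://127.0.0.1:8000/v1",  # OpenAI-compatible endpoint of your service (placeholder)
    "api_type": "openai",
    "api_key": "EMPTY",                      # many local services accept any non-empty key
}
```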
2.1 Experience via Command Line

You can download the test file and quickly experience the pipeline with a single command:

```bash
paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --qianfan_api_key your_api_key
```
The command line supports more parameter settings. The command-line parameters are described in detail below:

| Parameter | Description | Type | Default Value |
|---|---|---|---|
| input | Data to be predicted, required. For example, the local path of an image or PDF file: /root/data/img.jpg; a URL, such as the network URL of an image or PDF file; or a local directory, which must contain the images to be predicted, such as /root/data/ (PDF files inside a directory are currently not supported; a PDF file must be specified by its exact file path). | str | |
| save_path | Path for saving the inference result files. If not set, inference results will not be saved locally. | str | |
| target_language | Target language (ISO 639-1 language code). | str | zh |
| layout_detection_model_name | Model name for layout detection. If not set, the pipeline default model will be used. | str | |
| layout_detection_model_dir | Directory path of the layout detection model. If not set, the official model will be downloaded. | str | |
| layout_threshold | Score threshold for the layout model. Any float between 0 and 1. If not set, the pipeline initialized value will be used, which defaults to 0.5. | float | |
| layout_nms | Whether to use post-processing NMS in layout detection. If not set, the pipeline initialized value will be used, which defaults to True. | bool | |
| layout_unclip_ratio | Expansion coefficient for detection boxes in the layout detection model. Any float greater than 0. If not set, the pipeline initialized value will be used, which defaults to 1.0. | float | |
| layout_merge_bboxes_mode | Mode for merging overlapping detection boxes output by the layout detection model. If not set, the pipeline initialized value will be used, which defaults to large. | str | |
| chart_recognition_model_name | Model name for chart parsing. If not set, the pipeline default model will be used. | str | |
| chart_recognition_model_dir | Directory path of the chart parsing model. If not set, the official model will be downloaded. | str | |
| chart_recognition_batch_size | Batch size for the chart parsing model. If not set, the batch size defaults to 1. | int | |
| region_detection_model_name | Model name for region detection. If not set, the pipeline default model will be used. | str | |
| region_detection_model_dir | Directory path of the region detection model. If not set, the official model will be downloaded. | str | |
| doc_orientation_classify_model_name | Model name for document orientation classification. If not set, the pipeline default model will be used. | str | |
| doc_orientation_classify_model_dir | Directory path of the document orientation classification model. If not set, the official model will be downloaded. | str | |
| doc_unwarping_model_name | Model name for text image unwarping. If not set, the pipeline default model will be used. | str | |
| doc_unwarping_model_dir | Directory path of the text image unwarping model. If not set, the official model will be downloaded. | str | |
| text_detection_model_name | Model name for text detection. If not set, the pipeline default model will be used. | str | |
| text_detection_model_dir | Directory path of the text detection model. If not set, the official model will be downloaded. | str | |
| text_det_limit_side_len | Image side length limit for text detection. Any integer greater than 0. If not set, the pipeline initialized value will be used, which defaults to 960. | int | |
| text_det_limit_type | Type of image side length limit for text detection. Supports min and max: min ensures the shortest side of the image is not less than det_limit_side_len, and max ensures the longest side is not greater than limit_side_len. If not set, the pipeline initialized value will be used, which defaults to max. | str | |
| text_det_thresh | Detection pixel threshold. In the output probability map, pixels with scores greater than this threshold are considered text pixels. Any float greater than 0. If not set, the pipeline initialized value of 0.3 will be used. | float | |
| text_det_box_thresh | Detection box threshold. If the average score of all pixels within a detected bounding box is greater than this threshold, the result is considered a text region. Any float greater than 0. If not set, the pipeline initialized value of 0.6 will be used. | float | |
| text_det_unclip_ratio | Text detection expansion coefficient, used to expand text regions; the larger the value, the larger the expansion area. Any float greater than 0. If not set, the pipeline initialized value of 2.0 will be used. | float | |
| textline_orientation_model_name | Model name for text line orientation classification. If not set, the pipeline default model will be used. | str | |
| textline_orientation_model_dir | Directory path of the text line orientation classification model. If not set, the official model will be downloaded. | str | |
| textline_orientation_batch_size | Batch size for the text line orientation classification model. If not set, the batch size defaults to 1. | int | |
| text_recognition_model_name | Model name for text recognition. If not set, the pipeline default model will be used. | str | |
| text_recognition_model_dir | Directory path of the text recognition model. If not set, the official model will be downloaded. | str | |
| text_recognition_batch_size | Batch size for the text recognition model. If not set, the batch size defaults to 1. | int | |
| text_rec_score_thresh | Text recognition threshold. Text results with scores greater than this threshold will be kept. Any float greater than 0. If not set, the pipeline initialized value of 0.0 will be used, meaning no threshold. | float | |
| table_classification_model_name | Model name for table classification. If not set, the pipeline default model will be used. | str | |
| table_classification_model_dir | Directory path of the table classification model. If not set, the official model will be downloaded. | str | |
| wired_table_structure_recognition_model_name | Model name for wired table structure recognition. If not set, the pipeline default model will be used. | str | |
| wired_table_structure_recognition_model_dir | Directory path of the wired table structure recognition model. If not set, the official model will be downloaded. | str | |
| wireless_table_structure_recognition_model_name | Model name for wireless table structure recognition. If not set, the pipeline default model will be used. | str | |
| wireless_table_structure_recognition_model_dir | Directory path of the wireless table structure recognition model. If not set, the official model will be downloaded. | str | |
| wired_table_cells_detection_model_name | Model name for wired table cell detection. If not set, the pipeline default model will be used. | str | |
| wired_table_cells_detection_model_dir | Directory path of the wired table cell detection model. If not set, the official model will be downloaded. | str | |
| wireless_table_cells_detection_model_name | Model name for wireless table cell detection. If not set, the pipeline default model will be used. | str | |
| wireless_table_cells_detection_model_dir | Directory path of the wireless table cell detection model. If not set, the official model will be downloaded. | str | |
| table_orientation_classify_model_name | Model name for table orientation classification. If not set, the pipeline default model will be used. | str | |
| table_orientation_classify_model_dir | Directory path of the table orientation classification model. If not set, the official model will be downloaded. | str | |
| seal_text_detection_model_name | Model name for seal text detection. If not set, the pipeline default model will be used. | str | |
| seal_text_detection_model_dir | Directory path of the seal text detection model. If not set, the official model will be downloaded. | str | |
| seal_det_limit_side_len | Image side length limit for seal text detection. Any integer greater than 0. If not set, the pipeline initialized value will be used, which defaults to 736. | int | |
| seal_det_limit_type | Type of image side length limit for seal text detection. Supports min and max: min ensures the shortest side of the image is not less than det_limit_side_len, and max ensures the longest side is not greater than limit_side_len. If not set, the pipeline initialized value will be used, which defaults to min. | str | |
| seal_det_thresh | Detection pixel threshold. In the output probability map, pixels with scores greater than this threshold are considered text pixels. Any float greater than 0. If not set, the pipeline initialized value of 0.2 will be used. | float | |
| seal_det_box_thresh | Detection box threshold. If the average score of all pixels within a detected bounding box is greater than this threshold, the result is considered a text region. Any float greater than 0. If not set, the pipeline initialized value of 0.6 will be used. | float | |
| seal_det_unclip_ratio | Expansion coefficient for seal text detection, used to expand the text region; the larger the value, the larger the expansion area. Any float greater than 0. If not set, the pipeline initialized value of 0.5 will be used. | float | |
| seal_text_recognition_model_name | Model name for seal text recognition. If not set, the pipeline default model will be used. | str | |
| seal_text_recognition_model_dir | Directory path of the seal text recognition model. If not set, the official model will be downloaded. | str | |
| seal_text_recognition_batch_size | Batch size for the seal text recognition model. If not set, the batch size defaults to 1. | int | |
| seal_rec_score_thresh | Seal text recognition threshold. Text results with scores greater than this threshold will be kept. Any float greater than 0. If not set, the pipeline initialized value of 0.0 will be used, meaning no threshold. | float | |
| formula_recognition_model_name | Model name for formula recognition. If not set, the pipeline default model will be used. | str | |
| formula_recognition_model_dir | Directory path of the formula recognition model. If not set, the official model will be downloaded. | str | |
| formula_recognition_batch_size | Batch size of the formula recognition model. If not set, the batch size defaults to 1. | int | |
| use_doc_orientation_classify | Whether to load and use the document orientation classification module. If not set, the pipeline initialized value will be used, which defaults to False. | bool | |
| use_doc_unwarping | Whether to load and use the text image unwarping module. If not set, the pipeline initialized value will be used, which defaults to False. | bool | |
| use_textline_orientation | Whether to load and use the text line orientation classification module. If not set, the pipeline initialized value will be used, which defaults to True. | bool | |
| use_seal_recognition | Whether to load and use the seal text recognition sub-pipeline. If not set, the pipeline initialized value will be used, which defaults to True. | bool | |
| use_table_recognition | Whether to load and use the table recognition sub-pipeline. If not set, the pipeline initialized value will be used, which defaults to True. | bool | |
| use_formula_recognition | Whether to load and use the formula recognition sub-pipeline. If not set, the pipeline initialized value will be used, which defaults to True. | bool | |
| use_chart_recognition | Whether to load and use the chart parsing module. If not set, the pipeline initialized value will be used, which defaults to False. | bool | |
| use_region_detection | Whether to load and use the region detection module. If not set, the pipeline initialized value will be used, which defaults to True. | bool | |
| qianfan_api_key | API key for the Qianfan platform. | str | |
| device | Device used for inference. An exact device ID can be specified: cpu means CPU inference; gpu:0 means inference on the first GPU; npu:0, xpu:0, mlu:0, and dcu:0 refer to the first NPU, XPU, MLU, and DCU, respectively. | str | |
| enable_hpi | Whether to enable high-performance inference. | bool | False |
| use_tensorrt | Whether to enable the TensorRT subgraph engine of Paddle Inference. If the model does not support TensorRT acceleration, enabling this flag has no effect. | bool | False |
| precision | Computation precision, e.g. fp32, fp16. | str | fp32 |
| enable_mkldnn | Whether to enable MKL-DNN accelerated inference. If MKL-DNN is unavailable or the model does not support it, enabling this flag has no effect. | bool | True |
| mkldnn_cache_capacity | MKL-DNN cache capacity. | int | 10 |
| cpu_threads | Number of threads used for inference on the CPU. | int | 8 |
| paddlex_config | Path to the PaddleX pipeline configuration file. | str | |
The execution results will be printed to the terminal.
2.2 Integration via Python Script

The command-line method is a quick way to try the pipeline and inspect its results. In real projects, however, you usually need to integrate the pipeline via code. You can download the test file and use the following sample code for inference:
```python
from paddleocr import PPDocTranslation

# Create a translation pipeline
pipeline = PPDocTranslation()

# Document path
input_path = "document_sample.pdf"

# Output directory
output_path = "./output"

# Large model configuration
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "api_key",  # your api_key
}

if input_path.lower().endswith(".md"):
    # Read Markdown documents; directories and URL links with the .md suffix are also supported
    ori_md_info_list = pipeline.load_from_markdown(input_path)
else:
    # Use PP-StructureV3 to perform layout parsing on PDF/image documents to obtain Markdown information
    visual_predict_res = pipeline.visual_predict(
        input_path,
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_common_ocr=True,
        use_seal_recognition=True,
        use_table_recognition=True,
    )

    ori_md_info_list = []
    for res in visual_predict_res:
        layout_parsing_result = res["layout_parsing_result"]
        ori_md_info_list.append(layout_parsing_result.markdown)
        layout_parsing_result.save_to_img(output_path)
        layout_parsing_result.save_to_markdown(output_path)

    # Concatenate the Markdown information of multi-page documents into a single Markdown file,
    # and save the merged original Markdown text
    if input_path.lower().endswith(".pdf"):
        ori_md_info = pipeline.concatenate_markdown_pages(ori_md_info_list)
        ori_md_info.save_to_markdown(output_path)

# Perform document translation (target language: English)
tgt_md_info_list = pipeline.translate(
    ori_md_info_list=ori_md_info_list,
    target_language="en",
    chunk_size=5000,
    chat_bot_config=chat_bot_config,
)

# Save the translation results
for tgt_md_info in tgt_md_info_list:
    tgt_md_info.save_to_markdown(output_path)
```
After executing the above code, you will obtain the parsed results of the original document, the Markdown file of the original text to be translated, and the Markdown file of the translated document, all saved in the `output` directory.
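If the source document is already a Markdown file, the layout-parsing step can be skipped entirely. The following minimal sketch reuses the documented `load_from_markdown`, `translate`, and `save_to_markdown` interfaces from the sample above; the file name `notes.md` is a placeholder:

```python
from paddleocr import PPDocTranslation

pipeline = PPDocTranslation()

# "notes.md" is a placeholder path; replace it with your own Markdown file
ori_md_info_list = pipeline.load_from_markdown("notes.md")

tgt_md_info_list = pipeline.translate(
    ori_md_info_list=ori_md_info_list,
    target_language="en",
    chat_bot_config={
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": "openai",
        "api_key": "api_key",  # your api_key
    },
)

for tgt_md_info in tgt_md_info_list:
    tgt_md_info.save_to_markdown("./output")
```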
The process, API description, and output description of PP-DocTranslation prediction are as follows:
(1) Instantiate the PP-DocTranslation pipeline object by calling `PPDocTranslation`. The relevant parameters are described as follows:

| Parameter | Description | Type | Default Value |
|---|---|---|---|
| layout_detection_model_name | The model name for layout detection. If set to None, the pipeline's default model will be used. | str\|None | None |
| layout_detection_model_dir | The directory path of the layout detection model. If set to None, the official model will be downloaded. | str\|None | None |
| layout_threshold | Score threshold for the layout model. Either any float between 0 and 1, or a dict such as {0: 0.1}, where the key is the class ID and the value is the threshold for that class. If set to None, the pipeline's initialized value will be used, defaulting to 0.5. | float\|dict\|None | None |
| layout_nms | Whether to use post-processing NMS for layout detection. If set to None, the pipeline's initialized value will be used, defaulting to True. | bool\|None | None |
| layout_unclip_ratio | Expansion coefficient for detection boxes in the layout detection model. Either any float greater than 0, a tuple of two floats, or a dict keyed by cls_id with tuple values, e.g. {0: (1.1, 2.0)}, meaning that for class 0 detection boxes the center remains unchanged while the width is expanded 1.1 times and the height 2.0 times. If set to None, the pipeline's initialized value will be used, defaulting to 1.0. | float\|Tuple[float,float]\|dict\|None | None |
| layout_merge_bboxes_mode | Overlap box filtering method for layout detection. Either large, small, or union, indicating whether to keep the larger box, the smaller box, or both during overlap filtering; or a dict keyed by cls_id with str values, e.g. {0: "large", 2: "small"}, meaning "large" mode for class 0 boxes and "small" mode for class 2 boxes. If set to None, the pipeline's initialized value will be used, defaulting to large. | str\|dict\|None | None |
| chart_recognition_model_name | The model name for chart parsing. If set to None, the pipeline's default model will be used. | str\|None | None |
| chart_recognition_model_dir | The directory path of the chart parsing model. If set to None, the official model will be downloaded. | str\|None | None |
| chart_recognition_batch_size | Batch size for the chart parsing model. If set to None, the batch size defaults to 1. | int\|None | None |
| region_detection_model_name | The model name for region detection. If set to None, the pipeline's default model will be used. | str\|None | None |
| region_detection_model_dir | The directory path of the region detection model. If set to None, the official model will be downloaded. | str\|None | None |
| doc_orientation_classify_model_name | The model name for document orientation classification. If set to None, the pipeline's default model will be used. | str\|None | None |
| doc_orientation_classify_model_dir | The directory path of the document orientation classification model. If set to None, the official model will be downloaded. | str\|None | None |
| doc_unwarping_model_name | The model name for text image unwarping. If set to None, the pipeline's default model will be used. | str\|None | None |
| doc_unwarping_model_dir | The directory path of the text image unwarping model. If set to None, the official model will be downloaded. | str\|None | None |
| text_detection_model_name | The model name for text detection. If set to None, the pipeline's default model will be used. | str\|None | None |
| text_detection_model_dir | The directory path of the text detection model. If set to None, the official model will be downloaded. | str\|None | None |
| text_det_limit_side_len | Image side length limit for text detection. Any integer greater than 0. If set to None, the pipeline's initialized value will be used, defaulting to 960. | int\|None | None |
| text_det_limit_type | Type of image side length limit for text detection. Supports min and max, where min ensures the shortest side of the image is not less than det_limit_side_len and max ensures the longest side is not greater than limit_side_len. If set to None, the pipeline's initialized value will be used, defaulting to max. | str\|None | None |
| text_det_thresh | Pixel threshold for detection; pixels in the output probability map with scores above this threshold are considered text pixels. Any float greater than 0. If set to None, the pipeline's initialized value of 0.3 will be used. | float\|None | None |
| text_det_box_thresh | Detection box threshold; when the average score of all pixels inside a detected box exceeds this threshold, it is considered a text region. Any float greater than 0. If set to None, the pipeline's initialized value of 0.6 will be used. | float\|None | None |
| text_det_unclip_ratio | Expansion coefficient for text detection, used to expand the text region; the larger the value, the larger the expansion area. Any float greater than 0. If set to None, the pipeline's initialized value of 2.0 will be used. | float\|None | None |
| textline_orientation_model_name | The model name for text line orientation classification. If set to None, the pipeline's default model will be used. | str\|None | None |
| textline_orientation_model_dir | The directory path of the text line orientation classification model. If set to None, the official model will be downloaded. | str\|None | None |
| textline_orientation_batch_size | Batch size for the text line orientation classification model. If set to None, the batch size defaults to 1. | int\|None | None |
| text_recognition_model_name | The model name for text recognition. If set to None, the pipeline's default model will be used. | str\|None | None |
| text_recognition_model_dir | The directory path of the text recognition model. If set to None, the official model will be downloaded. | str\|None | None |
| text_recognition_batch_size | Batch size for the text recognition model. If set to None, the batch size defaults to 1. | int\|None | None |
| text_rec_score_thresh | Text recognition threshold; text results with scores greater than this threshold will be retained. Any float greater than 0. If set to None, the pipeline's initialized value of 0.0 (no threshold) will be used. | float\|None | None |
| table_classification_model_name | The model name for table classification. If set to None, the pipeline's default model will be used. | str\|None | None |
| table_classification_model_dir | The directory path of the table classification model. If set to None, the official model will be downloaded. | str\|None | None |
| wired_table_structure_recognition_model_name | The model name for wired table structure recognition. If set to None, the pipeline's default model will be used. | str\|None | None |
| wired_table_structure_recognition_model_dir | The directory path of the wired table structure recognition model. If set to None, the official model will be downloaded. | str\|None | None |
| wireless_table_structure_recognition_model_name | The model name for wireless table structure recognition. If set to None, the pipeline's default model will be used. | str\|None | None |
| wireless_table_structure_recognition_model_dir | The directory path of the wireless table structure recognition model. If set to None, the official model will be downloaded. | str\|None | None |
| wired_table_cells_detection_model_name | The model name for wired table cell detection. If set to None, the pipeline's default model will be used. | str\|None | None |
| wired_table_cells_detection_model_dir | The directory path of the wired table cell detection model. If set to None, the official model will be downloaded. | str\|None | None |
| wireless_table_cells_detection_model_name | The model name for wireless table cell detection. If set to None, the pipeline's default model will be used. | str\|None | None |
| wireless_table_cells_detection_model_dir | The directory path of the wireless table cell detection model. If set to None, the official model will be downloaded. | str\|None | None |
| table_orientation_classify_model_name | The model name for table orientation classification. If set to None, the pipeline's default model will be used. | str\|None | None |
| table_orientation_classify_model_dir | The directory path of the table orientation classification model. If set to None, the official model will be downloaded. | str\|None | None |
| seal_text_detection_model_name | The model name for seal text detection. If set to None, the pipeline's default model will be used. | str\|None | None |
| seal_text_detection_model_dir | The directory path of the seal text detection model. If set to None, the official model will be downloaded. | str\|None | None |
| seal_det_limit_side_len | Image side length limit for seal text detection. Any integer greater than 0. If set to None, the pipeline's initialized value will be used, defaulting to 736. | int\|None | None |
| seal_det_limit_type | Type of image side length limit for seal text detection. Supports min and max, where min ensures the shortest image side is not less than det_limit_side_len and max ensures the longest side is not greater than limit_side_len. If set to None, the pipeline's initialized value will be used, defaulting to min. | str\|None | None |
| seal_det_thresh | Detection pixel threshold. In the output probability map, pixels with scores above this threshold are considered text pixels. Any float greater than 0. If set to None, the pipeline's default value of 0.2 will be used. | float\|None | None |
| seal_det_box_thresh | Detection box threshold. When the average score of all pixels within a detected bounding box is greater than this threshold, the result is considered a text region. Any float greater than 0. If set to None, the pipeline's default value of 0.6 will be used. | float\|None | None |
| seal_det_unclip_ratio | Expansion coefficient for seal text detection, used to expand the text region; the larger the value, the larger the expansion area. Any float greater than 0. If set to None, the pipeline's default value of 0.5 will be used. | float\|None | None |
| seal_text_recognition_model_name | Name of the seal text recognition model. If set to None, the pipeline's default model will be used. | str\|None | None |
| seal_text_recognition_model_dir | Directory path of the seal text recognition model. If set to None, the official model will be downloaded. | str\|None | None |
| seal_text_recognition_batch_size | Batch size for the seal text recognition model. If set to None, the batch size defaults to 1. | int\|None | None |
| seal_rec_score_thresh | Seal text recognition threshold; text results with scores above this threshold will be retained. Any float greater than 0. If set to None, the pipeline's default value of 0.0 will be used, meaning no threshold. | float\|None | None |
| formula_recognition_model_name | Name of the formula recognition model. If set to None, the pipeline's default model will be used. | str\|None | None |
| formula_recognition_model_dir | Directory path of the formula recognition model. If set to None, the official model will be downloaded. | str\|None | None |
| formula_recognition_batch_size | Batch size for the formula recognition model. If set to None, the batch size defaults to 1. | int\|None | None |
| use_doc_orientation_classify | Whether to load and use the document orientation classification module. If set to None, the pipeline's initialized value will be used, defaulting to False. | bool\|None | None |
| use_doc_unwarping | Whether to load and use the text image unwarping module. If set to None, the pipeline's initialized value will be used, defaulting to False. | bool\|None | None |
| use_textline_orientation | Whether to load and use the text line orientation classification module. If set to None, the pipeline's initialized value will be used, defaulting to True. | bool\|None | None |
| use_seal_recognition | Whether to load and use the seal text recognition sub-pipeline. If set to None, the pipeline's initialized value will be used, defaulting to True. | bool\|None | None |
| use_table_recognition | Whether to load and use the table recognition sub-pipeline. If set to None, the pipeline's initialized value will be used, defaulting to True. | bool\|None | None |
| use_formula_recognition | Whether to load and use the formula recognition sub-pipeline. If set to None, the pipeline's initialized value will be used, defaulting to True. | bool\|None | None |
| use_chart_recognition | Whether to load and use the chart parsing module. If set to None, the pipeline's initialized value will be used, defaulting to False. | bool\|None | None |
| use_region_detection | Whether to load and use the document region detection module. If set to None, the pipeline's initialized value will be used, defaulting to True. | bool\|None | None |
| chat_bot_config | Large language model configuration information. The configuration content is a dict of the form {"module_name": "chat_bot", "model_name": "ernie-3.5-8k", "base_url": "https://qianfan.baidubce.com/v2", "api_type": "openai", "api_key": "api_key"}, where api_key must be set to your actual API key. | dict\|None | None |
| device | Device used for inference. A specific device ID can be given: cpu means CPU inference; gpu:0 means inference on the first GPU; npu:0, xpu:0, mlu:0, and dcu:0 refer to the first NPU, XPU, MLU, and DCU, respectively. If set to None, initialization will prioritize local GPU device 0; if it is unavailable, the CPU will be used. | str\|None | None |
| enable_hpi | Whether to enable high-performance inference. | bool | False |
| use_tensorrt | Whether to enable Paddle Inference's TensorRT subgraph engine. If the model does not support TensorRT acceleration, enabling this flag has no effect. | bool | False |
| precision | Computation precision, such as fp32 or fp16. | str | "fp32" |
| enable_mkldnn | Whether to enable MKL-DNN accelerated inference. If MKL-DNN is unavailable or the model does not support it, enabling this flag has no effect. | bool | True |
| mkldnn_cache_capacity | MKL-DNN cache capacity. | int | 10 |
| cpu_threads | Number of threads used during inference on the CPU. | int | 8 |
| paddlex_config | Path to the PaddleX pipeline configuration file. | str\|None | None |
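As a quick illustration of these construction-time parameters, the sketch below creates a pipeline instance with a few of the documented options; the device string, disabled modules, and API key are example values to adapt to your environment:

```python
from paddleocr import PPDocTranslation

# Minimal sketch: customize the pipeline at construction time.
# All values below are examples/placeholders, not required settings.
pipeline = PPDocTranslation(
    device="gpu:0",                      # run on the first GPU; use "cpu" if no GPU is available
    use_doc_orientation_classify=False,  # skip document orientation classification
    use_doc_unwarping=False,             # skip text image unwarping
    use_chart_recognition=False,         # keep chart parsing disabled (its default)
    chat_bot_config={
        "module_name": "chat_bot",
        "model_name": "ernie-3.5-8k",
        "base_url": "https://qianfan.baidubce.com/v2",
        "api_type": "openai",
        "api_key": "api_key",            # replace with your actual API key
    },
)
```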
(2) Call the `visual_predict()` method of the PP-DocTranslation pipeline object to obtain visual prediction results. This method returns a list of results. The pipeline also provides a `visual_predict_iter()` method; both accept the same parameters and return the same results, but `visual_predict_iter()` returns a generator, which processes and retrieves prediction results step by step and is better suited to large datasets or memory-constrained scenarios. Choose either method according to your needs. The parameters of the `visual_predict()` method are described below:

| Parameter | Description | Type | Default |
|---|---|---|---|
| input | Data to be predicted; multiple input types are supported; required. A numpy.ndarray; the local path of an image or PDF file, e.g. /root/data/img.jpg; a URL of an image or PDF file; a local directory containing images to be predicted, e.g. /root/data/ (PDFs in directories are currently not supported; a PDF file must be given by its exact file path); or a list of the above, e.g. [numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"], ["/root/data1", "/root/data2"]. | Python Var\|str\|list | |
| use_doc_orientation_classify | Whether to use the document orientation classification module during inference. If set to None, the value from instantiation is used; otherwise this parameter takes priority. | bool\|None | None |
| use_doc_unwarping | Whether to use the text image unwarping module during inference. If set to None, the value from instantiation is used; otherwise this parameter takes priority. | bool\|None | None |
| use_textline_orientation | Whether to use the text line orientation classification module during inference. If set to None, the value from instantiation is used; otherwise this parameter takes priority. | bool\|None | None |
| use_seal_recognition | Whether to use the seal text recognition sub-pipeline during inference. If set to None, the value from instantiation is used; otherwise this parameter takes priority. | bool\|None | None |
| use_table_recognition | Whether to use the table recognition sub-pipeline during inference. If set to None, the value from instantiation is used; otherwise this parameter takes priority. | bool\|None | None |
| use_formula_recognition | Whether to use the formula recognition sub-pipeline during inference. If set to None, the value from instantiation is used; otherwise this parameter takes priority. | bool\|None | None |
| use_chart_recognition | Whether to use the chart parsing module during inference. If set to None, the value from instantiation is used; otherwise this parameter takes priority. | bool\|None | None |
| use_region_detection | Whether to use the document region detection module during inference. If set to None, the value from instantiation is used; otherwise this parameter takes priority. | bool\|None | None |
| layout_threshold | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|dict\|None | None |
| layout_nms | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | bool\|None | None |
| layout_unclip_ratio | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|Tuple[float,float]\|dict\|None | None |
| layout_merge_bboxes_mode | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | str\|dict\|None | None |
| text_det_limit_side_len | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | int\|None | None |
| text_det_limit_type | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | str\|None | None |
| text_det_thresh | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|None | None |
| text_det_box_thresh | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|None | None |
| text_det_unclip_ratio | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|None | None |
| text_rec_score_thresh | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|None | None |
| seal_det_limit_side_len | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | int\|None | None |
| seal_det_limit_type | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | str\|None | None |
| seal_det_thresh | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|None | None |
| seal_det_box_thresh | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|None | None |
| seal_det_unclip_ratio | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|None | None |
| seal_rec_score_thresh | Same meaning as the instantiation parameter. If set to None, the instantiation value is used; otherwise this parameter takes priority. | float\|None | None |
| use_wired_table_cells_trans_to_html | Whether to enable direct conversion of wired table cell detection results to HTML. When enabled, HTML is constructed directly from the geometric relations of the wired table cell detection results. | bool | False |
| use_wireless_table_cells_trans_to_html | Whether to enable direct conversion of wireless table cell detection results to HTML. When enabled, HTML is constructed directly from the geometric relations of the wireless table cell detection results. | bool | False |
| use_table_orientation_classify | Whether to enable table orientation classification. When enabled, tables rotated by 90/180/270 degrees in the image can be corrected in orientation and recognized correctly. | bool | True |
| use_ocr_results_with_table_cells | Whether to enable OCR segmentation by table cells. When enabled, OCR detection results are segmented and re-recognized based on the cell prediction results to avoid missing text. | bool | True |
| use_e2e_wired_table_rec_model | Whether to enable end-to-end wired table recognition mode. When enabled, the cell detection model is not used; only the table structure recognition model is used. | bool | False |
| use_e2e_wireless_table_rec_model | Whether to enable end-to-end wireless table recognition mode. When enabled, the cell detection model is not used; only the table structure recognition model is used. | bool | True |
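For long PDFs, the generator variant can keep memory usage low by handling one page at a time. A minimal sketch, assuming the same `pipeline` object and output directory as in the sample above:

```python
# Minimal sketch: stream visual prediction results page by page with visual_predict_iter().
# "document_sample.pdf" and "./output" are placeholders reused from the earlier example.
ori_md_info_list = []
for res in pipeline.visual_predict_iter("document_sample.pdf"):
    layout_parsing_result = res["layout_parsing_result"]
    ori_md_info_list.append(layout_parsing_result.markdown)
    layout_parsing_result.save_to_markdown("./output")  # one Markdown file per page
```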
(3) Processing visual prediction results: Each sample's prediction result is a corresponding Result object, supporting operations such as printing, saving as images, and saving as json
files: Method Description Parameter Parameter Type Parameter Description Default print()
Print results to terminal format_json
bool
Whether to format the output content using JSON
indentation True
indent
int
Specify indentation level to beautify output JSON
data for better readability, effective only when format_json
is True
4 ensure_ascii
bool
Control whether non-ASCII
characters are escaped as Unicode
. When set to True
, all non-ASCII
characters will be escaped; if False
, original characters are preserved. Effective only when format_json
is True
False
save_to_json()
Save results as a JSON file save_path
str
File path for saving. If a directory is specified, the saved file name matches the input file type name None indent
int
Specify indentation level to beautify output JSON
data for better readability, effective only when format_json
is True
4 ensure_ascii
bool
Control whether non-ASCII
characters are escaped as Unicode
. When set to True
, all non-ASCII
characters will be escaped; if False
, original characters are preserved. Effective only when format_json
is True
False
save_to_img()
Save visualized images from intermediate modules as PNG format images save_path
str
File path for saving, supports directory or file path None save_to_markdown()
Save each page of image or PDF files as separate markdown files save_path
str
File path for saving, supports directory or file path None save_to_html()
Save tables in the file as HTML format files save_path
str
File path for saving, supports directory or file path None save_to_xlsx()
Save tables in the file as XLSX format files save_path
str
File path for saving, supports directory or file path None - Calling the `print()` method will print the results to the terminal, with the following explanation of printed content: - `input_path`: `(str)` Input path of the image or PDF to be predicted - `page_index`: `(Union[int, None])` If the input is a PDF, this indicates the current page number; otherwise `None` - `model_settings`: `(Dict[str, bool])` Model parameters configured for the pipeline - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline - `use_general_ocr`: `(bool)` Controls whether to enable the OCR sub-pipeline - `use_seal_recognition`: `(bool)` Controls whether to enable the seal text recognition sub-pipeline - `use_table_recognition`: `(bool)` Controls whether to enable the table recognition sub-pipeline - `use_formula_recognition`: `(bool)` Controls whether to enable the formula recognition sub-pipeline - `doc_preprocessor_res`: `(Dict[str, Union[List[float], str]])` Document preprocessing result dictionary, present only when `use_doc_preprocessor=True` - `input_path`: `(str)` Image path accepted by the document preprocessing sub-pipeline; if input is `numpy.ndarray`, saved as `None`, here it is `None` - `page_index`: `None`, here input is `numpy.ndarray`, so value is `None` - `model_settings`: `(Dict[str, bool])` Model configuration parameters of the document preprocessing sub-pipeline - `use_doc_orientation_classify`: `(bool)` Controls whether to enable the document image orientation classification sub-module - `use_doc_unwarping`: `(bool)` Controls whether to enable the text image unwarping sub-module - `angle`: `(int)` Prediction result of the document image orientation classification sub-module, returns actual angle value if enabled - `parsing_res_list`: `(List[Dict])` List of parsing results, each element is a dictionary; list order corresponds to reading order after parsing - `block_bbox`: `(np.ndarray)` Bounding box of layout detection - `block_label`: `(str)` Label of the layout region, e.g. `text`, `table`, etc. - `block_content`: `(str)` Content within the layout region - `seg_start_flag`: `(bool)` Indicates whether this layout region is the start of a paragraph - `seg_end_flag`: `(bool)` Indicates whether this layout region is the end of a paragraph - `sub_label`: `(str)` Sub-label of the layout region, e.g. 
sub-label of `text` could be `title_text` - `sub_index`: `(int)` Sub-index of the layout region, used for restoring Markdown - `index`: `(int)` Index of the layout region, used to display layout sorting results - `overall_ocr_res`: `(Dict[str, Union[List[str], List[float], numpy.ndarray]])` Global OCR result dictionary - `input_path`: `(Union[str, None])` Image path accepted by the image OCR sub-pipeline; if input is `numpy.ndarray`, saved as `None` - `page_index`: `None`, here input is `numpy.ndarray`, so value is `None` - `model_settings`: `(Dict)` Model configuration parameters of the OCR sub-pipeline - `dt_polys`: `(List[numpy.ndarray])` List of text detection polygons; each detection box is a numpy array with 4 vertex coordinates, shape (4, 2), dtype int16 - `dt_scores`: `(List[float])` Confidence scores of text detection boxes - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters of the text detection module - `limit_side_len`: `(int)` Length limit for image preprocessing - `limit_type`: `(str)` Processing method for length limit - `thresh`: `(float)` Confidence threshold for text pixel classification - `box_thresh`: `(float)` Confidence threshold for text detection boxes - `unclip_ratio`: `(float)` Expansion factor for text detection boxes - `text_type`: `(str)` Type of text detection, currently fixed as "general" - `text_type`: `(str)` Type of text detection, currently fixed as "general" - `textline_orientation_angles`: `(List[int])` Prediction results of text line orientation classification; returns actual angle values when enabled (e.g. [0,0,1]) - `text_rec_score_thresh`: `(float)` Filtering threshold for text recognition results - `rec_texts`: `(List[str])` List of text recognition results, only including texts exceeding the `text_rec_score_thresh` - `rec_scores`: `(List[float])` Confidence scores of text recognition, filtered by `text_rec_score_thresh` - `rec_polys`: `(List[numpy.ndarray])` List of text detection boxes filtered by confidence, format same as `dt_polys` - `formula_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` List of formula recognition results, each element is a dictionary - `rec_formula`: `(str)` Formula recognition result - `rec_polys`: `(numpy.ndarray)` Formula detection boxes, shape (4, 2), dtype int16 - `formula_region_id`: `(int)` Region ID where the formula is located - `seal_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` List of seal recognition results, each element is a dictionary - `input_path`: `(str)` Input path of seal image - `page_index`: `None`, here input is `numpy.ndarray`, so value is `None` - `model_settings`: `(Dict)` Model configuration parameters of the seal text recognition sub-pipeline - `dt_polys`: `(List[numpy.ndarray])` List of seal detection boxes, format same as `dt_polys` - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters of the seal detection module, meanings same as above - `text_type`: `(str)` Type of seal detection, currently fixed as "seal" - `text_rec_score_thresh`: `(float)` Filtering threshold for seal recognition results - `rec_texts`: `(List[str])` List of seal recognition results, only including texts exceeding the `text_rec_score_thresh` - `rec_scores`: `(List[float])` Confidence scores of seal recognition, filtered by `text_rec_score_thresh` - `rec_polys`: `(List[numpy.ndarray])` List of seal detection boxes filtered by confidence, format same as `dt_polys` - `rec_boxes`: `(numpy.ndarray)` Rectangular bounding 
box array of detection boxes, shape (n, 4), dtype int16; each row represents one rectangle - `table_res_list`: `(List[Dict[str, Union[numpy.ndarray, List[float], str]]])` List of table recognition results, each element is a dictionary - `cell_box_list`: `(List[numpy.ndarray])` List of table cell bounding boxes - `pred_html`: `(str)` Table in HTML format string - `table_ocr_pred`: `(dict)` OCR recognition results of the table - `rec_polys`: `(List[numpy.ndarray])` List of cell detection boxes - `rec_texts`: `(List[str])` Recognition results of cells - `rec_scores`: `(List[float])` Recognition confidence scores of cells - `rec_boxes`: `(numpy.ndarray)` Rectangular bounding box array of detection boxes, shape (n, 4), dtype int16; each row represents one rectangle - Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the saved path will be `save_path/{your_img_basename}_res.json`. If a file is specified, it will be saved directly to that file. Since JSON files do not support saving numpy arrays, all `numpy.array` types will be converted to list format. - Calling the `save_to_img()` method will save visualization results to the specified `save_path`. If a directory is specified, it will save layout detection visual images, global OCR visual images, layout reading order visual images, etc. If a file is specified, it will be saved directly to that file. (The pipeline usually contains many result images, so it is not recommended to specify a specific file path directly, or multiple images will be overwritten, leaving only the last image.) - Calling the `save_to_markdown()` method will save the converted Markdown files to the specified `save_path`. The saved file path will be `save_path/{your_img_basename}.md`. If the input is a PDF file, it is recommended to specify a directory directly, otherwise multiple markdown files will be overwritten. - Calling the `concatenate_markdown_pages()` method merges the multi-page Markdown contents output by the PP-DocTranslation pipeline `markdown_list` into a single complete document and returns the merged Markdown content. (4) Call the translate()
method to perform document translation. This method returns the original and translated markdown content as a markdown object, which can be saved locally by executing the save_to_markdown()
method for the desired parts. Below are the relevant parameters of the translate()
method:

| Parameter | Description | Type | Default |
|---|---|---|---|
| ori_md_info_list | List of original Markdown data containing the content to be translated. Must be a list of dictionaries, each representing a document block. | List[Dict] | |
| target_language | Target language (ISO 639-1 language code, e.g. "en"/"ja"/"fr"). | str | "zh" |
| chunk_size | Character count threshold for chunked translation processing. | int | 5000 |
| task_description | Custom task description prompt. | str\|None | None |
| output_format | Specified output format requirement, e.g. "preserve the original Markdown structure". | str\|None | None |
| rules_str | Custom translation rule description. | str\|None | None |
| few_shot_demo_text_content | Few-shot learning example text content. | str\|None | None |
| few_shot_demo_key_value_list | Structured few-shot example data given as key-value pairs; can include a professional terminology glossary. | str\|None | None |
| glossary | Professional terminology glossary for translation. | dict\|None | None |
| llm_request_interval | Interval in seconds between requests to the large language model; helps avoid calling the LLM too frequently. | float | 0.0 |
| chat_bot_config | Large language model configuration. If set to None, the instantiation parameter is used; otherwise this parameter takes priority. | dict\|None | None |
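For terminology-sensitive documents, the documented `glossary`, `rules_str`, and `llm_request_interval` parameters can be combined as in the sketch below; the glossary entry and rule text are illustrative placeholders, and `pipeline`, `ori_md_info_list`, and `chat_bot_config` are assumed to come from the earlier sample:

```python
# Minimal sketch: enforce terminology and pace LLM calls during translation.
tgt_md_info_list = pipeline.translate(
    ori_md_info_list=ori_md_info_list,
    target_language="en",
    glossary={"产线": "pipeline"},                               # illustrative glossary entry (source term to target term)
    rules_str="Keep all numbers and table structures unchanged.",  # illustrative custom rule
    llm_request_interval=1.0,                                    # wait 1 second between LLM requests
    chat_bot_config=chat_bot_config,
)

for tgt_md_info in tgt_md_info_list:
    tgt_md_info.save_to_markdown("./output")
```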
3. Development Integration/Deployment
If the pipeline can meet your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.
If you need to directly apply the pipeline in your Python project, you can refer to the sample code in 2.2 Python Script Approach.
In addition, PaddleOCR also offers two other deployment methods, detailed as follows:
🚀 High-Performance Inference: In real-world production environments, many applications have stringent performance criteria (especially response speed) for deployment strategies to ensure efficient system operation and a smooth user experience. To this end, PaddleOCR provides high-performance inference capabilities, aiming to deeply optimize model inference and pre/post-processing, achieving significant acceleration in the end-to-end process. For detailed information on the high-performance inference process, please refer to High-Performance Inference.
☁️ Serving: Serving is a common deployment form in real-world production environments. By encapsulating inference functions as services, clients can access these services through network requests to obtain inference results. For detailed information on the pipeline serving process, please refer to Serving.
Below are the API references for basic serving and examples of multi-language service invocation:
API Reference

For the main operations provided by the service: when a request is processed successfully, the response status code is 200 and the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| logId | string | Request UUID. |
| errorCode | integer | Error code. Fixed as 0. |
| errorMsg | string | Error message. Fixed as "Success". |
| result | object | Operation result. |

When a request is not processed successfully, the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| logId | string | Request UUID. |
| errorCode | integer | Error code. Same as the response status code. |
| errorMsg | string | Error message. |

The main operations provided by the service are as follows:

**`analyzeImages`**

Use computer vision models to analyze images and obtain OCR, table recognition results, etc.

POST /doctrans-visual

The request body has the following properties:

| Name | Type | Description | Required |
|---|---|---|---|
| file | string | URL of an image or PDF file accessible to the server, or the Base64-encoded content of such a file. By default, for PDF files with more than 10 pages, only the first 10 pages are processed. The page limit can be lifted by adding the following to the pipeline configuration file: `Serving: extra: max_num_input_imgs: null`. | Yes |
Yes fileType
integer
|null
File type. 0
means PDF, 1
means image file. If not present in the request, the file type will be inferred from the URL. No useDocOrientationClassify
boolean
| null
See the use_doc_orientation_classify
parameter description in the pipeline object's visual_predict
method. No useDocUnwarping
boolean
| null
See the use_doc_unwarping
parameter description in the pipeline object's visual_predict
method. No useTextlineOrientation
boolean
| null
See the use_textline_orientation
parameter description in the pipeline object's visual_predict
method. No useSealRecognition
boolean
| null
See the use_seal_recognition
parameter description in the pipeline object's visual_predict
method. No useTableRecognition
boolean
| null
See the use_table_recognition
parameter description in the pipeline object's visual_predict
method. No useFormulaRecognition
boolean
| null
See the use_formula_recognition
parameter description in the pipeline object's visual_predict
method. No useChartRecognition
boolean
| null
| Name | Type | Description | Required |
|------|------|-------------|----------|
| useChartRecognition | boolean \| null | See the use_chart_recognition parameter description in the pipeline object's visual_predict method. | No |
| useRegionDetection | boolean \| null | See the use_region_detection parameter description in the pipeline object's visual_predict method. | No |
| layoutThreshold | number \| object \| null | See the layout_threshold parameter description in the pipeline object's visual_predict method. | No |
| layoutNms | boolean \| null | See the layout_nms parameter description in the pipeline object's visual_predict method. | No |
| layoutUnclipRatio | number \| array \| object \| null | See the layout_unclip_ratio parameter description in the pipeline object's visual_predict method. | No |
| layoutMergeBboxesMode | string \| object \| null | See the layout_merge_bboxes_mode parameter description in the pipeline object's visual_predict method. | No |
| textDetLimitSideLen | integer \| null | See the text_det_limit_side_len parameter description in the pipeline object's visual_predict method. | No |
| textDetLimitType | string \| null | See the text_det_limit_type parameter description in the pipeline object's visual_predict method. | No |
| textDetThresh | number \| null | See the text_det_thresh parameter description in the pipeline object's visual_predict method. | No |
| textDetBoxThresh | number \| null | See the text_det_box_thresh parameter description in the pipeline object's visual_predict method. | No |
| textDetUnclipRatio | number \| null | See the text_det_unclip_ratio parameter description in the pipeline object's visual_predict method. | No |
| textRecScoreThresh | number \| null | See the text_rec_score_thresh parameter description in the pipeline object's visual_predict method. | No |
| sealDetLimitSideLen | integer \| null | See the seal_det_limit_side_len parameter description in the pipeline object's visual_predict method. | No |
| sealDetLimitType | string \| null | See the seal_det_limit_type parameter description in the pipeline object's visual_predict method. | No |
| sealDetThresh | number \| null | See the seal_det_thresh parameter description in the pipeline object's visual_predict method. | No |
| sealDetBoxThresh | number \| null | See the seal_det_box_thresh parameter description in the pipeline object's visual_predict method. | No |
| sealDetUnclipRatio | number \| null | See the seal_det_unclip_ratio parameter description in the pipeline object's visual_predict method. | No |
| sealRecScoreThresh | number \| null | See the seal_rec_score_thresh parameter description in the pipeline object's visual_predict method. | No |
| useWiredTableCellsTransToHtml | boolean | See the use_wired_table_cells_trans_to_html parameter description in the pipeline object's visual_predict method. | No |
| useWirelessTableCellsTransToHtml | boolean | See the use_wireless_table_cells_trans_to_html parameter description in the pipeline object's visual_predict method. | No |
| useTableOrientationClassify | boolean | See the use_table_orientation_classify parameter description in the pipeline object's visual_predict method. | No |
| useOcrResultsWithTableCells | boolean | See the use_ocr_results_with_table_cells parameter description in the pipeline object's visual_predict method. | No |
| useE2eWiredTableRecModel | boolean | See the use_e2e_wired_table_rec_model parameter description in the pipeline object's visual_predict method. | No |
| useE2eWirelessTableRecModel | boolean | See the use_e2e_wireless_table_rec_model parameter description in the pipeline object's visual_predict method. | No |
| visualize | boolean \| null | Whether to return visualization result images and intermediate images produced during processing. Pass true to return images, false to not return them, or null to follow the Serving.visualize setting in the pipeline config file. For example, with Serving.visualize set to False in the config file, images are not returned by default; the visualize field in the request body can override this behavior. If neither the request body nor the config file sets it (or the request body passes null and the config file leaves it unset), images are returned by default. | No |
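As a concrete illustration, the snippet below sketches a request to this operation that overrides a few of the optional fields above. It reuses the /doctrans-visual endpoint and the fileType convention from the client example later in this section; the specific values chosen here are assumptions for demonstration only.

```python
import base64
import requests

# Minimal sketch: a doctrans-visual request that overrides a few optional fields.
# The parameter values below are illustrative assumptions, not recommended settings.
with open("./demo.jpg", "rb") as f:
    file_data = base64.b64encode(f.read()).decode("ascii")

payload = {
    "file": file_data,
    "fileType": 1,                  # 1 = image input, as used in the client example below
    "useRegionDetection": True,     # camelCase request fields map to the snake_case pipeline parameters
    "textDetLimitSideLen": 960,     # assumed value for illustration
    "visualize": False,             # do not return visualization images
}
resp = requests.post("http://127.0.0.1:8080/doctrans-visual", json=payload)
resp.raise_for_status()
result = resp.json()["result"]
```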
`result` has the following properties:

| Name | Type | Description |
|------|------|-------------|
| layoutParsingResults | array | Layout parsing results. The array length is 1 (for image input) or equals the actual number of processed pages (for PDF input). For PDF input, each element corresponds to the result of one processed page, in order. |
| dataInfo | object | Input data information. |

Each element in `layoutParsingResults` is an `object` with the following properties:

| Name | Type | Description |
|------|------|-------------|
| prunedResult | object | A simplified version of the res field in the JSON representation of the layout_parsing_result generated by the pipeline object's visual_predict method, with the input_path and page_index fields removed. |
| markdown | object | Markdown result. |
| outputImages | object \| null | See the img property description in the pipeline prediction results. Images are in JPEG format and Base64 encoded. |
| inputImage | string \| null | Input image. JPEG format, Base64 encoded. |

`markdown` is an `object` with the following properties:

| Name | Type | Description |
|------|------|-------------|
| text | string | Markdown text. |
| images | object | Key-value pairs of Markdown image relative paths and Base64-encoded images. |
| isStart | boolean | Whether the first element on the current page is the start of a paragraph. |
| isEnd | boolean | Whether the last element on the current page is the end of a paragraph. |
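For multi-page PDF input, the isStart and isEnd flags indicate whether a paragraph continues across a page boundary. The sketch below shows one way the per-page Markdown could be stitched back together; the concatenation logic itself is an illustration and is not part of the pipeline API.

```python
def concat_markdown_pages(pages: list[dict]) -> str:
    """Join per-page Markdown text, inserting a paragraph break only when the
    previous page ends a paragraph or the next page starts one (illustrative)."""
    parts: list[str] = []
    for i, page in enumerate(pages):
        if i > 0:
            prev = pages[i - 1]
            if prev.get("isEnd", True) or page.get("isStart", True):
                parts.append("\n\n")  # paragraph boundary between pages
            # otherwise the paragraph continues across the page break, so join directly
        parts.append(page["text"])
    return "".join(parts)

# Example usage with the layoutParsingResults of the response:
# full_md = concat_markdown_pages([r["markdown"] for r in result["layoutParsingResults"]])
```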
translate

Use a large model to translate documents.

POST /doctrans-translate

| Name | Type | Description | Required |
|------|------|-------------|----------|
| markdownList | array | List of Markdown to be translated. It can be obtained from the results of the analyzeImages operation. The images attribute is not used. | Yes |
| targetLanguage | string | Please refer to the target_language parameter description in the translate method of the pipeline object. | No |
| chunkSize | integer | Please refer to the chunk_size parameter description in the translate method of the pipeline object. | No |
| taskDescription | string \| null | Please refer to the task_description parameter description in the translate method of the pipeline object. | No |
| outputFormat | string \| null | Please refer to the output_format parameter description in the translate method of the pipeline object. | No |
| rulesStr | string \| null | Please refer to the rules_str parameter description in the translate method of the pipeline object. | No |
| fewShotDemoTextContent | string \| null | Please refer to the few_shot_demo_text_content parameter description in the translate method of the pipeline object. | No |
| fewShotDemoKeyValueList | string \| null | Please refer to the few_shot_demo_key_value_list parameter description in the translate method of the pipeline object. | No |
| glossary | object \| null | Please refer to the glossary parameter description in the translate method of the pipeline object. | No |
| llmRequestInterval | number \| null | Please refer to the llm_request_interval parameter description in the translate method of the pipeline object. | No |
| chatBotConfig | object \| null | Please refer to the chat_bot_config parameter description in the translate method of the pipeline object. | No |

`result` in the response body has the following attributes:

| Name | Type | Description |
|------|------|-------------|
| translationResults | array | Translation results. |

Each element in `translationResults` is an `object` with the following attributes:

| Name | Type | Description |
|------|------|-------------|
| language | string | Target language. |
| markdown | object | Markdown result. The object definition is consistent with the markdown returned by the analyzeImages operation. |
Below is an example of calling the pipeline service with Python:

```python
import base64
import pathlib
import pprint
import sys

import requests

API_BASE_URL = "http://127.0.0.1:8080"

file_path = "./demo.jpg"
target_language = "en"

# Encode the input file as Base64 and submit it for visual analysis.
with open(file_path, "rb") as file:
    file_bytes = file.read()
    file_data = base64.b64encode(file_bytes).decode("ascii")

payload = {
    "file": file_data,
    "fileType": 1,
}
resp_visual = requests.post(url=f"{API_BASE_URL}/doctrans-visual", json=payload)
if resp_visual.status_code != 200:
    print(
        f"Request to doctrans-visual failed with status code {resp_visual.status_code}."
    )
    pprint.pp(resp_visual.json())
    sys.exit(1)
result_visual = resp_visual.json()["result"]

# Save the per-page Markdown and images, and collect the Markdown objects to translate.
markdown_list = []
for i, res in enumerate(result_visual["layoutParsingResults"]):
    md_dir = pathlib.Path(f"markdown_{i}")
    md_dir.mkdir(exist_ok=True)
    (md_dir / "doc.md").write_text(res["markdown"]["text"])
    for img_path, img in res["markdown"]["images"].items():
        img_path = md_dir / img_path
        img_path.parent.mkdir(parents=True, exist_ok=True)
        img_path.write_bytes(base64.b64decode(img))
    print(f"The Markdown document to be translated is saved at {md_dir / 'doc.md'}")
    del res["markdown"]["images"]
    markdown_list.append(res["markdown"])
    for img_name, img in res["outputImages"].items():
        img_path = f"{img_name}_{i}.jpg"
        with open(img_path, "wb") as f:
            f.write(base64.b64decode(img))
        print(f"Output image saved at {img_path}")

# Request translation of the collected Markdown.
payload = {
    "markdownList": markdown_list,
    "targetLanguage": target_language,
}
resp_translate = requests.post(url=f"{API_BASE_URL}/doctrans-translate", json=payload)
if resp_translate.status_code != 200:
    print(
        f"Request to doctrans-translate failed with status code {resp_translate.status_code}."
    )
    pprint.pp(resp_translate.json())
    sys.exit(1)
result_translate = resp_translate.json()["result"]

# Save the translated Markdown alongside the originals.
for i, res in enumerate(result_translate["translationResults"]):
    md_dir = pathlib.Path(f"markdown_{i}")
    (md_dir / "doc_translated.md").write_text(res["markdown"]["text"])
    print(f"Translated markdown document saved at {md_dir / 'doc_translated.md'}")
```
4. Secondary Development

If the default model weights provided by the PP-DocTranslation pipeline do not meet your accuracy or speed requirements, you can try to further fine-tune the existing models with your own data from specific domains or application scenarios to improve recognition performance in your scenario.
4.1 Model Fine-tuning

Since the PP-DocTranslation pipeline contains several modules, unsatisfactory pipeline performance may originate from any one of them. You can analyze cases with poor extraction results, use the visualized images to determine which module is at fault, and then refer to the corresponding fine-tuning tutorial link in the table below to fine-tune that model.
| Scenario | Module to fine-tune | Fine-tuning reference |
|----------|---------------------|-----------------------|
| Inaccurate detection of layout areas, e.g. seals or tables are not detected | Layout detection module | Link |
| Inaccurate recognition of table structures | Table structure recognition module | Link |
| Inaccurate recognition of formulas | Formula recognition module | Link |
| Seal texts are missed | Seal text detection module | Link |
| Texts are missed | Text detection module | Link |
| Inaccurate text content | Text recognition module | Link |
| Inaccurate correction of vertical or rotated text lines | Text line orientation classification module | Link |
| Inaccurate correction of whole-image rotation | Document image orientation classification module | Link |
| Inaccurate correction of image distortion | Text image unwarping module | Fine-tuning is currently not supported |

4.2 Model Application

After completing fine-tuning training with your private dataset, you obtain a local model weight file. You can then use the fine-tuned model weights by customizing the pipeline configuration file.
You can call the export_paddlex_config_to_yaml method of the PP-DocTranslation pipeline object in PaddleOCR to export the current pipeline configuration to a YAML file:

```python
from paddleocr import PPDocTranslation

pipeline = PPDocTranslation()
pipeline.export_paddlex_config_to_yaml("PP-DocTranslation.yaml")
```
After obtaining the default pipeline configuration file, replace the corresponding model_dir entries in the configuration with the local paths of your fine-tuned model weights. For example:

```yaml
......
SubModules:
  TextDetection:
    module_name: text_detection
    model_name: PP-OCRv5_server_det
    model_dir: null # Replace with the path to the weights of the fine-tuned text detection model
    limit_side_len: 960
    limit_type: max
    thresh: 0.3
    box_thresh: 0.6
    unclip_ratio: 1.5
  TextRecognition:
    module_name: text_recognition
    model_name: PP-OCRv5_server_rec
    model_dir: null # Replace with the path to the weights of the fine-tuned text recognition model
    batch_size: 1
    score_thresh: 0
......
```
The pipeline configuration file not only includes parameters supported by PaddleOCR CLI and Python API but also allows for more advanced configurations. Detailed information can be found in the corresponding pipeline usage tutorial in the Overview of PaddleX Model Pipeline Usage. Refer to the detailed instructions therein and adjust the configurations according to your needs.
After modifying the configuration file, specify the path to the modified pipeline configuration file with the --paddlex_config parameter on the command line, and PaddleOCR will read its contents as the pipeline configuration. For example:

```bash
paddleocr pp_doctranslation --paddlex_config PP-DocTranslation.yaml ...
```
When initializing the pipeline object, you can pass the path of a PaddleX pipeline configuration file or a configuration dict through the paddlex_config parameter, and PaddleOCR will read its content as the pipeline configuration. For example:

```python
from paddleocr import PPDocTranslation

pipeline = PPDocTranslation(paddlex_config="PP-DocTranslation.yaml")
```