Showing content from https://patents.google.com/patent/CN113962842B/en below:

CN113962842B - Dynamic non-polar despinning system and method based on high-level synthesis of large-scale integrated circuit

Publication number: CN113962842B
Authority: CN; China
Prior art keywords: video; module; data; despinning; axi
Prior art date: 2021-10-20
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Active

Application number

CN202111223132.7A

Other languages

Chinese (zh)

Other versions

CN113962842A (en)

Inventor

å¼ å¼

å®åæ³¢

æ¨ä¸å¸

é¢ä¸é

è¢ä¸

ææäº®

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Beihang University

Original Assignee

Beihang University

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2021-10-20

Filing date

2021-10-20

Publication date

2022-12-09

2021-10-20 Application filed by Beihang University

2021-10-20 Priority to CN202111223132.7A

2022-01-21 Publication of CN113962842A

2022-12-09 Application granted

2022-12-09 Publication of CN113962842B

Status: Active

2041-10-20 Anticipated expiration

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/12—Indexing scheme for image data processing or generation, in general involving antialiasing
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware

Landscapes

Engineering & Computer Science (AREA)
Theoretical Computer Science (AREA)
Physics & Mathematics (AREA)
General Physics & Mathematics (AREA)
Computing Systems (AREA)
Multimedia (AREA)
Signal Processing (AREA)
Image Processing (AREA)

Abstract

The invention relates to a dynamic stepless derotation (despinning) system and method based on high-level synthesis of large-scale integrated circuits. The system comprises a video acquisition module, a video decoding module, a video storage module, a data communication module, a video encoding module, a dynamic stepless derotation module, and a pixel merging module (the four-in-one module) newly designed to reduce algorithm latency and improve bus bandwidth utilization. The invention uses high-level synthesis to implement dynamic stepless derotation, can derotate video images acquired on an electro-optical platform in real time, and makes full use of the parallel acceleration and pipeline optimization of an FPGA (field-programmable gate array). It offers high video resolution, a wide derotation range, high derotation precision, clear processed images free of jagged edges, low output latency, strong system stability, ease of manufacture, low power consumption and small size.

Description (translated from Chinese)

A dynamic stepless derotation system and method based on high-level synthesis of large-scale integrated circuits

Technical field

The invention relates to the field of intelligent embedded video processing, and in particular to a dynamic stepless derotation system and method based on high-level synthesis of large-scale integrated circuits.

Background

During video recording and aiming with an airborne pod, the outer frame structure of the TV camera inevitably undergoes rolling motion, which moves the optical system relative to the carrier aircraft and so rotates the image. Likewise, during flight the fuselage often rolls through large angles (up to 360°), which rotates the TV picture through large angles and seriously impairs the operator's view. Therefore, in many optical sighting devices and electro-optical pod systems, the original video image acquired by the TV system must be counter-rotated, that is, derotated, to eliminate the image rotation caused by changes in aircraft attitude. This keeps the image upright and stable, which helps the operator observe the scene and supports later target detection, recognition and tracking. Three derotation methods are common in practical engineering: electronic derotation, optical derotation and physical derotation. Optical derotation is currently the most widely used; it corrects the image by rotating a derotation prism in the imaging optical path. Although this method has low latency and fast response, its manufacturing process is complex, its derotation angle precision is low, and the system is large and power-hungry. With the rapid development of large-scale integrated circuits and digital signal processing, electronic derotation implemented by real-time video image processing algorithms has become the mainstream research direction. It overcomes the above shortcomings of optical derotation systems and is increasingly widely used.

With the continuing development of computer vision and the steady improvement in the performance of processing chips, electronic derotation based on video image processing has become the mainstream research direction among derotation technologies, and using it to eliminate the image rotation caused by changes in aircraft attitude has become the first choice in current engineering applications.

Summary of the invention

The technical problem solved by the invention is to overcome the deficiencies of the prior art by providing a dynamic stepless derotation system and method based on high-level synthesis of large-scale integrated circuits. Built on high-level synthesis and on the parallel acceleration and pipeline optimization available in an FPGA, it achieves stepless derotation with high precision, a wide range, strong real-time performance and high output image quality. The precision reaches 0.001°, so even very small angles are derotated; the derotation range is 0-360°, so any angle can be derotated; one frame is processed in less than 12 ms, so derotation runs in real time; and bilinear interpolation is used, so the output image is smooth, free of jagged edges, and of high quality. Because real-time performance conflicts with precision, range and image quality in current technology, the prior art achieves only one or a few of these targets at a time and never all of them together, so the invention has high engineering application value.

Technical solution of the invention: a dynamic stepless derotation system designed with high-level synthesis methods for large-scale integrated circuits. As the core of the invention, it has the following innovations:

1) It uses high-level synthesis, that is, high-level languages such as C++, for FPGA algorithm design optimization and resource scheduling;

2) It accelerates the algorithm by pipelining, which raises data throughput, greatly reduces latency, and improves the real-time performance of image derotation;

3) It performs real-time parallel optimization over multiple high-bandwidth AXI buses, which improves data read/write efficiency and the real-time performance of the algorithm;

4) It includes a four-in-one module for four-pixel merging, which combines the four 8-bit pixels used for bilinear interpolation into one 32-bit word, so that all four pixels can later be read in a single access, greatly reducing the latency caused by repeated reads.

The system comprises a video acquisition module, a video decoding module, a core processing module and a video encoding module. The core processing module is a heterogeneous system-on-chip with an FPGA+ARM architecture, namely the Zynq UltraScale+ MPSoC 15EG chip. The FPGA side contains the dynamic stepless derotation module, the video-to-AXI-bus video stream module, the AXI video stream DDR read/write module, and the pixel merging module (the four-in-one module), newly designed in the invention to reduce algorithm latency and improve bus bandwidth utilization. The ARM side contains the video storage module (DDR) and the RS422 serial communication module. Data communication between the FPGA and the ARM is carried over the AXI control bus;

The video acquisition module uses a camera to capture the original video image, which is the data to be derotated; once captured, the original video image enters the video decoding module;

The video decoding module converts the serial video captured by the camera into parallel video data together with a set of explicit video synchronization signals, and sends the decoded parallel video data and synchronization signals to the FPGA;

In the FPGA, the video-to-AXI-bus video stream module first converts the video data into AXI bus video stream data, which has lower latency and is better suited to data synchronization and pipeline acceleration. The AXI video stream then flows into the four-in-one module. Because the subsequent derotation uses bilinear interpolation, processing each pixel requires reading its four neighboring pixels from DDR. A single pixel read already adds considerable latency, and repeated reads add more. The four-in-one module therefore buffers the stream in on-chip cache each time two rows have arrived, and merges the four 8-bit pixels around each pixel into one 32-bit word. When the four pixels adjacent to a given pixel are needed later, a single read of the merged 32-bit word, split into four independent 8-bit values, delivers all four at once. This makes full use of the AXI bus bandwidth and reduces the read latency to a quarter of the original. The merged 32-bit video stream is then buffered into the DDR on the ARM side by the AXI video stream DDR read/write module;
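The merging step can be illustrated with a minimal software sketch. It is only a model of the idea, not the patent's hardware design: the function names `pack4`, `unpack4` and `merge_rows`, the byte order inside the 32-bit word, and the replication of the last column at the right edge are all assumptions made for illustration.

```python
def pack4(p00, p10, p01, p11):
    """Pack four 8-bit pixels into one 32-bit word (byte order is an assumption)."""
    for p in (p00, p10, p01, p11):
        if not 0 <= p <= 0xFF:
            raise ValueError("pixel out of 8-bit range")
    return p00 | (p10 << 8) | (p01 << 16) | (p11 << 24)


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit pixels."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))


def merge_rows(upper, lower):
    """Merge two buffered rows into packed words, one word per column.

    Word i holds the 2x2 neighborhood whose top-left corner is column i.
    The last column is replicated at the right edge (an assumed border rule).
    """
    width = len(upper)
    words = []
    for i in range(width):
        j = min(i + 1, width - 1)
        words.append(pack4(upper[i], upper[j], lower[i], lower[j]))
    return words


words = merge_rows([10, 20, 30], [40, 50, 60])
neighborhoods = [unpack4(w) for w in words]
```

With this layout, one 32-bit read followed by `unpack4` yields the four interpolation inputs that would otherwise take four separate 8-bit reads.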

The dynamic stepless derotation module derotates the video data buffered in DDR according to the derotation command and derotation angle sent by the host computer through the RS422 serial communication module. During derotation it works with the four-in-one module described above: each 32-bit word read from DDR is split into four 8-bit values for bilinear interpolation, and the processed video image is again stored in DDR. The AXI video stream DDR read/write module then reads the buffered, derotated video image from DDR into an AXI video stream; the AXI-bus-video-stream-to-video module converts that stream into parallel video data with explicit synchronization signals; and the parallel video data is sent to the video encoding module, which encodes it and outputs it to a monitor or capture card for real-time display.

The electronic image derotation algorithm based on bilinear interpolation is as follows:

(1) From the derotation angle sent by the host computer, compute, for each pixel (x′, y′) of the derotated image, the coordinates (x, y) of the corresponding pixel in the image before derotation. The formula is as follows:

where θ is the rotation angle and the matrix in the formula is the rotation matrix.

Rotation is generally performed about the image center (x₀, y₀), so the formula above should be rewritten as:

Written in scalar form, the formula above becomes:
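For reference, the three formulas of step (1) can be written in one consistent way as follows. This is a sketch under an assumed sign convention (θ positive counterclockwise, mapping from the derotated pixel (x′, y′) back to the source pixel (x, y)); the patent's own convention cannot be confirmed from this text, and the symbol R for the rotation matrix is introduced here for illustration.

```latex
% Rotation about the origin (assumed sign convention)
\begin{bmatrix} x \\ y \end{bmatrix}
  = R \begin{bmatrix} x' \\ y' \end{bmatrix},
\qquad
R = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}

% Rotation about the image center (x_0, y_0)
\begin{bmatrix} x \\ y \end{bmatrix}
  = R \begin{bmatrix} x' - x_0 \\ y' - y_0 \end{bmatrix}
  + \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}

% Scalar form
x = (x' - x_0)\cos\theta + (y' - y_0)\sin\theta + x_0
y = -(x' - x_0)\sin\theta + (y' - y_0)\cos\theta + y_0
```

Reversing the sign of θ gives the opposite rotation direction; the structure of the three equations is otherwise unchanged.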

(2) Map pixels using bilinear interpolation. The coordinates (x, y) in the original image computed in step (1) are usually not integers, so pixels cannot be mapped one to one. Resampling is generally used to handle the non-integer pixel coordinates that arise during mapping.

According to image reconstruction theory, three interpolation methods are commonly used for image mapping: nearest-neighbor interpolation, bilinear interpolation and cubic interpolation. Nearest-neighbor interpolation gives poor results: the derotated image shows obvious jagged edges and burrs in places. Bilinear and cubic interpolation give better results, with continuous gray levels and no jagged edges. However, cubic interpolation is complex and slow to compute, which makes real-time requirements hard to meet in practical engineering. Weighing derotation precision against real-time performance, the invention therefore uses an image derotation algorithm based on bilinear interpolation.

Figure 2 shows the principle of the electronic derotation algorithm based on bilinear interpolation. The method interpolates linearly in the x and y directions using the gray values of the four integer-coordinate points surrounding the non-integer sampling point. In Figure 2, (x, y) is the coordinate of the pixel obtained by bilinear interpolation, measured within the unit cell spanned by those four points; f(x, y) is the gray value at (x, y); and f(0,0), f(1,0), f(0,1), f(1,1) are the gray values of the four points around (x, y). The bilinear interpolation formula is then:

f(x,y) = [f(1,0) - f(0,0)]x + [f(0,1) - f(0,0)]y + [f(1,1) - f(1,0) - f(0,1) + f(0,0)]xy + f(0,0)
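A minimal sketch of the interpolation formula follows. The helper names `bilinear` and `sample`, the row-major list-of-rows image layout, and the clamping of coordinates at the image border are assumptions made for illustration; the image is assumed to be at least 2×2.

```python
def bilinear(f00, f10, f01, f11, x, y):
    """Interpolate inside a unit cell; x and y are fractional offsets in [0, 1]."""
    return ((f10 - f00) * x
            + (f01 - f00) * y
            + (f11 - f10 - f01 + f00) * x * y
            + f00)


def sample(image, x, y):
    """Sample a grayscale image (list of rows, at least 2x2) at real-valued (x, y)."""
    height, width = len(image), len(image[0])
    # Clamp to the valid coordinate range (assumed border handling).
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    # Top-left corner of the cell, kept one short of the last row and column.
    x0 = min(int(x), width - 2)
    y0 = min(int(y), height - 2)
    return bilinear(image[y0][x0], image[y0][x0 + 1],
                    image[y0 + 1][x0], image[y0 + 1][x0 + 1],
                    x - x0, y - y0)


cell = [[10, 20],
        [30, 40]]
center_value = sample(cell, 0.5, 0.5)
```

At the four corners the function returns the corner values themselves, and at the cell center it returns their average, which is a quick sanity check on the sign of each term.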

(3) Determine the image boundary after derotation. The image size generally changes after rotation, so the image boundary must be re-determined. The left, right, top and bottom boundary positions are calculated as follows:

left = max(x₁, x₂, x₃, x₄)

right = min(x₁, x₂, x₃, x₄)

top = max(y₁, y₂, y₃, y₄)

bottom = min(y₁, y₂, y₃, y₄)

(4) Fix the image resolution. In practical engineering applications the output image resolution is usually fixed, but derotating the original video image by different angles changes the image size, so the resolution cannot stay fixed. The invention therefore crops the derotated image about the image center to fix the output resolution, so that the output image always has the same size.
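Steps (1), (2) and (4) can be combined in a short software sketch. It is an illustration under stated assumptions, not the patent's FPGA implementation: the function name `derotate`, the sign convention of the rotation, the fill value for output pixels that map outside the source image, and rounding to integers are all assumptions. Keeping the output grid the same size as the input is used here as the equivalent of cropping about the image center.

```python
import math


def derotate(image, theta_deg, fill=0):
    """Rotate a grayscale image (list of rows, at least 2x2) about its center.

    The output has the same resolution as the input, which corresponds to
    cropping the rotated image about the center. Output pixels whose source
    coordinates fall outside the input are set to `fill`.
    """
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    out = [[fill] * width for _ in range(height)]
    for yo in range(height):
        for xo in range(width):
            # Inverse mapping: output pixel -> source coordinates (assumed sign).
            x = (xo - cx) * c + (yo - cy) * s + cx
            y = -(xo - cx) * s + (yo - cy) * c + cy
            if not (0 <= x <= width - 1 and 0 <= y <= height - 1):
                continue
            x0 = min(int(x), width - 2)
            y0 = min(int(y), height - 2)
            dx, dy = x - x0, y - y0
            f00, f10 = image[y0][x0], image[y0][x0 + 1]
            f01, f11 = image[y0 + 1][x0], image[y0 + 1][x0 + 1]
            value = ((f10 - f00) * dx + (f01 - f00) * dy
                     + (f11 - f10 - f01 + f00) * dx * dy + f00)
            out[yo][xo] = int(round(value))
    return out


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
unchanged = derotate(grid, 0.0)
turned = derotate(grid, 37.0)
```

A zero angle returns the input unchanged, and the center pixel keeps its value at any angle because it is the fixed point of the rotation.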

The focus of the invention is implementing dynamic stepless derotation with high-level synthesis of large-scale integrated circuits. This is what guarantees real-time performance at high resolution, and it is the most important innovation of the invention.

The advantages of the invention over the prior art are:

(1) The invention introduces a four-in-one module that takes full advantage of high-bandwidth data streaming. Each time two rows of the data stream have arrived, they are buffered in on-chip cache, the four 8-bit pixels around each pixel are merged into one 32-bit word, and the words are streamed into DDR. When a pixel is later derotated by bilinear interpolation, the 32-bit word is read and split into four 8-bit pixels, which are exactly the four pixels the interpolation needs. Four pixels are thus read in a single access, which reduces the algorithm latency to a quarter of the original. The processing latency equals that of nearest-neighbor derotation, while the result is much better.

(2) Pipeline acceleration of the algorithm. A major advantage of an FPGA over a general embedded system is that algorithms can be optimized as data pipelines, so the invention writes the algorithm in pipelined form. When developing the algorithm in the Vivado HLS tool, the pipeline directive (a pragma for pipeline optimization) is applied, and the program is written to follow the pipeline programming principle of single input, single use and single output: each datum may be input only once, used only once, and must be output exactly once. This prevents the data stream from stalling and allows the algorithm to be pipelined at the cost of hardware logic resources.

Specifically, pipelining allows operations to execute in parallel: a step does not have to wait for all operations to finish before the next one starts. Pipelining applies to functions and loops. Taking loop pipelining as an example, each loop iteration performs three operations on its variable: read, compute and write. Without pipelining, the three operations execute serially, so an input is read once every 3 clock cycles and its value is output 2 clock cycles later. With pipelining, a read is performed in every clock cycle and several data items are processed in parallel. Figure 3 shows the latency before and after pipelining. Without pipelining, 3 clock cycles separate two reads and the last write is executed after 8 clock cycles; with pipelining, 1 clock cycle separates two reads and the last write is executed after 4 clock cycles. Pipelining the algorithm thus raises data throughput, greatly reduces latency, and improves the real-time performance of image derotation.
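The cycle counts quoted above can be reproduced with a small timing model. It is a sketch under assumptions: the figures of 8 and 4 cycles are taken to describe three loop iterations of three single-cycle stages each (the text does not state the iteration count), and the function name `last_write_cycle` is hypothetical.

```python
def last_write_cycle(iterations, initiation_interval, stages=3):
    """Return the 0-based clock cycle in which the last iteration's write starts.

    Each iteration runs `stages` single-cycle operations (read, compute, write),
    and a new iteration starts every `initiation_interval` cycles.
    """
    if iterations < 1:
        raise ValueError("need at least one iteration")
    start_of_last = (iterations - 1) * initiation_interval
    return start_of_last + (stages - 1)


# Three iterations, assumed to match the example in the text.
serial = last_write_cycle(3, initiation_interval=3)     # no pipelining
pipelined = last_write_cycle(3, initiation_interval=1)  # one read per cycle
```

In this model the gap widens with the iteration count: the serial form grows by three cycles per extra iteration and the pipelined form by one, so for a long loop the pipelined version approaches one third of the serial cycle count.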

(3) Real-time parallel optimization over multiple high-bandwidth AXI buses. The problem addressed by the invention is real-time derotation of high-resolution images, but the on-chip block RAM (BRAM) of the FPGA is limited and cannot buffer a whole high-resolution frame. The invention therefore attaches a 64-bit, 128 MB DDR chip to the ARM side for image buffering. Unlike buffering directly in BRAM, the DDR is attached to the ARM side, so the FPGA must read and write it through the AXI bus. Analysis and measurement of the latency show that, because the algorithm has already been pipelined as described in (2), the latency of the derotation algorithm itself is low, and the remaining latency comes mainly from reading and writing DDR over the AXI bus. The FPGA+ARM chip used, the Zynq UltraScale+ MPSoC 15EG, has abundant AXI bus resources (seven 128-bit AXI buses), so the invention reads, writes and processes several pixels simultaneously over multiple high-bandwidth AXI buses in parallel, which greatly reduces latency, increases data throughput and improves real-time performance. The final design uses two 128-bit buses and one 64-bit bus in parallel. For 1080p grayscale images, the overall latency of the bilinear-interpolation derotation algorithm over the full 360° range is 12 ms, so for both 30 fps and 60 fps video the derotation completes within one frame time, achieving real-time derotation of high-resolution images. Moreover, the invention derotates 1080p images while occupying only 36% of the bus resources, so using more buses could raise the resolution that can be derotated in real time.
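The figures in this paragraph can be checked with simple arithmetic. The sketch below assumes that "bus resources" means the share of total AXI data width in use, which is an interpretation rather than something the text states; the function names are hypothetical.

```python
def bus_utilization(used_widths, available_widths):
    """Fraction of the total available bus width that is in use."""
    return sum(used_widths) / sum(available_widths)


def frame_period_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


used = [128, 128, 64]    # two 128-bit buses and one 64-bit bus
available = [128] * 7    # seven 128-bit AXI buses on the chip
utilization = bus_utilization(used, available)  # 320 / 896

latency_ms = 12.0
fits_30fps = latency_ms < frame_period_ms(30)
fits_60fps = latency_ms < frame_period_ms(60)
```

Under that interpretation the utilization is 320/896, about 35.7%, which rounds to the stated 36%. A 12 ms latency also fits inside the frame period at both 30 fps (about 33.3 ms) and 60 fps (about 16.7 ms).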

(4) High-level synthesis is used for algorithm design optimization and resource scheduling. The Zynq UltraScale+ MPSoC 15EG processing chip used in the present invention is a heterogeneous embedded chip developed by Xilinx. It is developed with the Vivado development suite, which includes the high-level synthesis tool Vivado HLS. Within the HLS framework, algorithms are developed and optimized in a high-level language (C/C++/SystemC) according to specific coding rules, and the HLS tool then converts the high-level program into a hardware description language (Verilog HDL/VHDL) program. Developing with a high-level synthesis tool makes algorithm design optimization and dynamic scheduling of logic resources convenient, greatly improves development efficiency, takes full advantage of the multi-AXI-bus parallel computing and multi-pipeline acceleration of the FPGA+ARM architecture, and significantly improves the performance of the derotation algorithm. The present invention trades off logic resource usage, delay, and throughput. Because the chip used has relatively abundant hardware logic resources, the design sacrifices logic resource usage to achieve lower algorithm delay and higher data throughput. The invention improves the performance of the derotation algorithm in two respects: data type optimization and data throughput optimization. For data type optimization, the present invention uses 20-bit data in many places, but the bit widths of standard C data types are integer multiples of 8 bits; using 32-bit integers directly would waste logic resources and fail to exploit the high performance and strong parallelism of the FPGA. The present invention therefore defines 20-bit data using the arbitrary-bit-width data types provided by the HLS tool, which greatly reduces logic resource usage. For data throughput optimization, following the idea of "trading area for speed", the present invention applies pipelining and loop unrolling to loops, raising algorithm throughput at the cost of additional logic resources.
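The two optimizations described above can be illustrated in plain C++. This is a minimal sketch under stated assumptions, not the patent's code: `U20` is a hypothetical stand-in that masks every result to 20 bits, roughly what an arbitrary-precision HLS type such as `ap_uint<20>` would give in hardware, and the cycle-count functions use the usual textbook model of a pipelined loop (one iteration latency plus one initiation interval per further iteration). The latency and interval values used with them are illustrative, not taken from the patent.

```cpp
#include <cstdint>

// Mask that keeps the low 20 bits of a value.
constexpr std::uint32_t kMask20 = (1u << 20) - 1u;

// Hypothetical 20-bit unsigned value: every result is masked to 20 bits,
// so arithmetic wraps modulo 2^20 as a 20-bit register would.
struct U20 {
    std::uint32_t v;
    constexpr explicit U20(std::uint32_t x = 0) : v(x & kMask20) {}
};

constexpr U20 operator+(U20 a, U20 b) { return U20(a.v + b.v); }

constexpr U20 operator*(U20 a, U20 b) {
    // Multiply in 64 bits first so the product cannot overflow before masking.
    return U20(static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(a.v) * b.v) & kMask20));
}

// Cycles for n loop iterations executed strictly one after another.
constexpr std::uint64_t sequential_cycles(std::uint64_t n, std::uint64_t latency) {
    return n * latency;
}

// Cycles for n iterations of a pipelined loop: the first result appears after
// `latency` cycles, and a new iteration starts every `ii` cycles.
constexpr std::uint64_t pipelined_cycles(std::uint64_t n, std::uint64_t latency,
                                         std::uint64_t ii) {
    return n == 0 ? 0 : latency + ii * (n - 1);
}
```

For example, with an assumed iteration latency of 8 cycles, one 1920-pixel line takes 15360 cycles sequentially but 1927 cycles when pipelined with an initiation interval of 1, which is the kind of gain the "area for speed" trade buys.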

(5) In actual tests, real-time derotation is achieved for 1920×1080 visible-light images: the derotation range is 0-360°, the delay is less than 12 ms, the derotation angle precision reaches 0.001°, and the maximum pixel error is less than 1 pixel. The system as a whole offers high video resolution, a large derotation range, high derotation precision, clear processed images free of jagged edges, low output delay, strong stability, ease of manufacture, low power consumption, and small size.
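To give a feel for what an angular precision of 0.001° means in pixels, the following sketch computes how far a point moves when the image is rotated by a small angle about its center. The helper names are illustrative and the calculation is plain geometry (chord length of a circular arc); it is not a statement about how the patent measured its error figures.

```cpp
#include <cmath>

// Distance in pixels from the image center to a corner.
inline double corner_radius_px(int width, int height) {
    return std::hypot(width / 2.0, height / 2.0);
}

// Straight-line distance (chord length) that a point at radius_px from the
// rotation center moves when the image is rotated by angle_deg degrees.
inline double shift_px(double radius_px, double angle_deg) {
    const double kPi = 3.14159265358979323846;
    const double theta = angle_deg * kPi / 180.0;
    return 2.0 * radius_px * std::sin(theta / 2.0);
}
```

For a 1920×1080 frame the corner lies about 1101 pixels from the center, so a 0.001° step moves it by roughly 0.02 pixel, well below the one-pixel error bound stated above.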

Description of the drawings

Figure 1 is a block diagram of the principle of the dynamic stepless derotation system based on high-level synthesis of large-scale integrated circuits;

Figure 2 is a schematic diagram of the principle of the image derotation algorithm based on bilinear interpolation;

Figure 3 is a diagram of the effect of pipeline optimization on delay;

Figure 4 is a flow chart of the dynamic stepless derotation processing module;

Figure 5 demonstrates the effect of the dynamic stepless derotation system: (a) before derotation processing, (b) after derotation processing.

Detailed description

The specific embodiments of the present invention are further described below with reference to the accompanying drawings.

As shown in Figure 1, the derotation system of the present invention comprises a video acquisition module, a video decoding module, a core processing module, and a video encoding module. The core processing module is a heterogeneous system-on-chip with an FPGA+ARM architecture. The FPGA side contains the dynamic stepless derotation module, the video-to-AXI-bus video stream module, the AXI video stream DDR read-write module, and the pixel merging module (the four-in-one module) newly designed in the present invention to reduce algorithm delay and improve bus bandwidth utilization. The ARM side contains the video storage module (DDR) and the RS422 serial communication module. Data communication between the FPGA and the ARM uses the AXI control bus.

The video acquisition module is an industrial camera with a resolution of 1920×1080 and a frame rate of 30 Hz or 60 Hz; the video output format is not restricted. The video decoding module is a video decoding chip that converts the input serial video signal into parallel video data together with the data-enable signal DE, the line synchronization signal HSYNC, and the field synchronization signal VSYNC; the data, enable, and synchronization signals are passed to the FPGA for subsequent processing. The video storage module combines four 16-bit 128MB DDR4 devices into one 64-bit 128MB DDR. Derotation requires a whole-frame image buffer, and the on-chip cache inside the FPGA is too small to hold a whole frame, so external memory is required; the present invention attaches the DDR to the ARM side of the Zynq chip, which is more convenient for subsequent operations. Data communication consists of two parts. The first is communication between the electronic derotation system and the host computer, which is based on RS422; this stable low-speed transmission protocol is sufficient for transmitting the derotation angle. The second is communication between the FPGA side and the ARM side inside the Zynq chip, which uses the AXI bus protocol provided by Xilinx to transfer both instruction information and image information. The video encoding module is a video encoding chip that converts the parallel video data and the DE, HSYNC, and VSYNC signals into a serial video signal, which is output to a monitor or capture card for real-time display. The core processing module is the Zynq UltraScale+ MPSoC 15EG, a heterogeneous system-on-chip with an ARM+FPGA architecture. Zynq-architecture chips combine the parallel acceleration of the FPGA side with the control and scheduling functions of the ARM side and are among the mainstream heterogeneous systems-on-chip today. The core of the invention is the four-in-one module and the dynamic stepless derotation module: the derotation algorithm is deployed on the FPGA side, while memory scheduling and communication with the host computer are handled on the ARM side.
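The need for an external whole-frame buffer can be made concrete by sizing one frame. The sketch below is illustrative arithmetic only; it assumes 8-bit grayscale input, assumes that the merged data is stored as one 32-bit word per pixel position, and reads "128MB" as 128 MiB. None of these storage details are spelled out in the text, and the names are hypothetical.

```cpp
#include <cstdint>

// Number of bytes in one mebibyte.
constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Size in bytes of one frame with the given geometry and bytes per pixel.
constexpr std::uint64_t frame_bytes(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Number of whole frame buffers of the given size that fit in the DDR.
constexpr std::uint64_t frames_that_fit(std::uint64_t ddr_bytes,
                                        std::uint64_t one_frame_bytes) {
    return ddr_bytes / one_frame_bytes;
}
```

Under these assumptions a raw 1920×1080 frame occupies 2,073,600 bytes and a merged frame 8,294,400 bytes, so a 128 MiB DDR holds 16 merged frames, which leaves room for separate input and output frame buffers.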

The present invention comprises the following steps:

Step 1: Video capture and decoding

An industrial camera captures the video images, and a decoding chip decodes the video to obtain parallel video data together with the data-enable signal DE, the line synchronization signal HSYNC, and the field synchronization signal VSYNC. Because the design is based on FPGA AXI data streams, the decoded signals are fed into the video-to-AXI-bus video stream module, which converts the parallel video data into AXI bus video stream data so that pipeline acceleration can be implemented efficiently in later stages.
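As a rough software model of Step 1, the sketch below turns a frame into a sequence of stream beats. It assumes the common AXI4-Stream video convention in which a "user" flag marks the first pixel of a frame and a "last" flag marks the final pixel of each line; the text does not describe the stream signalling, so this convention, the `AxisBeat` type, and the function name are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One beat of a simplified AXI4-Stream video stream (illustrative model).
struct AxisBeat {
    std::uint8_t data;  // one 8-bit grayscale pixel
    bool user;          // start of frame: set on the first pixel only
    bool last;          // end of line: set on the last pixel of each line
};

// Converts a row-major 8-bit frame into stream beats. Blanking intervals,
// which DE/HSYNC/VSYNC delimit in parallel video, simply do not appear.
inline std::vector<AxisBeat> frame_to_axis(const std::vector<std::uint8_t>& frame,
                                           int width, int height) {
    std::vector<AxisBeat> out;
    out.reserve(frame.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            AxisBeat beat;
            beat.data = frame[static_cast<std::size_t>(y) * width + x];
            beat.user = (x == 0 && y == 0);
            beat.last = (x == width - 1);
            out.push_back(beat);
        }
    }
    return out;
}
```

The point of the conversion is that downstream modules see only valid pixels plus two framing flags, which is easier to pipeline than raw synchronization signals.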

Step 2: Merging adjacent pixels

The present invention introduces a four-in-one module. Each time two lines of the data stream have arrived, they are held in the on-chip cache, and the four adjacent 8-bit pixels around each pixel are merged into one 32-bit datum. When bilinear-interpolation derotation is later performed for a given pixel, this 32-bit datum is read and split into four 8-bit pixels, which are exactly the four pixels the interpolation needs. Four pixels are thus read in a single access, which reduces the algorithm delay to one quarter of the original.
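The merging step can be sketched as follows. This is a minimal illustration, assuming a particular byte order inside the 32-bit word (top-left pixel in the lowest byte) and clamping at the right edge of the line; the patent does not specify either detail, and all function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs a 2x2 neighborhood into one 32-bit word.
// Assumed layout: byte 0 = top-left, byte 1 = top-right,
// byte 2 = bottom-left, byte 3 = bottom-right.
inline std::uint32_t pack4(std::uint8_t tl, std::uint8_t tr,
                           std::uint8_t bl, std::uint8_t br) {
    return static_cast<std::uint32_t>(tl) |
           (static_cast<std::uint32_t>(tr) << 8) |
           (static_cast<std::uint32_t>(bl) << 16) |
           (static_cast<std::uint32_t>(br) << 24);
}

// Splits a packed word back into its four 8-bit pixels (same order as pack4).
inline std::array<std::uint8_t, 4> unpack4(std::uint32_t w) {
    return {{static_cast<std::uint8_t>(w & 0xFFu),
             static_cast<std::uint8_t>((w >> 8) & 0xFFu),
             static_cast<std::uint8_t>((w >> 16) & 0xFFu),
             static_cast<std::uint8_t>((w >> 24) & 0xFFu)}};
}

// Merges two buffered lines into one line of packed words. The right edge is
// clamped (the last column is repeated). Assumes both lines have equal length.
inline std::vector<std::uint32_t> merge_two_lines(
        const std::vector<std::uint8_t>& upper,
        const std::vector<std::uint8_t>& lower) {
    std::vector<std::uint32_t> out(upper.size());
    for (std::size_t x = 0; x < upper.size(); ++x) {
        const std::size_t xr = (x + 1 < upper.size()) ? x + 1 : x;
        out[x] = pack4(upper[x], upper[xr], lower[x], lower[xr]);
    }
    return out;
}
```

With this layout, a later interpolation step needs one 32-bit read per output pixel instead of four 8-bit reads, which is where the quoted factor of four comes from.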

Step 3: Video data storage

The 32-bit video stream data merged in step 2 is cached in the DDR on the ARM side through the AXI video stream DDR read-write module.

Step 4: Real-time dynamic stepless derotation of video data

The flow chart of the dynamic stepless derotation processing module is shown in Figure 4. The present invention designs the dynamic stepless derotation algorithm with Vivado high-level synthesis and packages it as an IP core. The IP core defines two m_axi (AXI master) ports, one for reading from and one for writing to DDR4. The read m_axi port reads the original pixel data from a frame buffer in DDR4 over the AXI bus; after the derotation algorithm has performed dynamic stepless derotation, the write m_axi port outputs the result to another frame buffer in DDR, which completes the image derotation flow.

Step 5: Video encoding and output display

After the derotation processing of step 4, the derotated image is held in a buffer area of the DDR. The AXI video stream DDR read-write module is used again to read the cached derotated video image from the DDR into an AXI video stream, and the AXI-bus-video-stream-to-video module converts the AXI video stream into parallel video data with explicit synchronization signals. This data is sent to the video encoding chip, encoded, and output to a monitor or capture card so that the derotated result is displayed in real time.

With the above steps, the host computer can specify any derotation angle and the system outputs the derotated result in real time. For example, suppose the host computer issues a derotation angle of 0.625° clockwise; the images before and after processing are shown in Figure 5. Figure 5(a) is the original image before derotation. The image is tilted relative to the horizontal, that is, the optical axis is not accurately leveled and there is a counterclockwise rotation. The host computer measures this rotation angle as 0.625° and therefore issues a derotation angle of 0.625° to the derotation system. The image after derotation is shown in Figure 5(b): the image has been leveled, it is clear with no jagged edges, and the derotation angle precision reaches 0.001°. The processing time for this frame is less than 12 ms, demonstrating good real-time performance.

Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

The above embodiments are provided only to describe the present invention and are not intended to limit its scope. The scope of the invention is defined by the appended claims. All equivalent replacements and modifications made without departing from the spirit and principle of the present invention fall within the scope of the present invention.

Claims (6)

1. A dynamic stepless despinning system based on high-level synthesis of large-scale integrated circuits, characterized in that: the system comprises a video acquisition module, a video decoding module, a core processing module and a video encoding module; the core processing module is a heterogeneous system-on-chip with an FPGA+ARM architecture; the FPGA comprises a dynamic stepless despinning module, a video-to-AXI-bus video stream module, an AXI video stream DDR read-write module, and a pixel merging module, namely a four-in-one module, designed to reduce algorithm delay and improve bus bandwidth utilization; the ARM comprises a video storage module (DDR) and an RS422 serial communication module, and data communication between the FPGA and the ARM uses an AXI control bus;

the video acquisition module acquires an original video image with a camera, the video image being the data to be despun; the acquired original video image enters the video decoding module;

the video decoding module converts the serial video acquired by the camera into parallel video data and obtains a set of explicit video synchronization signals; the decoded parallel video data and synchronization signals are sent to the FPGA;

in the FPGA, the video-to-AXI-bus video stream module first converts the video data into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration; the data in AXI bus video stream format then flows into the four-in-one module; each time two lines of the data stream have flowed in, the four-in-one module caches them in the on-chip cache and merges the four adjacent 8-bit pixels around each pixel into one 32-bit datum; when the four pixels adjacent to a given pixel later need to be read, the merged 32-bit datum is read only once and split into four independent 8-bit data, so that four pixels are read at a time; this processing uses the AXI bus bandwidth to reduce the delay to one quarter of the original; the merged 32-bit video stream data is cached in the DDR of the ARM through the AXI video stream DDR read-write module;

the dynamic stepless despinning module performs dynamic stepless despinning on the video data cached in the DDR according to the despinning instruction and despinning angle sent by the host computer through the RS422 serial communication module; during despinning it works together with the four-in-one module, splitting each 32-bit datum read from the DDR into four 8-bit data for bilinear interpolation, and the processed video image is again stored in the DDR; the AXI video stream DDR read-write module then reads the cached despun video image from the DDR into the AXI video stream, the AXI-bus-video-stream-to-video module converts the AXI video stream into parallel video data with explicit synchronization signals, and the parallel video data is sent to the video encoding module for encoding and output to a display or capture card for real-time display.

2. The dynamic stepless despinning system based on high-level synthesis of large-scale integrated circuits according to claim 1, characterized in that: the four-in-one module and the dynamic stepless despinning module are developed with the high-level synthesis tool Vivado HLS and are pipelined using the precompiler directive pipeline, namely the pipeline optimization directive, so that, provided the program satisfies the requirement that each data item is input once, used once, and output once, data that originally required 8 clock cycles to process is processed in only 4 clock cycles.

3. The dynamic stepless despinning system based on high-level synthesis of large-scale integrated circuits according to claim 1, characterized in that: the system further improves the performance of the despinning algorithm through data type optimization, namely user-defined bit-width data types, and through data throughput optimization; multiple high-bandwidth AXI buses are used for real-time parallel optimization, so that multiple pixels are read, written, and processed simultaneously by parallel computing.

4. The dynamic stepless despinning system based on high-level synthesis of large-scale integrated circuits according to claim 1, characterized in that: the dynamic stepless despinning module performs real-time despinning with an electronic image despinning algorithm based on bilinear interpolation, which specifically comprises the following steps:

(1) According to the despinning angle sent by the host computer, for each pixel (x', y') of the despun video image, the coordinates (x, y) of the corresponding pixel in the video image before despinning are computed,

where θ denotes the despinning angle, and $x_0$ and $y_0$ denote the horizontal and vertical coordinates of the image center, respectively;

(2) Pixel mapping using bilinear interpolation

$f(x, y) = [f(1,0) - f(0,0)]\,x + [f(0,1) - f(0,0)]\,y + [f(1,1) + f(0,0) - f(1,0) - f(0,1)]\,xy + f(0,0)$

where x and y are the integer coordinates obtained by rounding the despun pixel coordinates obtained in step (1), f(0,0), f(1,0), f(0,1), and f(1,1) are the gray values of the four pixels surrounding (x, y), and f(x, y) is the gray value obtained by bilinear interpolation at (x, y);

(3) Determine the boundary of the despun image. The size of the rotated image generally differs from that before rotation, so the boundary of the video image must be determined again; the left, right, top, and bottom boundary positions of the video image are calculated as follows:

$\text{left} = \max(x_1, x_2, x_3, x_4)$

$\text{right} = \min(x_1, x_2, x_3, x_4)$

$\text{top} = \max(y_1, y_2, y_3, y_4)$

$\text{bottom} = \min(y_1, y_2, y_3, y_4)$

(4) Fix the image resolution: crop the despun video image about the center of the video image so that the output image resolution is fixed, that is, the output image keeps the same size.

5. The dynamic stepless despinning system based on high-level synthesis of large-scale integrated circuits according to claim 1, characterized in that: the heterogeneous system-on-chip with the FPGA+ARM architecture used as the core processing module is a Zynq UltraScale+ MPSoC 15EG chip.

6. A dynamic stepless despinning method based on high-level synthesis of large-scale integrated circuits, characterized by comprising the following steps:

(1) Convert the serial video captured by a camera into parallel video data and obtain a set of explicit video synchronization signals, and send the decoded parallel video data and synchronization signals to an FPGA;

(2) In the FPGA, convert the video data, through a video-to-AXI-bus video stream module, into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration;

(3) The data in AXI bus video stream format then flows into a four-in-one module; because the subsequent bilinear-interpolation despinning must read from the DDR the four pixels adjacent to each pixel it processes, the four-in-one module, each time two lines of the data stream have flowed in, caches them in an on-chip cache and merges the four adjacent 8-bit pixels around each pixel into one 32-bit datum; when the four pixels adjacent to a given pixel later need to be read, the merged 32-bit datum is read only once and split into four independent 8-bit data, so that four pixels are read at a time; this processing makes full use of the AXI bus bandwidth to reduce the delay to one quarter of the original;

(4) Cache the merged 32-bit video stream data in the DDR of an ARM through an AXI video stream DDR read-write module;

(5) The dynamic stepless despinning module then performs dynamic stepless despinning on the video data cached in the DDR according to the despinning instruction and despinning angle sent by the host computer through the RS422 serial communication module; during despinning it works together with the four-in-one module, splitting each 32-bit datum read from the DDR into four 8-bit data for bilinear interpolation, and the processed video image is again stored in the DDR;

(6) The AXI video stream DDR read-write module reads the cached despun video image from the DDR into the AXI video stream, the AXI-bus-video-stream-to-video module converts the AXI video stream into parallel video data with explicit synchronization signals, and the parallel video data is sent to the video encoding module for encoding and output to a display or capture card for real-time display;

in steps (3) and (5), the four-in-one module and the dynamic stepless despinning module are developed with the high-level synthesis tool Vivado HLS, and the precompiler directive pipeline is used to pipeline the algorithm; provided the program satisfies the condition that each data item is input once, used once, and output once, pipelining allows data that originally required 8 clock cycles to process to be processed in 4 clock cycles; in addition, the performance of the despinning algorithm is improved through data type optimization and data throughput optimization; at the same time, multiple high-bandwidth AXI buses are used for real-time parallel optimization, and multiple pixels are read, written, and processed simultaneously by parallel computing.
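The coordinate formula of step (1) in claim 4 is not reproduced in the text above. For orientation only, a rotation about the image center $(x_0, y_0)$ by the angle θ is commonly written as shown below, followed by the standard weighted form of bilinear interpolation on the unit square. The sign convention (which direction counts as positive θ, and whether the image y axis points down) and the reading of x and y in the second formula as fractional offsets in [0, 1] are assumptions, and the patent's own formulas may differ.

```latex
\begin{aligned}
x &= (x' - x_0)\cos\theta + (y' - y_0)\sin\theta + x_0,\\
y &= -(x' - x_0)\sin\theta + (y' - y_0)\cos\theta + y_0,\\[4pt]
f(x, y) &= (1 - x)(1 - y)\,f(0,0) + x(1 - y)\,f(1,0) + (1 - x)\,y\,f(0,1) + x\,y\,f(1,1).
\end{aligned}
```

Setting θ = 0 in the first two lines gives x = x' and y = y', and evaluating the third line at the four corners returns the four neighboring gray values unchanged, which are quick sanity checks for any implementation.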

