Tesseract performs extremely poorly when text is at an angle. For example, below is a scan with ~5 degrees of rotation. The first image shows the text Tesseract recognized without any preprocessing, while the second image shows what Tesseract recognized after the rotation was corrected.
The maintainers of the main Tesseract repo frequently suggest adding image preprocessing steps (including auto-rotation) to workflows to address this. However, this option is not ideal for web users. Given that we already include the Leptonica image processing library, we should be able to expose a rotation option without much effort. Auto-rotation would be ideal, but is likely significantly more difficult to implement.
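To make the request concrete, here is a minimal sketch of what a manual rotation step does to a grayscale image before OCR. It is only an illustration: `GrayImage` and `rotateGray` are hypothetical names, not part of Tesseract.js or Leptonica, and a real implementation would presumably call Leptonica's own rotation routines rather than a hand-written nearest-neighbor loop.

```typescript
// Hypothetical image type for illustration; not a Tesseract.js type.
interface GrayImage {
  width: number;
  height: number;
  data: Uint8Array; // row-major, one byte per pixel (0 = black, 255 = white)
}

// Rotate a grayscale image about its center by `degrees`.
// Positive angles turn the content clockwise on screen (the y axis points down).
// Pixels that map outside the source are filled with `fill` (white by default).
function rotateGray(src: GrayImage, degrees: number, fill = 255): GrayImage {
  const { width, height, data } = src;
  const out = new Uint8Array(width * height).fill(fill);
  const rad = (degrees * Math.PI) / 180;
  const cos = Math.cos(rad);
  const sin = Math.sin(rad);
  const cx = (width - 1) / 2;
  const cy = (height - 1) / 2;

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Inverse mapping: for each output pixel, look up the nearest source pixel.
      const dx = x - cx;
      const dy = y - cy;
      const sx = Math.round(cos * dx + sin * dy + cx);
      const sy = Math.round(-sin * dx + cos * dy + cy);
      if (sx >= 0 && sx < width && sy >= 0 && sy < height) {
        out[y * width + x] = data[sy * width + sx];
      }
    }
  }
  return { width, height, data: out };
}
```

With an option like this exposed, a scan skewed by about 5 degrees could be corrected by rotating it by the opposite angle before recognition. Working out that angle automatically is the harder auto-rotation problem.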
Possibly related to #588, which requests high-level functions that expose processed (binarized) images.