While bug fixes continue to be released for Version 3, all breaking changes will be released in Version 4, which is currently under development in the branch named `dev/v4`. This branch should already be usable by anyone eager to try the new features; however, there is no guarantee that additional breaking changes will not be introduced. Note that using this branch also requires using the `dev/v4` branch of Tesseract.js-core.

Summary

Breaking Changes
- `createWorker` is now async
  - `worker = Tesseract.createWorker()` should be replaced with `worker = await Tesseract.createWorker()`
  - An invalid `workerPath` or `corePath` now produces an error/rejected promise (Rework error reporting from worker threads so all promises resolve #654)
- `worker.load` is no longer needed (`createWorker` now returns the worker pre-loaded)
- `getPDF` function replaced by `pdf` recognize option (GetPDF() with Scheduler returns the same PDF file #488)
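
The worker-creation changes above can be sketched as follows. This is a minimal sketch rather than the project's official example: `createReadyWorker` is a hypothetical helper name, the `Tesseract` module is passed in as an argument so the snippet does not depend on a particular import style, and the `loadLanguage` and `initialize` calls assume those steps are still required after the worker is created.

```javascript
// Hypothetical helper: creates a v4 worker and prepares it for one language.
// `Tesseract` is assumed to be the imported tesseract.js module.
async function createReadyWorker(Tesseract, lang = 'eng') {
  let worker;
  try {
    // v4: createWorker returns a promise, so it must be awaited.
    // An invalid workerPath or corePath is expected to surface here
    // as a rejected promise.
    worker = await Tesseract.createWorker();
  } catch (err) {
    const reason = err && err.message ? err.message : String(err);
    throw new Error(`Could not create Tesseract worker: ${reason}`);
  }

  // v4: no worker.load() call; the worker is returned pre-loaded.
  await worker.loadLanguage(lang);
  await worker.initialize(lang);
  return worker;
}
```

Inside an async function, a caller would then write `const worker = await createReadyWorker(Tesseract);` where v3 code called `Tesseract.createWorker()` followed by `worker.load()`.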

New Features
- `imageColor`, `imageGrey`, and `imageBinary` output options (Is it possible to obtain the Thresholded Image from tesseract? #588)
- `rotateAuto` and `rotateRadians` options have been added, which significantly improve accuracy on certain documents
  - `rotateAuto` option
- Parameters (normally set using `worker.setParameters`) can now be set for single jobs using `worker.recognize` options (Allow for setting parameters for single recognize job when using scheduler #665)
  - For example: `worker.recognize(image, {tessedit_char_whitelist: "0123456789"})`
- "Init only" parameters (such as `load_system_dawg`, `load_number_dawg`, and `load_punc_dawg`) can now be set (Add a way to set "Init Only" parameters (user_word_suffix, etc.) #613)
  - `worker.initialize` now accepts, as its third argument, either (1) an object with key/value pairs or (2) a string containing contents to write to a config file
  - For example, both of the following set `load_number_dawg` to 0:
    worker.initialize('eng', "0", {load_number_dawg: "0"});
    worker.initialize('eng', "0", "load_number_dawg 0");
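
The per-job and init-only parameters described above might be wrapped as in the sketch below. The helper names `initWithoutNumberDawg` and `recognizeDigits` are hypothetical, `worker` is assumed to be a worker already returned by `createWorker` with its language loaded, and the `'0'` passed as the second argument to `initialize` is simply copied from the examples in this changelog.

```javascript
// Hypothetical helper: initialize with an "init only" parameter.
// The third argument may be an object (shown here) or a string holding
// config-file contents such as "load_number_dawg 0".
async function initWithoutNumberDawg(worker, lang = 'eng') {
  await worker.initialize(lang, '0', { load_number_dawg: '0' });
}

// Hypothetical helper: restrict a single job to digits by passing the
// parameter in the recognize options instead of calling
// worker.setParameters beforehand.
async function recognizeDigits(worker, image) {
  return worker.recognize(image, { tessedit_char_whitelist: '0123456789' });
}
```

Passing the whitelist per job is mainly useful when jobs with different settings share the same worker or scheduler.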

Other Changes
- `loadLanguage` now resolves without error when the language is loaded but writing to the cache fails
- `detect` returns `null` values when OS detection fails rather than throwing an error (Failed to dectet OS #526)

A single, unified interface has been added for specifying all output formats. `output` is now the 3rd argument to `recognize` (see example below). This replaces the separate `getPDF` function, as well as various `setParameters` options (`tessjs_create_box`, `tessjs_create_hocr`, `tessjs_create_osd`, `tessjs_create_tsv`, and `tessjs_create_unlv`).

// Output formats to generate for this job.
const outputOpts = {
  text: true,
  blocks: true,
  hocr: true,
  tsv: true,
  box: false,
  unlv: false,
  osd: false,
  pdf: false,
  imageColor: false,
  imageGrey: false,
  imageBinary: false
};
// The second argument (per-job options) is left undefined here.
const res = await worker.recognize(files[0], undefined, outputOpts);

Note: the default output formats (`text`, `blocks`, `hocr`, and `tsv`) are not changing between v3 and v4, so this change only impacts users who want non-default options. This also means that users who want text and pdf outputs need only specify `{pdf: true}`, as text is already a default.
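
As a sketch of the note above, requesting a PDF alongside the default text output could look like this. `recognizeWithPdf` is a hypothetical helper, and reading the results from `res.data.text` and `res.data.pdf` is an assumption about the shape of the result object, which this changelog does not spell out.

```javascript
// Hypothetical helper: request a PDF in addition to the default outputs.
async function recognizeWithPdf(worker, image) {
  // The second argument (per-job options) is left undefined; the third
  // argument is the unified output object. text is enabled by default,
  // so only pdf needs to be switched on.
  const res = await worker.recognize(image, undefined, { pdf: true });

  // Assumption: each result is exposed on res.data under the same key
  // that was used in the output options.
  return { text: res.data.text, pdf: res.data.pdf };
}
```

This takes the place of the separate `getPDF` call that v3 code made after `recognize`.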