According to npm download statistics (and GitHub Issues), many users are still on Tesseract.js v2. Version 2 was released in 2019 and contains many bugs, memory leaks, and performance issues that have been fixed in later versions (in some cases v2 is 20x slower than the current version), so updating is strongly recommended. Additionally, v2 is no longer supported, so you must update to receive support in GitHub Issues.
While the changes made in each release are fully documented, this guide collects every change that v2 users may need to make to use the latest version, to make upgrading as easy as possible. It covers upgrading from v2 to v5. If (for whatever reason) you wish to upgrade from v2 to v4 instead, see the comment below.
Changes Impacting Most Users

createWorker is now async
worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()

createWorker arguments have changed
The first two arguments are now language and oem, for example: createWorker('eng', 1, { logger: m => console.log(m) })

worker.load, worker.loadLanguage, and worker.initialize are no longer needed
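The following is a before/after reference for the three changes in this section. It is a minimal sketch of a v5 OCR call, with the v2 equivalent shown in a comment. The function name recognizeText is hypothetical. The Tesseract module is passed in as a parameter, which is an assumption made so the sketch does not depend on how the package is loaded. In Node it would typically come from require('tesseract.js').

```javascript
// Hypothetical helper: OCR one image with the v5 worker lifecycle.
// `Tesseract` is injected (assumption); in Node: require('tesseract.js').
//
// v2 equivalent, for comparison:
//   const worker = Tesseract.createWorker({ logger });
//   await worker.load();
//   await worker.loadLanguage('eng');
//   await worker.initialize('eng');
async function recognizeText(Tesseract, image) {
  // v5: createWorker is async and takes (language, oem, options);
  // the separate load/loadLanguage/initialize steps are gone.
  const worker = await Tesseract.createWorker('eng', 1, {
    logger: (m) => console.log(m),
  });
  try {
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Release the worker even if recognition throws.
    await worker.terminate();
  }
}
```

For example, in Node you could call recognizeText(require('tesseract.js'), 'image.png').then(console.log). This assumes the package is installed and image.png exists.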

getPDF function
Replaced by the pdf recognize option (see #488, "GetPDF() with Scheduler returns the same PDF file").

cacheMethod: 'none' or cacheMethod: 'refresh' as a workaround for the caching bug
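The following is a minimal sketch of the v5 replacement for getPDF. It assumes worker was created with await Tesseract.createWorker(...) and uses the v5 recognize(image, options, output) signature. The PDF is requested through the output argument and read back from data.pdf. The helper name recognizeToPdf and the output path are hypothetical. The removed v2 call is kept in a comment for comparison.

```javascript
const fs = require('fs');

// v2 (removed):
//   const { data } = await worker.getPDF('Tesseract OCR Result');
//   fs.writeFileSync('result.pdf', Buffer.from(data));

// Hypothetical helper: recognize `image` and write the PDF to `outPath`.
async function recognizeToPdf(worker, image, outPath) {
  // 2nd argument: recognition options (none needed here).
  // 3rd argument: which outputs to generate; the PDF is only built on request.
  const { data } = await worker.recognize(image, {}, { pdf: true });
  // data.pdf holds the PDF bytes; Buffer.from accepts arrays and typed arrays.
  const bytes = Buffer.from(data.pdf);
  fs.writeFileSync(outPath, bytes);
  return bytes.length; // number of bytes written
}
```

Because the PDF now comes out of the same recognize() call as the text, each job's PDF belongs to that job's result. Under v2, a separate getPDF call read whatever the worker had recognized last.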

corePath argument
corePath must point to a directory containing all 4 of the following files from Tesseract.js-core v5:
tesseract-core.wasm.js
tesseract-core-simd.wasm.js
tesseract-core-lstm.wasm.js
tesseract-core-simd-lstm.wasm.js
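When corePath is a local directory in Node, a quick pre-flight check can catch a missing file before the worker fails to start. The following is a minimal sketch under that assumption. The helper missingCoreFiles is hypothetical, and the file list is the one given above. In browsers, corePath is a URL and this filesystem check does not apply.

```javascript
const fs = require('fs');
const path = require('path');

// The four Tesseract.js-core v5 files the corePath directory must contain.
const REQUIRED_CORE_FILES = [
  'tesseract-core.wasm.js',
  'tesseract-core-simd.wasm.js',
  'tesseract-core-lstm.wasm.js',
  'tesseract-core-simd-lstm.wasm.js',
];

// Hypothetical helper: return the required files missing from `dir`.
// An empty array means the directory looks complete.
function missingCoreFiles(dir) {
  return REQUIRED_CORE_FILES.filter(
    (name) => !fs.existsSync(path.join(dir, name)),
  );
}
```

A caller could run missingCoreFiles(coreDir) before Tesseract.createWorker('eng', 1, { corePath: coreDir }) and report any missing files up front.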

worker.detect function
Requires legacyCore: true and legacyLang: true in the createWorker options:
Tesseract.createWorker("eng", 1, {legacyCore: true, legacyLang: true})
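The following is a minimal sketch of calling worker.detect (orientation and script detection) in v5. It uses the same createWorker arguments as the example for this change. The function name detectImage is hypothetical. As in the other sketches, the Tesseract module is injected as a parameter by assumption. The shape of the returned data is not spelled out here, so the sketch simply returns it.

```javascript
// Hypothetical helper: run orientation/script detection on one image.
// `Tesseract` is injected (assumption); in Node: require('tesseract.js').
async function detectImage(Tesseract, image) {
  // detect needs the legacy core and legacy language data,
  // so both flags are set when the worker is created.
  const worker = await Tesseract.createWorker('eng', 1, {
    legacyCore: true,
    legacyLang: true,
  });
  try {
    const { data } = await worker.detect(image);
    return data; // detection result as returned by the worker
  } finally {
    await worker.terminate();
  }
}
```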