If the cached traineddata file becomes corrupted, tesseract.js still loads it without throwing an error. A subsequent call to recognize then results in a fatal error that cannot be caught.
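Until this is fixed, one possible mitigation is to discard a cached file that is obviously truncated before calling worker.loadLanguage(). The Node.js sketch below uses a hypothetical helper, removeSuspectCache. It assumes that the cached file is stored as <lang>.traineddata in a directory you pass in, and that tesseract.js fetches the file again when it is missing. A size check is only a heuristic and will not catch every kind of corruption.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of tesseract.js): delete a cached
// <lang>.traineddata file that looks truncated, so it has to be fetched again.
// ASSUMPTIONS: the cache file lives at <cacheDir>/<lang>.traineddata, and a
// missing file triggers a fresh download. The size threshold is illustrative.
function removeSuspectCache(cacheDir, lang, minBytes = 1024 * 1024) {
  const file = path.join(cacheDir, `${lang}.traineddata`);
  let size;
  try {
    size = fs.statSync(file).size;
  } catch (err) {
    if (err.code === 'ENOENT') return false; // nothing cached yet
    throw err; // permission problems etc. should not be hidden
  }
  if (size < minBytes) {
    fs.unlinkSync(file); // force a re-download on the next loadLanguage()
    return true;
  }
  return false;
}

module.exports = { removeSuspectCache };
```

For example, the helper could run against the directory that holds the cached file just before worker.loadLanguage('eng'). The 1 MiB default is an arbitrary illustrative value, not a documented minimum size for traineddata files.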
Steps to reproduce the behavior:
const { createWorker, OEM } = require('tesseract.js');
const Jimp = require('jimp');

(async () => {
  const worker = createWorker({
    langPath: __dirname,
    logger: message => {
      //console.log(message);
    },
    // Uncomment errorHandler to reproduce the second output.
    /*errorHandler: error => {
      console.log('error from worker:', error);
    }*/
  });
  try {
    const img = await Jimp.read('https://tesseract.projectnaptha.com/img/eng_bw.png');
    await worker.load();
    await worker.loadLanguage('eng');
    // Logs "Error opening data file ..." but does not throw.
    await worker.initialize('eng', OEM.LSTM_ONLY);
    console.log('Recognizing text...');
    // Aborts here; without errorHandler the error bypasses the catch block.
    const { data: { text } } = await worker.recognize(await img.getBufferAsync(Jimp.AUTO));
    console.log(text);
  } catch (error) {
    console.log('caught error:', error);
  }
  process.exit();
})();
This results in the following output:
> tess-test@1.0 start
> node index.js
Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Recognizing text...
AdaptedTemplates != nullptr:Error:Assert failed:in file /workspace/tesseract/src/classify/adaptmatch.cpp, line 196
undefined
undefined
C:\Users\Razz\Documents\Visual Studio Code Projects\Razzmatazzz\tesstest\node_modules\tesseract.js\src\createWorker.js:173
throw Error(data);
^
Error: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
    at ChildProcess.<anonymous> (C:\Users\Razz\Documents\Visual Studio Code Projects\Razzmatazzz\tesstest\node_modules\tesseract.js\src\createWorker.js:173:15)
    at ChildProcess.emit (node:events:390:28)
    at emit (node:internal/child_process:917:12)
    at processTicksAndRejections (node:internal/process/task_queues:84:21)
Note that "caught error" is never printed: the error escapes the try/catch block. The "Error opening data file" message appears during the worker.initialize() call, but no exception is thrown at that point.
If the errorHandler option is uncommented, the output is instead:
> tess-test@1.0 start
> node index.js
Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Recognizing text...
AdaptedTemplates != nullptr:Error:Assert failed:in file /workspace/tesseract/src/classify/adaptmatch.cpp, line 196
undefined
undefined
error from worker: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
caught error: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
The errorHandler does not receive an error when worker.initialize() is called, but it does when worker.recognize() is called. Interestingly, with an errorHandler in place, the error raised by recognize also becomes catchable.
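Both runs are consistent with a simple explanation: the failure is raised by a synchronous throw inside an event listener (the stack trace points at a ChildProcess event handler in createWorker.js), which is outside the promise chain that the try/catch is awaiting. The sketch below is a simplified, hypothetical model of that pattern, not tesseract.js source. createModelWorker and its message shape are invented for illustration, and the model assumes the listener rejects the pending promise and then rethrows only when no errorHandler is set. By default Node.js prints an uncaught exception and exits, so the queued rejection never reaches the catch block.

```javascript
const { EventEmitter } = require('events');

// Hypothetical model of a worker that receives results as 'message' events.
// ASSUMPTION: on failure the listener rejects the pending promise and then,
// only when no errorHandler was given, rethrows. This mirrors the observed
// behaviour; it is not the library's actual implementation.
function createModelWorker({ errorHandler } = {}) {
  const channel = new EventEmitter();
  let pending = null;

  channel.on('message', ({ status, data }) => {
    if (status === 'reject') {
      // Settles the promise, but its catch handlers only run in a later microtask.
      pending.reject(data);
      if (errorHandler) {
        errorHandler(data);
      } else {
        // Synchronous throw inside an event listener: nothing awaiting the
        // promise can catch this; it becomes an uncaught exception.
        throw Error(data);
      }
    } else {
      pending.resolve(data);
    }
  });

  return {
    recognize() {
      return new Promise((resolve, reject) => {
        pending = { resolve, reject };
        // Simulate the child process replying asynchronously with a failure.
        setImmediate(() => channel.emit('message', { status: 'reject', data: 'abort(undefined)' }));
      });
    },
  };
}
```

With an errorHandler, recognize() in this model simply rejects and a surrounding try/catch sees the error; without one, the throw surfaces as an uncaughtException before the catch block can run. This matches the observation that specifying any errorHandler makes the recognize failure catchable.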
I would expect worker.recognize() to throw a catchable error regardless of whether an errorHandler has been specified for the worker. I would also expect worker.initialize() either to throw an error when it cannot load the specified traineddata or at least to pass an error to the errorHandler. Currently, it does neither.
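Tesseract's C++ API already reports this condition: TessBaseAPI::Init returns 0 on success and -1 on failure. One way the requested initialize() behaviour could look is sketched below. It is hypothetical: initializeOrFail is an invented name, and api is a stand-in object assumed to expose an Init(datapath, language, oem) method with the same return convention. It is not tesseract.js's actual binding or worker code.

```javascript
// Hypothetical sketch: make initialization fail loudly when no language loads.
// ASSUMPTION: api.Init mirrors TessBaseAPI::Init (0 = success, -1 = failure);
// `api` is a stand-in here, not the real tesseract.js Emscripten binding.
async function initializeOrFail(api, langs, oem) {
  const status = api.Init(null, langs, oem);
  if (status !== 0) {
    // Reject now instead of continuing with an uninitialized engine that
    // later aborts inside recognize().
    throw new Error(`Tesseract failed to initialize language(s): ${langs}`);
  }
  return status;
}

module.exports = { initializeOrFail };
```

Because initializeOrFail is async, the thrown error becomes a rejected promise, so an awaiting caller such as the try/catch in the reproduction would receive it at the initialize step.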