Showing content from https://github.com/naptha/tesseract.js/issues/662 below:

Version 4 Development and Changes · Issue #662 · naptha/tesseract.js · GitHub

Overview

While bug fixes continue to be released for Version 3, all breaking changes will be released in Version 4, which is currently under development in the dev/v4 branch. This branch should already be usable by users eager to try the new features; however, there is no guarantee that further breaking changes will not be made. Note that using this branch also requires using the dev/v4 branch of Tesseract.js-core.

Summary

Breaking Changes
  1. createWorker is now async
    1. In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
    2. Calling createWorker with an invalid workerPath or corePath now produces an error in the form of a rejected promise (Rework error reporting from worker threads so all promises resolve #654)
  2. worker.load is no longer needed (createWorker now returns worker pre-loaded)
  3. getPDF function replaced by pdf recognize option ( GetPDF() with Scheduler returns the same PDF file #488)
    1. This allows PDFs to be created when using a scheduler
    2. See browser and node examples for usage
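
The first two breaking changes can be illustrated with a minimal migration sketch. It assumes that the Tesseract object is supplied by the caller (how it is imported is not shown) and that loadLanguage and initialize are still called as in Version 3. The helper name createEnglishWorker is hypothetical.

```javascript
// Hypothetical helper: builds a ready-to-use English worker in the v4 style.
// `Tesseract` is passed in so the sketch does not assume a particular import.
async function createEnglishWorker(Tesseract) {
  // v3: const worker = Tesseract.createWorker();   (synchronous)
  // v4: createWorker returns a promise, so it must be awaited.
  // An invalid workerPath or corePath now rejects this promise.
  const worker = await Tesseract.createWorker();

  // v3 required `await worker.load()` here; in v4 the worker arrives pre-loaded.
  // Assumption: language loading and initialization are unchanged from v3.
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  return worker;
}
```

Because createWorker can now reject, callers may want to wrap the call in try/catch rather than relying on a later step to surface a bad path.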
Major New Features
  1. Processed images created by Tesseract can be retrieved using imageColor, imageGrey, and imageBinary options ( Is it possible to obtain the Thresholded Image from tesseract? #588)
    1. See image-processing.html example for usage
  2. Image rotation options rotateAuto and rotateRadians have been added, which significantly improve accuracy on certain documents
    1. See issue "Add rotation preprocessing option" (#648) for an example of how auto-rotation improves accuracy
    2. See image-processing.html example for usage of rotateAuto option
  3. Tesseract parameters (usually set using worker.setParameters) can now be set for single jobs using worker.recognize options ( Allow for setting parameters for single recognize job when using scheduler #665)
    1. For example, a single job can be set to recognize only numbers using worker.recognize(image, {tessedit_char_whitelist: "0123456789"})
    2. As these settings are reverted after the job, this allows for using different parameters for specific jobs when working with schedulers
  4. Initialization parameters (e.g. load_system_dawg, load_number_dawg, and load_punc_dawg) can now be set ( Add a way to set "Init Only" parameters (user_word_suffix, etc.) #613)
    1. The third argument to worker.initialize now accepts either (1) an object with key/value pairs or (2) a string containing contents to write to a config file
    2. For example, both of these lines set load_number_dawg to 0:
      1. worker.initialize('eng', "0", {load_number_dawg: "0"});
      2. worker.initialize('eng', "0", "load_number_dawg 0");
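
The per-job and initialization features above might be used roughly as follows. This is a sketch, not the project's reference code: the helper names are hypothetical, the worker is assumed to have been created and had its language loaded already, and result.data.text is assumed to hold the recognized text as in Version 3. Placing rotateAuto in the second argument and imageBinary in the third is also an assumption based on the output interface described in this issue.

```javascript
// Hypothetical helper: recognizes only digits for this one job.
// The parameter is reverted after the job, so other jobs on the
// same worker (for example via a scheduler) are unaffected.
async function recognizeDigits(worker, image) {
  const result = await worker.recognize(image, {
    tessedit_char_whitelist: '0123456789',
  });
  return result.data.text;
}

// Hypothetical helper: initializes English with load_number_dawg set to 0.
// Both forms of the third argument are taken from the list above; the
// second argument ("0") is copied verbatim from those examples.
async function initializeWithoutNumberDawg(worker, useConfigString = false) {
  const config = useConfigString
    ? 'load_number_dawg 0'        // contents written to a config file
    : { load_number_dawg: '0' };  // key/value pairs
  await worker.initialize('eng', '0', config);
}

// Hypothetical helper: auto-rotates the page and also returns the binarized image.
// Assumption: rotateAuto is a recognize option (2nd argument) and
// imageBinary is an output format (3rd argument).
async function recognizeRotated(worker, image) {
  const result = await worker.recognize(
    image,
    { rotateAuto: true },
    { imageBinary: true }
  );
  return { text: result.data.text, imageBinary: result.data.imageBinary };
}
```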
Other Changes
  1. loadLanguage now resolves without error when language is loaded but writing to cache fails
    1. This allows for running in Firefox incognito mode using default settings ( Tesseract fails when running in Firefox incognito browser #609)
  2. detect returns null values when OS detection fails rather than throwing an error ( Failed to dectet OS #526)
  3. Memory leak causing crashes fixed ( worker.recognize memory leak #678)
  4. Cache corruption should now be much less common ( Fix asynchronous caching bug #666)
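
Since detect now reports failure through null values instead of an exception, calling code may need an explicit null check where it previously relied on try/catch. The sketch below assumes the result exposes script and orientation_degrees fields under data; the helper name describeScript is hypothetical.

```javascript
// Hypothetical helper: returns a short description of the detected script,
// or null when orientation and script detection failed.
async function describeScript(worker, image) {
  const { data } = await worker.detect(image);
  // Assumption: `script` and `orientation_degrees` are the result fields,
  // and they are null when detection fails.
  if (data.script === null || data.orientation_degrees === null) {
    return null;
  }
  return `${data.script}, rotated ${data.orientation_degrees} degrees`;
}
```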
Detail

New Output Format Interface

A single, unified interface has been added for specifying all output formats. output is now the 3rd argument to recognize (see example below). This replaces the separate getPDF function, as well as various setParameters options (tessjs_create_box, tessjs_create_hocr, tessjs_create_osd, tessjs_create_tsv, and tessjs_create_unlv).

const outputOpts = {
  text: true,
  blocks: true,
  hocr: true,
  tsv: true,
  box: false,
  unlv: false,
  osd: false,
  pdf: false,
  imageColor: false,
  imageGrey: false,
  imageBinary: false
};

const res = await worker.recognize(files[0], undefined, outputOpts);

Note: the default output formats (text, blocks, hocr, and tsv) are not changing between v3 and v4, so this change only impacts users who want non-default options. This also means that users who want text and pdf outputs only need to specify {pdf: true}, as text is already a default.
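
As a sketch of the common case described in the note, a caller who wants both text and a PDF would enable only pdf. The helper name recognizeWithPdf is hypothetical, and the assumption that each enabled format appears under res.data with the same key is not confirmed by this issue.

```javascript
// Hypothetical helper: requests a PDF in addition to the default outputs.
async function recognizeWithPdf(worker, image) {
  // The second argument (recognize options) is left undefined.
  // Only pdf needs enabling; text, blocks, hocr, and tsv are on by default.
  const res = await worker.recognize(image, undefined, { pdf: true });
  // Assumption: each enabled format appears under res.data with the same key.
  const { text, pdf } = res.data;
  return { text, pdf };
}
```

Because the PDF is now produced by the recognize job itself, the same call shape should also work for jobs dispatched through a scheduler, which the old getPDF function did not support.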
