Showing content from https://mail.python.org/pipermail/python-dev/2010-December/106344.html below:

[Python-Dev] Python and the Unicode Character Database

haiyang kang cornsea at gmail.com
Fri Dec 3 04:18:43 CET 2010
> Furthermore, data can well originate from texts that were written
> hundreds or even thousands of years ago, so there is plenty of
> material available for processing.

Hmm... for this, I think we need a specially tuned language
processing system, with one subsystem per language :)
(Sometimes a single word is not enough; we also need context.)

Take pi for example: in modern math it is written as 3.1415...;
in old China it was sometimes written as 三一四一五 or
三点一四一五 or 叁点壹肆壹伍.
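To make the point about context concrete, here is a minimal sketch that reads such digit-by-digit forms using the per-character values that `unicodedata.numeric()` exposes. The function name `positional_cjk_to_float` and the set of decimal marks are assumptions made for illustration, not anything proposed in this thread. It only handles purely positional spellings, not forms that use multipliers such as 十 or 百.

```python
import unicodedata

# Assumption: characters treated as a decimal point in this sketch.
DECIMAL_MARKS = {"点", "點", "."}


def positional_cjk_to_float(text):
    """Read a digit-by-digit numeral such as 三点一四一五 as a float.

    Each character must map to a single digit 0-9 in the Unicode
    Character Database, or be one of the decimal marks above.
    """
    parts = []
    seen_mark = False
    for ch in text:
        if ch in DECIMAL_MARKS:
            if seen_mark:
                raise ValueError("more than one decimal mark in %r" % text)
            seen_mark = True
            parts.append(".")
            continue
        value = unicodedata.numeric(ch, None)
        # Reject non-numeric characters, fractions, and multipliers like 十 (10).
        if value is None or value != int(value) or not 0 <= value <= 9:
            raise ValueError("not a single digit: %r" % ch)
        parts.append(str(int(value)))
    return float("".join(parts))


with_mark = positional_cjk_to_float("三点一四一五")
without_mark = positional_cjk_to_float("三一四一五")
print(with_mark, without_mark)
```

Without the 点, the same digits read as thirty-one thousand and something rather than pi, so the character values alone are not enough; the position of the decimal point has to come from context.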

And if these texts are extracted through a scanner
(OCR or other image processing tech), in my POV
it is the job of this image processing subsystem
(or some other subsystem between the image processing and the database)
to do the mapping between the number and the raw text data. Example table in DB:
text   | raw data   | raw image data
-------|------------|---------------
3.1415 | 三一四一五 | image...
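
A rough sketch of such a table, using Python's built-in `sqlite3` module. The table name `extracted_numbers`, the column names, and the placeholder image bytes are all made up for illustration; the actual mapping from raw text to number would be done by the upstream subsystem before the row is stored.

```python
import sqlite3

# In-memory database, just for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE extracted_numbers (
        text      TEXT NOT NULL,  -- normalized number, e.g. '3.1415'
        raw_data  TEXT NOT NULL,  -- text as it appeared in the source
        raw_image BLOB            -- scanned image region, if any
    )
    """
)

# Placeholder bytes stand in for real image data.
conn.execute(
    "INSERT INTO extracted_numbers (text, raw_data, raw_image) VALUES (?, ?, ?)",
    ("3.1415", "三一四一五", b"placeholder image bytes"),
)

# Look up the normalized value from the raw text.
row = conn.execute(
    "SELECT text, raw_data FROM extracted_numbers WHERE raw_data = ?",
    ("三一四一五",),
).fetchone()
print(row)
```

This keeps the original text and image next to the interpreted value, so the interpretation can be checked or redone later.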

br,
khy