Here's a good summary of how XML's case sensitivity came to be.

-------- Original Message --------
Subject: Re: Case sensitivity
Date: Mon, 3 Apr 2000 12:44:37 -0400
From: Steve DeRose <Steven_DeRose@brown.edu>
To: xml-dev@lists.oasis-open.org
References: <B50E2EFA.1B57%soord@vda.nl>

Languages with no need for case folding are not much of a problem: the case-folding table or program would merely have no effect on characters belonging to such languages. It would change 26 of our 26 alphabetic code points, and no others. After all, in English we already use lots of characters that don't get case-folded (like digits).

The serious problems are subtler:

The practical problem is that with Unicode your folding table gets really big, on the order of 128 Kbytes instead of 256 bytes (barring compression). This is a pain on small devices (like a cell-phone browser), a pain to load, a pain to implement compression for, etc.

The theoretical problem is more important: it's not the caseless languages that pose problems, but the ones with complicated case-folding. For example, lots of languages only apply diacritical marks to lower-case letters: "a" may come with 6 different accents, but "A" takes none, which makes case-folding irreversible. If there are languages that operate the other way as well, then neither fold-to-upper nor fold-to-lower can work for all languages: either way would trash some languages.

That said, I think it incumbent on XML *search engines* to support case-folding (as well as decent treatment of accents, types of hyphens, etc.) for text content searches. Making English speakers search for "the" | "thE" | "tHe" | "tHE" | "The" | "ThE" | "THe" | "THE" or "[tT][hH][eE]" is patently absurd; and besides, there is no user cost to (say) a Japanese speaker if an engine *does* case-fold. Also, many languages use Roman characters occasionally, as for acronyms, so their speakers also pay a price if searches aren't smart enough. And the primary problems with case-folding do not weigh so heavily in the search engine world (for example, AltaVista isn't likely to run their main servers on cell phones for a while yet).
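
As a rough illustration of the points in the message above, here is a minimal Python sketch. It is not from the original thread. It assumes Python's built-in `str.casefold()` as the folding step; the helper names `fold` and `contains_folded` are hypothetical. The German sharp s is used as a stand-in example of lossy folding; it is a different case from the accented-letter example in the message, but it shows the same irreversibility.

```python
def fold(text: str) -> str:
    """Fold text for caseless comparison (assumption: str.casefold is good enough here)."""
    return text.casefold()


def contains_folded(haystack: str, needle: str) -> bool:
    """Caseless substring search: fold both sides, then compare."""
    return fold(needle) in fold(haystack)


# One folded comparison covers all eight spellings listed in the message.
variants = ["the", "thE", "tHe", "tHE", "The", "ThE", "THe", "THE"]
all_match = all(contains_folded(f"see {v} end", "the") for v in variants)

# Characters with no case distinction pass through untouched,
# so folding costs nothing for text that has no case.
digits_unchanged = fold("2000") == "2000"
kana_unchanged = fold("カタカナ") == "カタカナ"

# Folding is lossy: two different spellings collapse to one folded form,
# and an upper/lower round trip does not restore the original string.
lossy = fold("Straße") == fold("STRASSE")
round_trip_fails = "Straße".upper().lower() != "Straße"
```

Folding both sides at search time leaves the stored text as written, which fits the distinction drawn in the message: case-sensitive names in the markup, case-folded matching in the search engine.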