[M.-A. Lemburg]
>BTW, does Java support UCS-4 ? If not, then Java is wrong
>here ;-)

Java claims to use Unicode 2.1 [*]. I couldn't locate anything
describing whether this is UCS-2 or UTF-16. I think Unicode 2.1
includes UCS-4. The actual level of support for UCS-4 is probably
debatable.

- The builtin char is 16 bits wide and obviously cannot support UCS-4.

- The Character class can report if a character is a surrogate:

    >>> from java.lang import Character
    >>> Character.getType("\ud800") == Character.SURROGATE
    1

- As reported, direct string comparison ignores surrogates.

- The BreakIterator does not handle surrogates. It does handle
  combining characters, and it seems a natural place to put support
  for surrogates.

- The Collator class offers different levels of normalization before
  comparing strings but does not seem to support surrogates. This
  class seems a natural place for JavaSoft to put support for
  surrogates during string comparison.

These findings are gleaned from the sources of JDK1.3.

[*] http://java.sun.com/docs/books/vmspec/2nd-edition/html/Concepts.doc.html#25310

regards,
finn
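
The point about direct string comparison ignoring surrogates can be illustrated with a minimal sketch. It is plain Python 3 rather than Jython or Java, and the helpers utf16_units and is_surrogate are hypothetical names introduced here for illustration. They are not part of any Java or Python API. The sketch only assumes that a comparison working on raw 16-bit code units behaves like a comparison of the UTF-16 encoded forms.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(unit):
    """True if a 16-bit code unit lies in the surrogate range D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF


a = "\uffff"      # last code point that fits in a single 16-bit unit
b = "\U00010000"  # first code point that needs a surrogate pair

# Python 3 compares strings by code point, so U+FFFF sorts before U+10000.
code_point_order = a < b

# Comparing raw 16-bit units sees 0xFFFF against 0xD800 and reverses the order.
code_unit_order = utf16_units(a) < utf16_units(b)

print([hex(u) for u in utf16_units(b)])
print(code_point_order, code_unit_order)
```

A comparison that is unaware of surrogates therefore orders characters outside the 16-bit range before some characters inside it, which is the kind of mismatch a surrogate-aware Collator would have to correct.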