M.-A. Lemburg writes:
> > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use
> > > 8-bit encodings of Unicode if you want.

This is meaningless: legacy encodings of national character sets such as
Shift-JIS, Big Five, GB2312, or TIS620 are not "encodings" of Unicode.
TIS620 is a single-byte, 8-bit encoding: each character is represented
by a single byte. The Japanese and Chinese encodings are multibyte,
8-bit encodings. ISO-2022 is a multibyte, 7-bit encoding for multiple
character sets.

Unicode has several possible encodings: UTF-8, UCS-2, UCS-4, UTF-16...
You can view all of these as 8-bit encodings, if you like. Some are
variable length (UTF-8 uses 1 to 3 bytes for characters in the Basic
Multilingual Plane and 4 bytes beyond it; UTF-16 uses surrogate pairs
beyond the BMP), while others are fixed length: two (UCS-2) or four
(UCS-4) bytes per character.

> > Um, if you go:
> >
> >     JIS -> Unicode -> JIS
> >
> > you don't get the same thing out that you put in (at least this is
> > what I've been told by a lot of Japanese developers), and therefore
> > it's not terribly popular because of the nature of the Japanese (and
> > Chinese) language.

This is simply not true any more. Whether you can round-trip between
Unicode and a legacy encoding depends on the software: mapping
characters that have no standard Unicode equivalent to code points in
the Private Use Area (PUA) is acceptable and commonly done.

The big advantage is in using Unicode as a pivot when transcoding
between different CJK encodings. It is very difficult to map directly
between, say, Shift JIS and GB2312; Unicode provides a good go-between.
It isn't a panacea, though: transcoding between legacy encodings like
GB2312 (Simplified Chinese) and Big Five (Traditional Chinese) is still
difficult, Unicode or not.

> > My experience with Unicode is that a lot of Western people think it's
> > the answer to every problem asked, while most Asian language people
> > disagree vehemently. This says the problem isn't solved yet, even if
> > people wish to deny it.

This is a shame: it is an indication that they don't understand the
technology.
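To make the distinctions above concrete, here is a minimal sketch using Python 3's built-in codecs (`utf-8`, `utf-16-le`, `utf-32-le`, `tis_620`, `shift_jis`, `iso2022_jp`, `gb2312`). The helper names `transcode`, `round_trips`, and `byte_widths` and the sample strings are hypothetical, chosen for illustration. It shows per-character byte widths in the Unicode encoding forms, that TIS620 and Shift JIS are separate charsets with their own repertoires, that ISO-2022-JP output is 7-bit, and Unicode used as the pivot between Shift JIS and GB2312. It does not model any vendor's PUA mappings.

```python
"""Sketch: legacy charsets vs. Unicode encodings, and Unicode as a pivot.

Assumes Python 3 and only its built-in codecs; helper names are hypothetical.
"""
from typing import List


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Pivot through Unicode: decode legacy bytes to str, re-encode to dst.

    Raises UnicodeDecodeError/UnicodeEncodeError when a character has no
    mapping in the source or target charset.
    """
    return data.decode(src).encode(dst)


def round_trips(text: str, encoding: str) -> bool:
    """True if text survives str -> bytes -> str unchanged in `encoding`."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


def byte_widths(text: str, encoding: str) -> List[int]:
    """Number of bytes each character occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]


if __name__ == "__main__":
    # 'A', 'e-acute', 'ri' (U+65E5), MUSICAL SYMBOL G CLEF (U+1D11E, outside the BMP)
    sample = "A\u00e9\u65e5\U0001D11E"
    print(byte_widths(sample, "utf-8"))      # [1, 2, 3, 4]  variable length
    print(byte_widths(sample, "utf-16-le"))  # [2, 2, 2, 4]  surrogate pair beyond BMP
    print(byte_widths(sample, "utf-32-le"))  # [4, 4, 4, 4]  fixed length

    # Legacy charsets each cover their own repertoire.
    print(byte_widths("\u0e01", "tis_620"))          # [1]  Thai KO KAI, single byte
    print(byte_widths("\u65e5\u672c", "shift_jis"))  # [2, 2]  multibyte
    iso = "\u65e5\u672c\u8a9e".encode("iso2022_jp")
    print(all(b < 0x80 for b in iso))                # True  ISO-2022-JP stays 7-bit

    # Unicode as go-between: Shift JIS bytes -> str -> GB2312 bytes.
    sjis = "\u4e2d\u6587".encode("shift_jis")
    gb = transcode(sjis, "shift_jis", "gb2312")
    print(gb.decode("gb2312") == "\u4e2d\u6587")     # True

    # A character absent from the target charset cannot survive the trip.
    print(round_trips("\u0e01", "shift_jis"))        # False
```

The pivot only works for characters that exist in both charsets; where a mapping is missing, the codec raises an error rather than guessing, which is exactly where application-specific choices (PUA assignments, substitution) come in.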
Unicode is a tool: nothing more.

    -tree

--
Tom Emerson                                          Basis Technology Corp.
Language Hacker                                    http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"