Far East Local Double-Byte Codes | pclt.sites.yale.edu

The Chinese, Japanese, and Korean languages use thousands of ideographic characters, which caused problems long before computers. When the printing press was introduced, each country had to develop simplified character sets that could plausibly be represented in type. However, even the simplified sets do not fit in an eight-bit code, which has room for only 256 values.

As computers and networks were introduced, each Far Eastern country developed several computer character codes. Generally these codes start with the ASCII character set and then add the local characters as multi-byte sequences.

Character codes are the simplest problem in supporting Far East languages. Keyboard input is a bigger issue, screen display is another, and sorting and indexing are problems as well. Over the years each country developed a body of hardware, programs, databases, and utilities to handle the local national standard character sets. This large body of existing material will not be replaced by efforts to develop some additional international character set.

A Japanese page may be in the “Japanese EUC” or “Shift-JIS” code, plus a few specialized alphabets. Chinese has two “simplified” codes and a larger “traditional” set. Any real data from a Far Eastern country will come in one of these national standard codes.
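Because the same text has different bytes in each national code, a program has to know which code a page uses before it can read it. The sketch below assumes Python's built-in `euc_jp` and `shift_jis` codecs; `try_decode` is a hypothetical helper written for this illustration.

```python
def try_decode(data, codec):
    """Decode data with the given codec, or return None if the bytes
    are not valid in that code. Hypothetical helper for illustration."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


original = "日本語"
page = original.encode("euc_jp")  # bytes as they might arrive from a server

# The matching codec recovers the text.
print(try_decode(page, "euc_jp"))

# The wrong codec either fails outright or yields unrelated characters;
# it never silently produces the original text.
print(try_decode(page, "shift_jis"))
```

A decode that succeeds is not proof that the codec is the right one, since a byte sequence can be valid in more than one code. In practice the code is usually declared alongside the data rather than guessed.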