Input and Output of National Character Set
(Hong Kong)
Prepared by : Samuel Tam, Systems Manager, ITSD
Table of Contents
1. Introduction
The purpose of this document is to introduce the input and output processing of Chinese in Hong Kong, in particular the existing Chinese IT processing in the Hong Kong Government (hereinafter abbreviated as "Government").
This document covers four topics. The first is "Chinese Character". The popular Chinese character sets and internal code sets used in Hong Kong are described under this topic. The usage of Chinese characters in Government IT applications is also discussed.
The second topic is on "Input of Chinese Character". Under this topic, the input devices, input methods and screen fonts used in Government IT applications will be introduced.
The third topic is on "Output of Chinese Character" which will cover the printer and print fonts used in Government.
The last topic is on "User Created Character" where the current situation of user created characters in Government IT applications will be described.
2. Chinese Character
2.1 Chinese Character Sets
The major Chinese character sets available in the industry are summarized as follows :-
a) CNS 11643
CNS (Chinese National Standard) is the standard Chinese character set adopted in Taiwan. CNS is not in itself an internal code set like Big-5 or IBM 5550. It is instead a character set on which internal code sets such as Big-5 and IBM 5550 are based. The basic set of CNS 11643 (about 13,000 characters) contains commonly used traditional Chinese characters.
b) GB Character Sets
GB refers to the character sets used in mainland China. The basic set (about 6,700 Chinese characters) contains simplified Chinese characters and was published in 1980 as the national standard GB2312. Additional characters have also been announced in supplementary character sets such as GB12345-90 (2349 characters) and GB8565-89 (288 characters).
c) ISO 10646
ISO 10646 is an international standard defining a coded character set that includes the different scripts of the world. It covers alphabetic scripts (e.g. Latin, Greek) and East Asian ideographic scripts (e.g. Chinese, Japanese and Korean).
In Hong Kong, most of the popular Chinese products available in the market (e.g. ETEN, Chinese Windows) adopt the basic set of CNS 11643 as their basic character set. Hence, most organizations in Hong Kong, in both the private sector and Government, use the basic set of CNS 11643.
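The difference in repertoire between a traditional and a simplified character set can be illustrated with a short Python sketch. It assumes that Python's built-in `big5` and `gb2312` codecs are acceptable stand-ins for the CNS 11643 basic set and the GB2312 basic set respectively; the helper name `in_code_set` and the sample character are illustrative only and are not taken from any Government system.

```python
def in_code_set(ch: str, codec: str) -> bool:
    """Return True if the character can be encoded with the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Assumption: one sample character pair, chosen only for illustration.
traditional = "\u9f8d"  # traditional form of the character for "dragon"
simplified = "\u9f99"   # simplified form of the same character

for ch in (traditional, simplified):
    print(
        f"U+{ord(ch):04X}",
        "big5:", in_code_set(ch, "big5"),
        "gb2312:", in_code_set(ch, "gb2312"),
    )
```

Under these assumptions the traditional form is found only in the Big-5 repertoire and the simplified form only in GB2312, which is why data prepared against one basic set cannot simply be reused against the other.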
2.2 Chinese Internal Code Sets
In Government, the popular Chinese internal code sets are Big-5 and IBM DBCS/5550. Brief descriptions of these code sets are provided as follows :-
a) Big-5
It is the internal code set developed by 5 major Chinese vendors in Taiwan. It is the popular internal code set used in PC and mid-range computer systems.
b) IBM DBCS/5550
It is the internal code set adopted in IBM machines on various platforms, in particular on the mainframe system.
[NOTE :-
Some other internal code sets available in the market such as EPRO, Wang code, are also used in Government.]
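As a minimal sketch of what an internal code set means in practice, the following Python fragment encodes one Chinese character with Python's built-in `big5` codec and splits the result into its two bytes. The function name `big5_bytes` is hypothetical, and the codec is only an approximation of the Big-5 variants shipped with particular Chinese subsystems. IBM DBCS/5550 is not covered because the Python standard library has no codec for it.

```python
def big5_bytes(ch: str) -> tuple:
    """Return (lead byte, trail byte) of a single character encoded in Big-5."""
    raw = ch.encode("big5")
    if len(raw) != 2:
        raise ValueError("not a double-byte Big-5 character")
    return raw[0], raw[1]


sample = "\u4e00"  # the ideograph for "one", used only as an example
lead, trail = big5_bytes(sample)
print(f"U+{ord(sample):04X} -> lead 0x{lead:02X}, trail 0x{trail:02X}")

# Decoding the same two bytes gives the original character back.
assert bytes([lead, trail]).decode("big5") == sample

# ASCII characters remain single bytes, so mixed text has mixed widths.
print(len("A".encode("big5")), len(("A" + sample).encode("big5")))
```

The same character has a different byte value in every internal code set, so data exchanged between systems that use different code sets must be converted.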
2.3 Usage of Chinese Characters
In Government, most Chinese applications use Chinese to record names such as person names, street names and company names. Apart from names, Chinese addresses are also recorded in most Chinese applications.
3. Input of Chinese Characters
The following section describes the present situation on the input of Chinese characters in Government IT applications :-
3.1 Input Device
PCs and Chinese terminals are used as the input devices. For PCs, Chinese subsystems (e.g. ETEN, Chinese Windows, 0/1) are installed for processing of Chinese characters.
3.2 Input Method
The commonly used input methods are summarised as follows :-
a) Changjei
This is one of the most popular input methods, particularly on the PC platform and in Word Processing applications. This method is suitable for professional Chinese data entry personnel such as calligraphists.
b) Kan Yee
It is a simplified version of the Changjei input method, designed for easier input. It is therefore mostly used by casual Chinese users.
c) Others
Some other methods are also used, e.g. CCC (Chinese Commercial Code) input, which is used in the Immigration Department, as well as Pin Yin and Direct Input.
Of the above input methods, Changjei and Kan Yee are the most popular in Hong Kong. They are also the standard input methods adopted in Government because they are supported by most Chinese subsystems (e.g. ETEN, KUO CHIAO), are relatively easy to use and learn, and have a large skill base in Government.
It is also noticed that the "direct input" method (e.g. Notepen) is becoming popular because no special training is required to use it.
3.3 Screen Font
The popular screen font is 24x24. However, 16x15 is also used in some applications. For new applications, scalable fonts are becoming popular.
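To give a feel for what these font sizes imply for storage, the arithmetic can be sketched in Python as follows. The sketch assumes a simple bitmap layout of one bit per pixel with each row padded to a whole byte, which is a common convention but is not stated in this document; the figure of 13,000 characters is the approximate basic set size quoted elsewhere in this document.

```python
def glyph_bytes(width: int, height: int) -> int:
    """Bytes for one glyph at 1 bit per pixel, each row padded to a whole byte."""
    bytes_per_row = (width + 7) // 8
    return bytes_per_row * height


BASIC_SET_SIZE = 13_000  # approximate size of the basic character set

for width, height in [(24, 24), (16, 15)]:
    per_glyph = glyph_bytes(width, height)
    total = per_glyph * BASIC_SET_SIZE
    print(f"{width}x{height}: {per_glyph} bytes per glyph, {total:,} bytes for the set")
```

Under these assumptions a 24x24 glyph takes 72 bytes and a 16x15 glyph takes 30 bytes, so a full 24x24 font for the basic set needs a little under one million bytes.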
4. Output of Chinese Characters
The following section describes the present situation on the output of Chinese characters in Government IT applications :-
4.1 Printer
The printers used for printing Chinese data are mainly laser printers and dot matrix printers.
4.2 Print Font
The most popular print font is Ming, while some applications also use Sung and Black. The most popular font size is 24x24. Font sizes of 16x16, 36x36, 40x40 and 48x48 are also used. As with screen fonts, scalable fonts are becoming popular in new applications.
5. User Created Character
5.1 UDA (User Defined Area)
Most of the Chinese subsystems (e.g. ETEN, Chinese Windows) provide a standard font library for the basic character set of their supported internal code (e.g. Big-5, IBM DBCS/5550). The basic character set usually contains about 13,000 characters.
Due to the enormous number of Chinese characters, the provided basic character set may not meet the needs of individual applications. Therefore, the Chinese subsystems usually provide a UDA to enable users to create their own Chinese characters. For the popular internal code sets used in Government (e.g. Big-5, IBM 5550), the provided UDA has room for about 5,800 characters.
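The size of a UDA follows from the code ranges reserved for it. The Python sketch below counts the code points in three user-defined ranges that are commonly documented for Big-5 implementations. These ranges are an assumption brought in for illustration; this document does not list them, and individual subsystems may reserve different areas.

```python
# Assumed user-defined lead-byte ranges (inclusive) for Big-5.
# These values are NOT taken from this document.
UDA_LEAD_RANGES = [(0xFA, 0xFE), (0x8E, 0xA0), (0x81, 0x8D)]


def is_valid_trail(byte: int) -> bool:
    """Big-5 trail bytes fall in 0x40-0x7E or 0xA1-0xFE."""
    return 0x40 <= byte <= 0x7E or 0xA1 <= byte <= 0xFE


def is_uda(code: int) -> bool:
    """Return True if a two-byte code lies in one of the assumed UDA ranges."""
    lead, trail = code >> 8, code & 0xFF
    in_range = any(lo <= lead <= hi for lo, hi in UDA_LEAD_RANGES)
    return in_range and is_valid_trail(trail)


def uda_capacity() -> int:
    """Count all code points available in the assumed UDA ranges."""
    trails = sum(1 for b in range(0x100) if is_valid_trail(b))
    return sum((hi - lo + 1) * trails for lo, hi in UDA_LEAD_RANGES)


print(uda_capacity())
print(is_uda(0xFA40), is_uda(0xA440))
```

With these assumed ranges the capacity comes to 5,809 code points, which is consistent with the figure of about 5,800 characters quoted above.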
5.2 Growth Rate
In Government applications, the number of user defined characters varies from one application to another. Some applications have about 3,500 characters already defined. The annual growth rate of user defined characters also varies, ranging from 100 to 400 characters.
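These figures allow a rough estimate of how long a UDA can last. The Python sketch below combines the UDA size of about 5,800 characters, an application with about 3,500 characters already defined, and the two ends of the growth range. Combining the figures in this way, and assuming a constant growth rate, are simplifications made here for illustration; the function name `years_until_full` is hypothetical.

```python
def years_until_full(capacity: int, defined: int, growth_per_year: int) -> float:
    """Years until the UDA is exhausted at a constant annual growth rate."""
    if growth_per_year <= 0:
        raise ValueError("growth_per_year must be positive")
    remaining = max(capacity - defined, 0)
    return remaining / growth_per_year


UDA_CAPACITY = 5800      # approximate UDA size from section 5.1
ALREADY_DEFINED = 3500   # approximate figure for the larger applications

for growth in (100, 400):
    years = years_until_full(UDA_CAPACITY, ALREADY_DEFINED, growth)
    print(f"{growth} characters per year: about {years:g} years of room left")
```

Under these assumptions the remaining 2,300 code points would last about 23 years at 100 characters per year, but under 6 years at 400 characters per year.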
5.3 Common UDA
It is common for the user defined characters of individual application systems to be different. To facilitate the sharing and exchange of Chinese data amongst Government applications, there is an intention to align the UDA for most of the applications.
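A first step in such an alignment would be to find the codes that two applications have assigned to different characters. The Python sketch below shows one possible way to do this. The two tables, their code values and the glyph identifiers are hypothetical placeholders and do not describe any actual Government application.

```python
def find_conflicts(app_a: dict, app_b: dict) -> dict:
    """Return codes used by both applications but assigned to different glyphs."""
    shared_codes = app_a.keys() & app_b.keys()
    return {
        code: (app_a[code], app_b[code])
        for code in sorted(shared_codes)
        if app_a[code] != app_b[code]
    }


# Hypothetical UDA tables: internal code -> identifier of the glyph stored there.
uda_app_a = {0xFA40: "glyph-001", 0xFA41: "glyph-002", 0xFA42: "glyph-003"}
uda_app_b = {0xFA40: "glyph-001", 0xFA41: "glyph-917", 0xFA43: "glyph-004"}

for code, (glyph_a, glyph_b) in find_conflicts(uda_app_a, uda_app_b).items():
    print(f"0x{code:04X}: application A has {glyph_a}, application B has {glyph_b}")
```

Codes reported by such a comparison would have to be reassigned in at least one application before data containing them could be exchanged safely.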