Development of the Jawi Character Set

Country Report

Muhammad MUN'IM bin Ahmad Zabidi
Lecturer, Faculty of Electrical Engineering, Universiti Teknologi Malaysia

This paper was delivered at the 1st AFSIT, September 14, 1987, Tokyo, Japan.


Abstract: The Jawi alphabet is used by the Malay people of the Southeast Asian region. It is a superset of the Arabic alphabet. The paper introduces the Jawi alphabet and describes a preliminary Malaysian Standard for the Jawi character set.

1.0 Introduction

The Jawi alphabet is used by the Malay people of Malaysia, Singapore, Brunei, Southern Thailand and Indonesia. It is a superset of the Arabic alphabet, as it contains all letters of the Arabic alphabet and seven additional letters.

The influence of Arabic on the Jawi alphabet follows the spread of Islam to the Southeast Asian region. Before the arrival of Islam, few indigenous people of the region knew how to write, which explains the lack of historical records of the region. The probable exception was the people of Java, who embraced Hinduism and had an [...] embraced Islam to become literate in order to understand the Holy Quran, which must be read only in Arabic. The Malays soon adopted the Arabic letters to transcribe their spoken language and at the same time added seven letters to accommodate sounds which do not exist in Arabic. Jawi became the name of the alphabet as it is the name of the region in Arabic. Today, Malay is largely written using the Latin alphabet. However, Jawi is still needed when accuracy is required, for example, in writing names and in Islamic studies.

2.0 The Arabic Alphabet

This section gives an overview of the Arabic alphabet for the purpose of introducing the Jawi alphabet. The Arabic alphabet consists of 29 letters and is written right to left. The characters are shown in Figure 1. Some characters are derived from the basic letters shown in the figure. Examples are:

Even though the Arabic alphabet has more letters than the Latin alphabet, some consonants or letters occurring in Latin cannot be transliterated directly into Arabic. These include the ch sound as in chip, ng as in ring, g as in girl, p as in pretty, v as in vase and x as in box.

3.0 The Arabic Character Set

The ISO 9036 International Standard on the Arabic 7-bit coded character set for information interchange standardizes the Arabic letters (see Figure 3). In ISO 9036 a letter has only one bit combination (code), and the different forms a letter may take are not differentiated. This approach eliminates some ligatures from the standard, but some superimposed character combinations are included, such as (hamza on alef) and (hamza on ya). In addition, some bit combinations are used to represent diacritics which are placed above or below a character. The Arabic character set also appears in ISO/IEC 10367 and ISO 8859, the standardized graphic character sets for use in 8-bit codes, where the Arabic character set becomes a supplement to the Latin character set within one international standard. Finally, the Arabic character set appears as part of the Unicode and ISO/IEC 10646 16-bit character sets.
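The "one code per letter" principle can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: it uses Unicode code points as a stand-in, because the ISO 9036 bit combinations themselves are not reproduced in this text, and the helper names `to_letter` and `describe` are hypothetical. Unicode keeps a single code point for the abstract letter beh and, for compatibility, separate code points for its four positional shapes; compatibility normalization folds each shape back to the one letter code.

```python
import unicodedata

# Unicode is used here as a stand-in for ISO 9036 (assumption for illustration).
BEH = "\u0628"  # the abstract letter: one code, whatever shape is displayed

# Compatibility code points for the four positional shapes of the same letter.
BEH_SHAPES = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def to_letter(shape: str) -> str:
    """Fold a shape-specific code point back to its abstract letter code."""
    return unicodedata.normalize("NFKC", shape)


def describe(ch: str) -> str:
    """Return the code point and the Unicode character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for position, shape in BEH_SHAPES.items():
        print(position, "|", describe(shape), "->", describe(to_letter(shape)))
    # Superimposed combinations such as hamza on alef are coded as single
    # characters in Unicode as well.
    print(describe("\u0623"))
    print(describe("\u0626"))
```

With a letter-based coding, choosing the displayed shape is left to the rendering software, which is why the character set itself does not need one code per shape.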

3.0 The Jawi Character Set

The basic Jawi alphabet has every character in the basic Arabic alphabet. The differences between Jawi and Arabic are as follows:

The Jawi character set is not perfectly compatible with the Arabic character set. This is due to conflicting requirements. If the Arabic character set is adopted with minimal modification to produce the Jawi character set, then the additional Jawi characters would be located at unused character positions. This would give the best compatibility between the two sets but would break the continuity of the collating sequence. If instead the characters are ordered according to their proper sequence, then no compatibility can be assumed. The standards-making group in SIRIM cannot yet make a firm decision on this matter until further inputs are received.

Figure 2: Basic Jawi characters
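The collating-sequence problem described in this section can be sketched in Python. This is a minimal illustration, not the SIRIM draft: it uses Unicode code points, where the letter used in Jawi for the ch sound (named TCHEH in Unicode) sits at a position after the basic Arabic letters, and the three-letter order in `COLLATION_ORDER` is an assumption made for the example. Sorting by raw code value pushes the added letter to the end, while a separate collation table restores the intended position at the cost of an extra lookup.

```python
# Unicode code points for two basic Arabic letters and one added letter.
JIM = "\u062C"  # ARABIC LETTER JEEM
HA = "\u062D"   # ARABIC LETTER HAH
CA = "\u0686"   # ARABIC LETTER TCHEH, coded outside the basic letter range

# Hypothetical alphabetical order for these three letters (assumption).
COLLATION_ORDER = [JIM, CA, HA]
RANK = {letter: position for position, letter in enumerate(COLLATION_ORDER)}


def sort_by_code(letters):
    """Sort by raw code value, as a naive comparison of codes would."""
    return sorted(letters)


def sort_by_collation(letters):
    """Sort using the explicit collation table instead of code values."""
    return sorted(letters, key=lambda letter: RANK[letter])


if __name__ == "__main__":
    sample = [CA, HA, JIM]
    print([f"U+{ord(c):04X}" for c in sort_by_code(sample)])
    print([f"U+{ord(c):04X}" for c in sort_by_collation(sample)])
```

Placing the added letters in unused positions corresponds to the first function: existing Arabic data stays valid, but sorting needs a table like `RANK`. Reordering the whole set so that code order equals alphabetical order removes the table but changes the codes of the Arabic letters.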

4.0 Internationalization of the Jawi Character Set

The Jawi character set proposed by our group in Malaysia can be seen as the first step towards making the unique Jawi characters part of international standards. It should be of interest to other countries in Southeast Asia and, to some extent, to Arab countries. The Arabic character set is the basis not only for Jawi but also for the Urdu (used in India and Pakistan) and Persian (Farsi) character sets. Whether for Jawi, Urdu or Farsi, the modifications are mainly to cater for the unique requirements of the respective regions. As mentioned, in Arabic there is no character which can produce the equivalent of the Latin 'p'. In Arab countries, the letter (ba) serves a dual purpose of producing the 'b' and 'p' sounds. In Jawi, the letter is used solely for 'p' while in Farsi/Urdu the letter is used. In Jawi, the letter has a totally different sound. Farsi also has the letter which is not in Jawi. It would be a good idea if the three different systems were streamlined and standardized, within the allowable language rules. This may not be possible in the very near future, as even the Farsi character sets used by different software packages in Iran are incompatible. Even in computers which use only Arabic, the mappings used to represent the variations of each character are different (for example, those used in Windows and Macintosh). To make sure the standardization is complete, it may be necessary to create a standard which includes every possible character shape. This would easily make 8 bits insufficient, especially if all diacritics are involved.

5.0 Conclusion

In Southeast Asia, the ability of a computer system to manipulate multiple character sets simultaneously becomes important, as some ideas can only be expressed in the original alphabet. For example, in Malaysia and Singapore there are several ethnic groups, each of which is culturally distinct. Each culture uses a different writing system. A person's name can only be pronounced correctly when the proper character is used. In addition, because English is the common medium, the basic Latin character set must be maintained. To facilitate data interchange between these countries, the use of international standards which incorporate all possible characters in use is therefore necessary.

Figure 3: ISO 9036 Arabic 7-bit character set

Figure 4: Preliminary Malaysian Standard Jawi character set