Sri Lankan Students
Sinhala Unicode encoding

 
lankan
Intermediate


Joined: 26 Jan 2006
Posts: 121

Posted: Fri Mar 31, 2006 8:52 pm    Post subject: Sinhala Unicode encoding

There is a lot of confusion among people about how Unicode works. This has led to the misunderstanding that some letters have been omitted from the Sinhala Unicode encoding and that it has hence "destroyed" the Sinhala language. So I will try my best to explain how it works, and answer your questions along the way.

Unicode is a universal character encoding standard for representing all the scripts used in the world. Unlike 7-bit ASCII, it is not limited to a fixed small width: its code space runs from 0 to 10FFFF (in hex). Each script is assigned some range in this code space. Most alphabetic scripts (such as Sinhala, Tamil, Devanagari, etc.) get 128 (i.e. 2^7) positions in the Unicode table, e.g. the Sinhala script has the range 0D80 to 0DFF (in hex). This is enough to fit the basic alphabet, but not every Sinhala letter shape. So the Unicode code chart contains 'code points' for the basic alphabet (20 vowels + 41 consonants) and one vowel modifier for each vowel (i.e. Pili).
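
To make the range concrete, here is a minimal Python sketch that checks whether a character falls in the Sinhala range and lists the code points that have names in Python's built-in Unicode database. The helper names (in_sinhala_block, assigned_sinhala) are only for illustration, and how many positions are reported as assigned depends on the Unicode version your Python ships with.

```python
import unicodedata

# Sinhala range as given above: 0D80 to 0DFF (hex).
SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF

# 128 positions, i.e. 2**7.
BLOCK_SIZE = SINHALA_END - SINHALA_START + 1


def in_sinhala_block(ch):
    """Return True if the single character ch lies in the Sinhala range."""
    return SINHALA_START <= ord(ch) <= SINHALA_END


def assigned_sinhala():
    """List (code point, name) pairs for positions that have a name.

    Positions without a name are unassigned in the Unicode version
    bundled with this Python, so they are skipped.
    """
    result = []
    for cp in range(SINHALA_START, SINHALA_END + 1):
        name = unicodedata.name(chr(cp), "")
        if name:
            result.append((cp, name))
    return result


if __name__ == "__main__":
    print("positions in block:", BLOCK_SIZE)
    for cp, name in assigned_sinhala():
        print(f"{cp:04X}  {name}")
```

Running it prints one line per assigned code point, which is a quick way to see that the chart holds base letters and vowel signs rather than every combined shape.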

e.g. To generate the letter Kuyanna, you need Kayanna (0D9A) followed by Papilla (0DD4).
Similarly, to generate the letter Nuyanna, you need Nayanna (0DB1) followed by Papilla (0DD4).

You will notice that the shape of the Papilla used for Kuyanna and Nuyanna is different, but we used the same code point (0DD4) for both.
Unicode does not care about how the letter should be displayed; it is concerned only with the encoding of the letter. Displaying the correct shape for each code combination (0D9A+0DD4 and 0DB1+0DD4) is up to the rendering engine, which works together with the font you use. The rendering engine has a table which maps each code combination to a unique shape (called a glyph), which is what we see on the screen. In this case, the rendering engine has two separate glyphs for Kuyanna and Nuyanna and displays the right one when it sees the corresponding code combination.
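
The split between encoding and display can be sketched in a few lines of Python. Note that the Unicode chart lists the short Papilla (KETTI PAA-PILLA) at 0DD4. The GLYPHS table and the glyph names in it are made up for illustration; a real rendering engine uses the lookup tables inside the font, not a Python dictionary.

```python
KAYANNA = "\u0D9A"   # Kayanna
NAYANNA = "\u0DB1"   # Nayanna (dental)
PAPILLA = "\u0DD4"   # short Papilla vowel sign

# Encoding: both letters are "consonant + the same vowel sign".
kuyanna = KAYANNA + PAPILLA
nuyanna = NAYANNA + PAPILLA

# Display: a toy stand-in for the rendering engine's table, mapping each
# code combination to its own glyph. The glyph names are hypothetical.
GLYPHS = {
    kuyanna: "glyph_kuyanna",
    nuyanna: "glyph_nuyanna",
}


def shape(cluster):
    """Look up the glyph for a code combination (toy model only)."""
    return GLYPHS.get(cluster, "glyph_fallback")


def code_points(text):
    """Format a string as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in text)


if __name__ == "__main__":
    for cluster in (kuyanna, nuyanna):
        print(code_points(cluster), "->", shape(cluster))
```

The stored text shares one vowel sign code point, while the lookup yields two different shapes.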

If you look at the Unicode Sinhala code chart you will notice it has single vowel modifiers for EE, AI, O, OO and AU (0DDA to 0DDE). When writing by hand or typing, we usually produce these from two or more separate strokes.
e.g. Usually, to write the letter Koyanna, we write Kombuwa + Kayanna + Alapilla. But in Unicode, it should be Kayanna (0D9A) + Kombuwa_ha_Alapilla (0DDC).

Here again, Unicode does not care how you enter them, so this does not mean that you have to type it like that. You can have a keyboard driver which lets you type the parts separately, as long as the driver converts them into the correct Unicode encoding internally.
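
As a rough sketch of what such a keyboard driver might do internally, the Python function below takes keystrokes in the order we write them (Kombuwa first) and emits the order Unicode expects (consonant first). The function name visual_to_logical is hypothetical, the consonant test is a simplification that treats everything from 0D9A to 0DC6 as a consonant, and only the Kombuwa and Kombuwa + Alapilla cases are handled.

```python
KOMBUVA = "\u0DD9"                 # Kombuwa
AELA_PILLA = "\u0DCF"              # Alapilla
KOMBUVA_HAA_AELA_PILLA = "\u0DDC"  # Kombuwa_ha_Alapilla


def is_consonant(ch):
    """Simplified test: the consonants sit between 0D9A and 0DC6.

    A few positions inside this span are unassigned; a real driver
    would use an exact list.
    """
    return "\u0D9A" <= ch <= "\u0DC6"


def visual_to_logical(typed):
    """Convert typed (written) order to Unicode (logical) order."""
    out = []
    i = 0
    while i < len(typed):
        ch = typed[i]
        if ch == KOMBUVA and i + 1 < len(typed) and is_consonant(typed[i + 1]):
            consonant = typed[i + 1]
            if i + 2 < len(typed) and typed[i + 2] == AELA_PILLA:
                # Kombuwa + consonant + Alapilla -> consonant + 0DDC
                out.append(consonant + KOMBUVA_HAA_AELA_PILLA)
                i += 3
            else:
                # Kombuwa + consonant -> consonant + Kombuwa
                out.append(consonant + KOMBUVA)
                i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    typed = KOMBUVA + "\u0D9A" + AELA_PILLA  # how Koyanna is written
    stored = visual_to_logical(typed)
    print(" ".join(f"{ord(c):04X}" for c in stored))
```

For Koyanna the typed sequence has three keystrokes, while the stored text has two code points: 0D9A followed by 0DDC.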

Now about Yansaya, Rakaransaya and Repaya. It's true that these symbols cannot be found in the Sinhala Unicode table. That is simply because they are not separate Sinhala letters; they are shorthand representations of X+Hal+Yayanna, X+Hal+Rayanna and Rayanna+Hal+X respectively, where X is any Sinhala consonant (or consonant + vowel combination). For Repaya especially, contemporary usage is to write Rayanna+Hal instead. To allow both forms, Unicode has a special code called ZWJ (Zero Width Joiner, 200D), which is common to all scripts.

Here is how they should be encoded:

e.g. for Repaya: Karma
1) with Repaya: Kayanna+Rayanna+Hal+ZWJ+Mayanna
2) without Repaya: Kayanna+Rayanna+Hal+Mayanna

e.g. for Yansaya: Satya
1) with Yansaya: Sayanna+Tayanna+Hal+ZWJ+Yayanna
2) without Yansaya: Sayanna+Tayanna+Hal+Yayanna

e.g. for Rakaransaya: Mitra
1) with Rakaransaya: Mayanna+Ispilla+Tayanna+Hal+ZWJ+Rayanna
2) without Rakaransaya: Mayanna+Ispilla+Tayanna+Hal+Rayanna

Again, this does not mean you should type them like this. The keyboard can have separate keys for Repaya, Yansaya and Rakaransaya, but when pressed they should generate the corresponding combination internally: Rayanna+Hal+ZWJ for Repaya, Hal+ZWJ+Yayanna for Yansaya and Hal+ZWJ+Rayanna for Rakaransaya.
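
A keyboard layout along these lines could be sketched as a simple lookup from key to code sequence. The key names and the press helper are hypothetical, and the sequences are the ones used in the Karma, Satya and Mitra examples above (the Repaya key emits Rayanna+Hal+ZWJ, and the following consonant is typed afterwards).

```python
HAL = "\u0DCA"      # Hal (al-lakuna)
ZWJ = "\u200D"      # Zero Width Joiner
YAYANNA = "\u0DBA"
RAYANNA = "\u0DBB"

# Hypothetical dedicated keys and what each one would emit internally.
KEY_OUTPUT = {
    "REPAYA": RAYANNA + HAL + ZWJ,       # comes before the next consonant
    "YANSAYA": HAL + ZWJ + YAYANNA,      # follows a consonant
    "RAKARANSAYA": HAL + ZWJ + RAYANNA,  # follows a consonant
}


def press(text, key):
    """Append the code sequence for a dedicated key to the text so far."""
    return text + KEY_OUTPUT[key]


if __name__ == "__main__":
    # Satya: Sayanna, Tayanna, then the Yansaya key.
    # Tayanna is assumed to be the dental letter at 0DAD.
    satya = press("\u0DC3\u0DAD", "YANSAYA")
    # Karma: Kayanna, the Repaya key, then Mayanna.
    karma = press("\u0D9A", "REPAYA") + "\u0DB8"
    for word in (satya, karma):
        print(" ".join(f"{ord(c):04X}" for c in word))
```

The user presses one key, and the text that gets stored still contains only code points that exist in the chart plus ZWJ.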

Bandi akuru are handled the same way:

e.g. Paksha
1) with bandi: Payanna+Kayanna+Hal+ZWJ+Shayanna
2) without bandi: Payanna+Kayanna+Hal+Shayanna
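
The four examples above can be written out as code points in a short Python sketch. The dictionary and helper names are made up for illustration, and two letter choices are assumptions on my part: Tayanna is taken as the dental letter 0DAD, and Shayanna in Paksha as 0DC2. The sketch also shows that form 1) and form 2) differ only by the single ZWJ code.

```python
HAL = "\u0DCA"  # Hal (al-lakuna)
ZWJ = "\u200D"  # Zero Width Joiner

# Form 1) of each example word, i.e. with the joined shape requested.
WITH_JOINER = {
    "karma": "\u0D9A\u0DBB" + HAL + ZWJ + "\u0DB8",         # Repaya
    "satya": "\u0DC3\u0DAD" + HAL + ZWJ + "\u0DBA",         # Yansaya
    "mitra": "\u0DB8\u0DD2\u0DAD" + HAL + ZWJ + "\u0DBB",   # Rakaransaya
    "paksha": "\u0DB4\u0D9A" + HAL + ZWJ + "\u0DC2",        # bandi akuru
}


def without_joiner(text):
    """Return form 2): the same letters with every ZWJ removed."""
    return text.replace(ZWJ, "")


def code_points(text):
    """Format a string as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in text)


if __name__ == "__main__":
    for word, form1 in WITH_JOINER.items():
        print(word)
        print("  1)", code_points(form1))
        print("  2)", code_points(without_joiner(form1)))
```

Removing ZWJ here is only a way to compare the two encodings; it is not something you would normally do to stored text, since it changes which shape the writer asked for.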

The problem is that none of this is explained in the Unicode standard. So if you look only at the Sinhala code chart, you cannot help thinking that some characters are missing (and that "Unicode has destroyed Sinhala language", as some of the Sinhala experts say). Rather, it has simplified the encoding and given us the freedom to define different ways to display letters and different keyboard layouts.

If your computer (or rather its rendering engine) does not know how to render these code combinations, especially those with ZWJ, you will see form 2) for both 1) and 2). That's why you see Wijaagra rendered incorrectly. So until operating systems support this kind of rendering (most Linux versions already do, Windows Vista has it, I am not sure about Mac OS) you will have to use a patch such as the Sinhala enabling package for Windows XP.
© 2006 www.slstudents.org. All rights reserved.