04-02-2004, 10:27 PM
Amber,
You have hit upon a common problem with Unicode for Indian scripts.
Old non-Unicode TrueType fonts had custom encodings, so each font had its own mapping. You couldn't view a file written in one Sanskrit font with another Sanskrit font because they had different mappings between ASCII characters and glyphs.
The Unicode standard is supposed to remove such problems. Unlike ASCII's 8-bit characters, standard Unicode employs characters that are 16 bits in size. This means that instead of just 256 possible characters you can now have 65536 (or 64K) characters.
So each script of the world can be assigned a unique range of Unicode characters, and a true Unicode font can represent all the scripts of the world simultaneously. Devanagari has such a unique Unicode range assigned to it.
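A minimal sketch in Python (assuming any Python 3 environment) makes the fixed-range idea concrete: Devanagari occupies the Unicode block U+0900 to U+097F, so the same code points mean the same letters no matter which Unicode font displays them.

```python
# Devanagari occupies the fixed Unicode block U+0900..U+097F,
# so code points carry the same meaning in every Unicode font.
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F

ka = "\u0915"        # the consonant KA
print(hex(ord(ka)))  # -> 0x915

# Every character of a Devanagari word lies inside that one block:
word = "\u0938\u0902\u0938\u094D\u0915\u0943\u0924"  # "Sanskrit" written in Devanagari
print(all(DEVANAGARI_START <= ord(c) <= DEVANAGARI_END for c in word))  # -> True
```

With the old custom-encoded fonts, by contrast, the same byte values meant different glyphs in different fonts, so no check like this was even possible.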
But here comes a new problem. ISCII was developed in India as an Indian version of ASCII, and the Unicode standard followed the ISCII standard. It says that characters should be assigned to each syllabic phoneme rather than to a particular glyph. This was held necessary because Devanagari glyphs are not static. They don't follow each other without modification as in the Roman script; they change shape and form when attached to others. A syllable that is pronounced the same way can have its glyphs appear differently depending upon the context.
But if you assign a unique character to a syllabic phoneme, then that is not going to change. Its display glyphs may change, but the characters themselves won't. This is useful if you want to do any machine processing of Indian language texts.
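The machine-processing benefit can be sketched with a small hypothetical Python 3 example: because the characters encode phonemes rather than glyphs, a plain substring search finds a syllable regardless of how any font will eventually draw it.

```python
# Characters encode syllabic phonemes, not glyphs, so text processing
# needs no knowledge of how a font will render them.
ki = "\u0915\u093F"                      # KA + vowel sign I = the syllable "ki"
text = "\u0915\u093F\u0924\u093E\u092C"  # the word "kitaab" (book) in Devanagari

print(ki in text)            # -> True: a plain search finds the syllable
print(text.count("\u093F"))  # -> 1: count short-i matras directly
```

Had the encoding assigned characters to glyphs instead, a search would have to know every contextual shape a syllable can take.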
This was all fine and advanced, but it has created a unique problem. You need an extra layer of software which looks at the Unicode characters and decides which glyphs to employ. A Unicode text file in Sanskrit is just a string of Unicode characters; it doesn't carry display information about glyphs or how they modify when combined with others. That task is left to the actual display software, e.g. your word processor.
So if you open a Unicode Sanskrit text in Wordpad, you will see the syllable "ki" with the "i" matra applied after the consonant rather than before it. This is because Wordpad is simply replacing each Unicode character with its default glyph; it is not doing the extra processing that is required.
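The "ki" case can be made concrete with a short Python 3 sketch: in memory the vowel sign is stored after the consonant, and it is the renderer's job to draw it before.

```python
# In memory, the syllable "ki" is consonant first, matra second:
ki = "\u0915\u093F"  # U+0915 KA, then U+093F VOWEL SIGN I

print([hex(ord(c)) for c in ki])  # -> ['0x915', '0x93f']
print(len(ki))                    # -> 2 characters, in phonetic order

# A shaping-aware renderer draws the matra to the LEFT of the consonant.
# A naive program like Wordpad just shows each default glyph in storage
# order, which puts the matra on the wrong side.
```

So the file itself is correct; only the display step differs between a naive and a shaping-aware program.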
Unicode has thousands of characters assigned for Chinese. I think it would have been easier for everyone if Devanagari and other Indian scripts had been assigned larger ranges that also include all possible glyphs, so that no further processing would be required to display a Sanskrit text.
But it is not so. ISCII drove Unicode character assignment for Indian scripts. Linguistically speaking, there are good reasons to believe that this is the better choice, although it causes the irritating experiences that you described.