Wednesday, August 24, 2011

Lost in Transliteration


A transmogrifier is a device that a person enters thinking of becoming something and leaves believing he is exactly that. (Invented by Lord Calvin and Sir Hobbes.)

A transliterator is a device into which a sound goes in one script and comes out, unchanged, in another script.

I never understood spelling-bee contests: young kids producing words out of infinitely bewildering phonetics. "Trouvaille" (meaning: a find) is one such word, pronounced "truva", with as many as 4 letters silent. These spelling bees are only going to get tougher, and three years from now we may have the announcer keeping mum while the poor student has to find the word, because, er.. all the letters are silent. Kudos to the effort and memory power of the students, though.

The whole spelling-bee concept rests on arbitrary and artificial phonetics. If there were a one-to-one correspondence between sound and script, the spelling bee would become a drone bee. If we ruled that English be written phonetically from tomorrow, spelling assignments would disappear. What you say is what you hear is what your mind interprets. WYSIWYHIWYMI.

This reminds me of artificially created problems and equally artificial solutions. A good analogy: some of the Java design patterns, like Iterator (where you write elaborate classes), "disappear" in dynamic languages like Groovy. The Java design patterns have been elevated into language features of Groovy, so there is no "problem" to start with!


So how many times in life will you ever use that word again? That thought makes me nostalgic about those TOEFL and GRE vocabularies, where I crammed words to better the neighborhood guy whose veneer impressed my folks, while I stood exasperated. How many of those words am I using now? How many of those words are being used by native English speakers? And I'm told that I should use simple language while writing emails to clients. If I use the words appoggiatura or roscian, the client will simply take the business elsewhere, because I am hard to communicate with. The fact is, for effective communication you just need around 1000 words in any language. The rest are for poets and dictionary makers who just want to intimidate us ordinary folks and to keep themselves afloat in business :-)

Sanskritam students often wonder what the best way to type in Sanskritam is. There are a few great tools: Baraha, Google Transliterator etc. There are also a few English transliteration schemes available for the devanagari script. Which one to choose? I feel each of them has its own quirks. Let's examine some of those:

The Harvard-Kyoto scheme is mostly used in academic circles. Its most jarring features are uppercase letters for the anunasikas (G/ङ्, J/ञ्) and z for the fricative श्. Try as I might, I could not pronounce z as श् while reading. It is just like the Tamil zh/ழ், which doesn't make any sense, but we simply adopted it.

IAST is more confusing, with multiple Latin letters for the same devanagari akshara (k, K -> क्, kh, Kh -> ख्). Do you ever substitute one letter for another in English? Then why such karakter overdoz for transliterashuns?

ITRANS is even crazier. Doubled letters or uppercase (aa vs A), the exponent symbol (R^i), doubled uppercase followed by lowercase (LLi), tilde before, power after, dot before, dot after: it's hard to write in ITRANS unless you are already a mathematical genius. It could have also used integral and sum symbols, so that one could learn advanced calculus simultaneously while writing saMskR^itam.

The other side of the text is parsing it into comprehensible data structures. Writing parse routines for such schemes involves quite a bit of effort. So the question is, can there be a simpler, consistent scheme, with fewer things to remember while typing?

Here is one such minimalistic transliteration scheme and it uses only 4 main rules and 3 corollaries:

Rules

1. lowercase
2. uppercase
3. possible dot after a letter
4. semi-colon for visarga; jihvamuliya; upadhmaaniya

Corollaries

1. uppercase for dIrgha
2. krama -> lowercase; lowercase+dot; uppercase; uppercase+dot
3. 1:1 mapping in the transliteration

a A e E u U r. R. l. E. I O O.

k k. g g. n
c c. j j. n.
t t. d d. N
T T. D D. N.
p p. b b. m
y r l v s s. S h
M :
a. (avasarga)

The principles behind the scheme are:
  1. A devanagari letter is either ONE char or ONE char + dot.
  2. The scheme is almost consistent (lowercase, lowercase+dot, uppercase, uppercase+dot, uppercase for dIrgha svara). This is why we get an orderly (n, n., N, N.).

Parsing or tokenizing such a scheme is also easy. Only one if-condition is needed to determine an akshara: if a dot follows a letter, the two form one akshara; if not, the letter alone is the akshara.

In other words, converting each akshara of a sentence into Unicode is just a one-liner, assuming the unicodeMap[:] is defined already.

def toUnicode(s, i) { return this.append(unicodeMap[(i + 1 < s.length() && s.charAt(i + 1) == '.') ? s.substring(i, i + 2) : s.substring(i, i + 1)]) }
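The "one if condition" idea can be tried out with a small Python sketch. This is only an illustration under assumptions: the names tokenize and to_unicode are made up here, and the map passed to to_unicode is a hypothetical stand-in for the unicodeMap mentioned above, which is not spelled out in this post.

```python
def tokenize(text):
    """Split a string written in the dot-based scheme into akshara tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # The single rule: a letter followed by a dot forms one token.
        if i + 1 < len(text) and text[i + 1] == ".":
            tokens.append(text[i:i + 2])
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def to_unicode(text, unicode_map):
    """Look up each token in a caller-supplied map (hypothetical contents).

    Tokens missing from the map are passed through unchanged.
    """
    return "".join(unicode_map.get(token, token) for token in tokenize(text))


print(tokenize("N.ArAyaNa"))  # ['N.', 'A', 'r', 'A', 'y', 'a', 'N', 'a']
```

The tokenizer never needs to look more than one character ahead, which is the whole point of the "one char or one char + dot" principle.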

Q & A


Q. Why favor the 1:1 mapping? After all isn't it good to have options and easy to write in different ways?
A. It sure is, but do you really substitute one letter for another letter in English or Sanskritam? Can you say "sha" is close enough to "sa" and use it for "savam"? A 1:1 mapping allows no options and eventually it gets streamlined in a person's thought process.

Q. viDE.he -> isn't this harder to read than vaidehi?
A. Yes it is, no doubt. But how did you learn to pronounce it as "vaidehi" with "d" as द् while the most common pronunciation of d is ड्? It will be hard to read at first, but didn't we all start by guessing what a dot, dash, tilde or apostrophe means on top of and below the letters :-). If one is already used to a scheme, the mind resists change.

Q. So what's really easy about this scheme?
A. If you are well versed with the devanagari krama, it is probably easier to apply the transliteration. Also there is only the extra character dot to remember.

Q. What's really difficult, then?
A. Reading will be unnatural for those who are used to a different scheme (eg nAraayaNa vs N.ArAyaNa). We are so used to N being murdha nakara! But even the existing schemes were originally unnatural, so the unnaturalness just shifts to a different place.


(Note: This article is not meant to challenge existing schemes, but to show that a minimalistic scheme, i.e. a scheme with minimum fuss, is possible from a parsing perspective. In any art form, prevalence rules over theories.)

Friday, August 19, 2011

Bodhi learns Sanskrit

बोधि: संस्कृतं पठति

Amar Chitra Katha. The printed granny of Indian kids. That and its cousin Tinkle have been a source of stories for a few decades now, with a lot of nostalgia for my generation. From puranic tales to folktales to heroes, it has remained a valuable source of stories in a compact format. I wish ACK would bring out a full edition of Kathaa-sarit-saagara and similar Sanskrit works, which contain some great stories. Obviously some stories from it have already been published, but not a complete edition.

Though ACK/Tinkle has a lot of stories based on Sanskrit literature, there are rarely stories with Sanskritam itself as the theme.

Below is a translation of a funny story, "Bodhi learns Sanskrit".




कस्यांश्चित् पाठशालायाम् बोधि: संस्कृतं पठति । अध्यापक: सर्वान् छात्रान् पाठयति ॥

अध्यापक: वदन्तु । ... राम: ... रामौ ... रामाः ॥
(छात्राः पुनरुच्चारयन्ति)
अध्यापक: ... रामं ... रामौ ... रामान् ॥
(छात्राः पुनरुच्चारयन्ति)

....

अध्यापक: अथ अङ्ग-पाठः । ... मम शिरः । (स्वशिरः स्पृशन्) ... मम शिरः ॥
छात्राः ... मम शिरः । ... मम शिरः ॥

तदनन्तरम् सर्वे स्वगृहम् गच्छन्ति । बोधि: अपि गृहम् गत्वा अभ्यासम् करोति ॥

बोधि: मम शिरः ... । मम शिरः ... । मम शिरः ... । इत्युक्ते अध्यापकस्य शिरः ॥

(बोधेः पिता तत् शृण्वन् तत्र आगच्छति । कोपेन तर्जयति)

पिता: अरे मूढ ! एतादृश: वा भवतः संस्कृत पठन लक्षणम् ? ’मम शिरः’ इत्युक्ते अध्यापकस्य शिरः न । (स्वशिरः स्पृशन्) मम शिरः । (पुन: स्पृशन् उच्चैः) मम शिरः

(पिता उक्त्वा निर्गच्छति)

बोधि: (किञ्चित् कालं चिन्तयित्वा) । ... आ ... आ ... आम् । इदानीम् अवगतम् । पाठशालायाम् ’मम शिरः’ इत्युक्ते अध्यापकस्य शिरः । गृहे ’मम शिरः’ इत्युक्ते पितु: शिरः । अवगतम् । अवगतम् ॥

Tuesday, August 9, 2011

Six degrees of Sutras

One of the memorable humour tracks of Tamil movies is from arivAli (The Intelligent), made in the early 1960s. A black-and-white film, with simple clean humour. Very likely such movies were made in other Indian languages too. The husband asks his wife to make puri. The wife does not know how to make it, so the husband gives her instructions:

Husband: Take a vessel and put some wheat flour in it
Wife: Yeah, I know that!
Husband: Pour some water and salt and mix with flour
Wife: Yeah, I know that!
Husband: Make it into small round balls
Wife: Yeah, I know that!
Husband: Flatten it and make it round like appalam
Wife: Yeah, I know that!
Husband: Then what?
Wife: Aah, I don't know that...


In an earlier post we saw how a sutram's definition can be applied to the naming convention of variables in a modern programming context. A natural question arises: what are the types of sutram-s? As I have mentioned before, classification is the ancient Indians' forte. There are six types of sutram-s, which are defined, yeah you guessed it, in a shloka:

संज्ञा च परिभाषा च विधि: नियम एव च ।
अतिदेशो अधिकारश्च षड्विधम् सूत्र-लक्षणम् ॥


A lot has been described about this classification in wiki articles and elsewhere. Here we will just look at the parallels between this classification and programming concepts.
संज्ञा - definition; परिभाषा - interpretation; विधि- rules; नियम - restriction; अतिदेश- extension; अधिकार - header/domain.

संज्ञा is a definition sutram. It gives a meaningful name to one or more symbols. Do you remember your early programming days, when you were told that it is good practice to give names to constants? (Do you still follow that?)
 

वृद्धिरादैच् |  [1.1.1]  {aa, ai, au} are called vRuddhi |
 

In programming terms, this is like the classic C #define statement or final static/const statements in Java/C# world. 

#define PI 3.14

#define IK_PRATYAHARA {i, u, Ru, Lu}


परिभाषा sutram is an interpretation or meta-rules sutram. Its function is to tell how to interpret other sutram-s.
      
तस्मात् इति उत्तरस्य | [1.1.67] When a sutra has a word in panchamI vibhakti, then the word next to it will undergo some modification. 


A very good equivalent of interpretation rules is annotations. This example will make it clear:


@WebMethod(POST)
public void updateUser() {
}


The annotation says that the method updateUser() must respond only to an HTTP POST call. It helps the runtime interpret the method in a certain way.
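For readers who want to see the "metadata that tells the runtime how to interpret something" idea actually run, here is a minimal Python sketch using a decorator in place of an annotation. Everything in it is hypothetical: web_method, dispatch and the 405 message are invented for illustration and do not belong to any real web framework.

```python
def web_method(verb):
    """Hypothetical decorator: tags a function with the HTTP verb it accepts."""
    def mark(func):
        func.allowed_verb = verb  # metadata only; the function body is untouched
        return func
    return mark


@web_method("POST")
def update_user():
    return "user updated"


def dispatch(func, verb):
    """Toy runtime: consult the metadata before deciding whether to call."""
    if getattr(func, "allowed_verb", None) != verb:
        return "405 Method Not Allowed"
    return func()


print(dispatch(update_user, "POST"))
print(dispatch(update_user, "GET"))
```

The decorator itself does nothing to the method. Like a paribhasha, it only changes how something else (here, dispatch) reads it.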

विधि sutram: What is the fundamental difference between a calculator and a computer? The calculator (a regular one) deals with numbers only, while the computer can make logical decisions.

विधि sutram is the classic if-else condition. In fact, the way पाणिनि has applied it, it's closer to aspect-based rules than to a vanilla if-else condition. There is a slight difference between "if, do this" and "when, apply this". While the if-condition has to be encountered by the executing thread, a when-rule applies at any time a certain condition occurs. "if" is thread-execution based, while "when" is time based, a trigger with an "if". पाणिनि defines several such rules in his अष्टाध्यायी. For example, there is a famous sandhi rule with just three words that covers a whole matrix of sandhi-s.
       
इको यण् अचि | [6.1.77]
When a vowel follows, the letters i, u, Ru, Lu change to y v r l.

Obviously any if-else condition would be a vidhi sutram. If we were to follow paaNini's technique, we would not write it as a simple if-condition. Instead we would define all the "when rules" in a separate section of the code, and provide aspects for applying them. When the code executes, the aspects monitor it and "apply" a rule when its conditions are satisfied. For example, Haley (now Oracle's) is one of the popular rules engine products, in which rules can be defined in simple English.
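To make the three-word rule concrete, here is a rough Python sketch that applies only this one substitution to two romanized words. It is a toy under loud assumptions: the function and table names are invented, the romanization follows the i, u, Ru, Lu spelling used above, long I and U are folded in as a simplification, and every other sandhi rule (for instance the one that merges similar vowels) is ignored.

```python
# Romanized vowels that may begin the following word (assumed spelling).
VOWELS = ("a", "A", "i", "I", "u", "U", "Ru", "Lu", "e", "ai", "o", "au")

# ik -> yaN substitutions; long I and U are included as a simplification.
IK_TO_YAN = {"i": "y", "I": "y", "u": "v", "U": "v", "Ru": "r", "Lu": "l"}


def yan_sandhi(first, second):
    """Join two words, replacing a final ik vowel by yaN when a vowel follows."""
    # Words ending in the diphthongs ai/au do not end in an ik vowel.
    if second.startswith(VOWELS) and not first.endswith(("ai", "au")):
        # Check two-letter endings (Ru, Lu) before one-letter ones.
        for ik in sorted(IK_TO_YAN, key=len, reverse=True):
            if first.endswith(ik):
                return first[:-len(ik)] + IK_TO_YAN[ik] + second
    # The rule's condition is not met, so the words stay apart.
    return first + " " + second


print(yan_sandhi("dadhi", "atra"))   # dadhyatra
print(yan_sandhi("madhu", "ari"))    # madhvari
print(yan_sandhi("dadhi", "tatra"))  # dadhi tatra
```

Note that the table IK_TO_YAN is the rule and yan_sandhi is merely the engine that applies it, which is closer to the "when rule" flavour than a hard-coded chain of if-else branches.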

नियम sutram
If programming was based on socialistic ideologies, rules would apply uniformly to all cases (except the political class, of course). But reality is somewhat capitalistic, so there is always some case which disagrees to agree. Remember the business requirements like "should apply to all but one or two cases" and the complex if-conditions you would have to write to handle just the boundaries?

Restriction is not Exception. Exception is an alternative flow, but restriction is about applying a rule to a lesser number of or rarer cases. In general, it is achieved by if-else conditions, but doesn't always have to be so.

अतिदेश sutram-s are extensions. They qualify a pre-existing rule with another property, not originally possessed.
   
Using pratyaya-s vat and mat, paaNini extends the behaviour of one rule and applies to another rule.
       
लोट: लङ्वत् | [] would mean "लोट् vibhakti-s are to be conjugated just like लङ्"
       
Imagine that लोट्, लङ् etc implement an interface called "ल". Panini's technique essentially casts लोट् as लङ्.
       
Class extensions are very common in OO languages. Using the extension mechanisms offered by Ruby, C# etc, one can improve the functionality of an existing class. For eg, suppose the String class does not have an intrinsic instance method to check if the value is null or empty. One could add an IsNullOrEmpty(this String) method and operate directly on a string instead of going through a new helper class.
       
It does not end there. Imagine two objects User and LockedUser:
       
User user; //normal user behaviour
LockedUser lockedUser; //a locked user behaviour, imagine a Trait (see Scala which allows partially implemented interfaces)
if (user.failedAttempts(3)) { user.setBehaviour(lockedUser); } //user now behaves like a lockedUser

       
Instead of changing the state of the object (commonly via isLocked() or status = LOCKED), the behaviour of the object itself is changed. Upon certain conditions, the regular user adopts the behaviour of a locked user. We do not directly deal with properties and state changes; instead we work with behaviour changes. For eg, Scala offers Traits, with partially implementable methods, which are much more than interfaces. In a way you are describing, or coding to, the behaviour of an object rather than its states.
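The behaviour swap can be sketched in a few lines of Python. This is just one possible reading of the idea, not a prescribed design: the class names, the login method and the threshold of 3 failed attempts are assumptions made for the example.

```python
class NormalBehaviour:
    def login(self, name):
        return f"{name} logged in"


class LockedBehaviour:
    def login(self, name):
        return f"{name} is locked out"


class User:
    MAX_FAILED_ATTEMPTS = 3  # assumed threshold, mirroring failedAttempts(3)

    def __init__(self, name):
        self.name = name
        self.failed_attempts = 0
        self.behaviour = NormalBehaviour()

    def record_failure(self):
        self.failed_attempts += 1
        if self.failed_attempts >= self.MAX_FAILED_ATTEMPTS:
            # No status flag is consulted later; the behaviour object is replaced.
            self.behaviour = LockedBehaviour()

    def login(self):
        # The user simply acts out whatever behaviour it currently has.
        return self.behaviour.login(self.name)


user = User("bodhi")
print(user.login())
for _ in range(3):
    user.record_failure()
print(user.login())
```

The login method contains no "if locked" check at all. The user is treated "like" a locked user, much as लोट् is treated like लङ्.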
       
अधिकार sutram: If you have done database modeling, you know Subject Areas. In programming terms, think of package, namespace etc. They all define a domain under which certain rules/classes/tables are grouped. That's exactly what an adhikara sutram is. More information on adhikara sutram can be found in this post.
   
संहितायाम् | []  "During the closeness of words"
प्रत्यय: | [3.1.1] "Affix"
       
package com.microsoft.office;
namespace Com.Sun.Oracle.Java;
//ok, I was being facetious :-)


Apart from these core sutram-s there are a few more.

निषेध sutram-s are negation rules of other rules. While niyama sutram-s are positive restrictions, nishedha can be seen as negative orientation.
   
हलन्त्यम् | (consonant endings are it-markers)
न विभक्तौ तुस्माः | (but the letters t, th, d, dh, n, s, m are not it-markers when they are used in vibhakti endings)
       
In programming, we can come up with a validation rules engine like this:
       
1. All Address Fields Required
2. Not if Address Line 2
       
Imagine the simplicity of a program using such a validation engine!
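Here is a small Python sketch of what such a two-rule validation could look like. It is only an illustration: the field names and the missing_fields function are hypothetical, and a real rules engine would of course be more elaborate.

```python
# Hypothetical address fields; rule 1 says all of them are required.
ADDRESS_FIELDS = ["line1", "line2", "city", "zip"]


def missing_fields(record, required, not_required=()):
    """Apply the general rule, then let the negation rule cancel it for some fields."""
    return [
        name
        for name in required
        if name not in not_required and not record.get(name)
    ]


record = {"line1": "12 Main St", "line2": "", "city": "Pune", "zip": ""}

# Rule 1 alone: every empty field is reported.
print(missing_fields(record, ADDRESS_FIELDS))
# Rule 1 plus rule 2 ("not if Address Line 2"): line2 is no longer reported.
print(missing_fields(record, ADDRESS_FIELDS, not_required={"line2"}))
```

The negation is stated once, as data, next to the general rule, instead of being buried as a special case inside the checking code.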
       
विभाषा sutram is an optional rule. For eg think of the sentence "I would like to go to a movie". You can also say it as "I'd like to go to a movie". The shortening of "I would" to "I'd" does not change the meaning, and is optional to use. पाणिनि uses this technique to provide alternative usages of words and grammar by using the shabda-s vaa, vibhaaShaa, anyatarasyaam. Optional rules are very common and are done using if-conditions.

Besides these, some of paaNini's techniques also strike a chord with modern techniques. For eg, there is an interpretation sutra विप्रतिषेधे परम् कार्यम्, which means "In case of rule-conflicts, the latter rule prevails". Virtual override feature?
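A tiny Python sketch of "the latter rule prevails" follows. It is a loose analogy rather than a model of the actual grammar: the resolve function and the sample rules are invented here, and a rule is simply a (condition, action) pair kept in order.

```python
def resolve(rules, value):
    """When several rules match, apply only the one that comes last in the list."""
    # Walk the list backwards so the first match found is the latest rule.
    for condition, action in reversed(rules):
        if condition(value):
            return action(value)
    return None  # no rule matched


# Hypothetical rules, listed in order; both match any number above 100.
rules = [
    (lambda n: n > 0, lambda n: "positive"),
    (lambda n: n > 100, lambda n: "large"),
]

print(resolve(rules, 500))
print(resolve(rules, 5))
```

For 500 both rules match and the later one wins. For 5 only the first rule matches, so there is no conflict to resolve.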

Another unique technique is called स्थानी भाव where a substituting suffix can retain the characteristics of a substituted suffix. Heard of Liskov Substitution Principle? Yeah, something like that.

Panini also uses recursive techniques for some of the rule operations. We will see that in a subsequent post.


Yet another ingenious technique is seen in the last 3 pada-s of अष्टाध्यायी, where every earlier rule is oblivious to all the later rules. The rules are arranged in such a way that every rule "thinks" that it is the last rule of the book.


So what is the benefit of comparing modern programming with a 2500+ year old textbook of grammar? Let it be पाणिनि:, Capellini or Linguini founded algorithms. As a software engineer, what do I care? I do not have an answer. After all, a programmer consultant in the US writes a for-loop for $75 an hour, while the same for-loop is written by someone in China for a few Yuan. Can you judge which for-loop is better?

If several modern programming concepts indeed have parallels in अष्टाध्यायी, how about concepts in अष्टाध्यायी that are not yet formulated into modern programming theories? What if they could create a fundamental change in the theory of programming?

What would पाणिनि: think about modern programming concepts? To find out, we shall send Donald Knuth back in time as our representative.

Knuth: Programming is about definitions, rules and algorithms.
पाणिनि: आम्, जानाम्येव !
Knuth: Using algorithms we derive and solve various equations.
पाणिनि: आम्, जानाम्येव !
Knuth: In object oriented programming, we do abstraction, polymorphism and other cool things.
पाणिनि: आम्, जानाम्येव !

Knuth: With functional programming, we define functions, recursions and closures.
पाणिनि: आम्, जानाम्येव !
Knuth: Then we create programs to play games like Grand Theft Auto all day long.
पाणिनि: अहो ! तदहम् न जानामि !!