Saturday, January 7, 2012

Art of complexity

In the Journal of the Indian Mathematical Society, Srinivasa Ramanujan posed a few very interesting puzzles. One of the first problems he posed, and a very intriguing one, belongs to the recursive category: find the value of the infinitely nested radical √(1 + 2√(1 + 3√(1 + 4√(1 + …)))).

Upon persuasion, Ramanujan himself gave the answer: 3
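The answer rests on the identity n + 1 = √(1 + n(n + 2)): 3 = √(1 + 2·4), 4 = √(1 + 3·5), 5 = √(1 + 4·6), and so on, so the radical unwinds to 3. A small Groovy sketch (Groovy being the language of the code posts below) makes the convergence visible. It truncates the radical after a given depth and, as an arbitrary assumption, replaces the innermost unknown part with 1; the method name nestedRadical is made up for illustration.

```groovy
// Truncate sqrt(1 + 2*sqrt(1 + 3*sqrt(1 + ...))) after `depth` levels.
// The innermost unknown term is replaced by 1; this starting value is an assumption.
double nestedRadical(int depth) {
    double value = 1.0
    // work from the innermost level (n = depth + 1) outwards to the outermost (n = 2)
    for (int n = depth + 1; n >= 2; n--) {
        value = Math.sqrt(1 + n * value)
    }
    return value
}

// The identity behind the answer: n + 1 == sqrt(1 + n * (n + 2))
(2..6).each { n -> assert Math.sqrt(1 + n * (n + 2)) == n + 1 }

[1, 5, 10, 20, 40].each { d -> println "depth $d: ${nestedRadical(d)}" }
```

Every truncated value stays below 3, and each extra level roughly halves the remaining error, so a few dozen levels already agree with 3 to many decimal places.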

Most beginners are put off by the complexity of the Sanskritam language. The way Sanskrit is taught in schools, and textbooks that start off with all the conjugation combinatorics, are mostly to blame. Complexity probably does not daunt students who opt for French, German or Spanish as their "second" language. When studying those languages, students start out with a tabula rasa, if I may borrow from John Locke. They are not exposed to the literature thereof right away. But with Sanskritam, in our everyday life we are already exposed to several subhashitam-s, sloka-s and stotram-s. There are literally tens of thousands of them, not to mention the maha kavya-s and other works. Obviously one cannot know them all, but the sheer volume can intimidate students right away. Apart from that, there are the preconceived notions and biases about Samskritam. Every Indian state, region and individual has an opinion on Samskritam.

During the Austin Yamunotrii 2011 Samskritam Family camp, Smt Sharada Varadarajan mahodayaa gave an interesting speech about how the same idea can be represented in simple or complex language.

"Boy eats food"

can be translated as "baala: annam khaadati", which is simple enough to understand (an a-kaaranta noun + the pan-Indian shabda 'annam' + a parasmaipada verb).

Or it can be translated as "shishu: odanam ashnute" (a u-kaaranta noun + 'odanam', rice-food + a non-intuitive aatmanepada verb).

Come to think of it, why was expressing ideas in complex terms deliberately favored by Sanskritam poets? The Darwinian evolution of intellect does not seem to apply to the Indian experience. In Indian lore, the ancients were always considered people of vastly superior intellect. I am not talking about the divine beings, but about the maharshi-s, muni-s, siddha-s and poets: ordinary people who elevated themselves to a much higher level of consciousness. If evolution means that man becomes more intellectual over time, how could the ancient Indians think in such complex terms thousands of years earlier? What was the need for creating complex, tongue-twisting shloka-s, stotram-s and stuti-s, or ones that read meaningfully back to front, when a simple "namami", "vande" or "nama:" would suffice? Does the "phala" of a shloka depend on the complexity of what is recited?

The reason is the same one George Mallory gave when asked why anyone would go through the difficulty of climbing a mountain: "Because it's there."

Why would the ancient poets conjure up some of the most complex and intricate shlokas, chandas and poetry? Just because Sanskritam allowed them to.

The structure of the language let them run amok, at times wildly, in the forest of intellect. The fluidity of the language let their imaginations soar in all directions without compromising the school-teacher-like strictness of the grammar. The richness of the language yielded the fruit of satisfaction, which in turn enriched the language, like a feedback amplifier.

Complexity can also be humorous: Sriharsha's Naishadhiyacharitam is supposed to be so complex that Harsha himself rewrote it a few times to make it simpler.

A poet asks Harsha: kim cikIrShasi? [What do you wish to do?]
Harsha: shemuShImuShimAShamUShe [I want to eat urad dal to become dull]

Harsha's mind was probably genetically wired to think in complex terms. Urad dal is supposed to dull the intellect, so he wanted to eat it to make himself dull enough to think in simpler terms. But he could not express even that wish in a simple way!

Like it or not, Ramanujan's formula is beautiful and its recursiveness mesmerizing. We may dislike complexity, run away from it or curse it, but it is out there. If we learn to appreciate complexity, it may no longer daunt us.

Friday, December 16, 2011

The relativity theory of superiority

Everybody knows that Hanuman is very humble and always considered himself only a daasa of Rama. But how humble was he? Can his humbleness be mistaken for weakness or lack of confidence?

Valmiki establishes the character of Hanuman in such a way that whatever action he took was very likely the best possible course of action. Throughout the sundara kanda, Hanuman makes numerous instant decisions that lead the story from one thrilling frame to another. The whole sundara kanda, frame by frame, is about making the right decisions at the right time.

One such scene is when Hanuman takes leave of Sita. A simplistic retelling of the story is just that Hanuman gets the cUDAmaNi from Sita and promises her that he will be back with Rama to free her. But in the details, Valmiki presents a very deep analysis of human emotions in sorrow, and every dialogue in it is something we can easily relate to in our own lives.

In foreign lands, especially in remote towns, desis easily get excited on seeing one of their own. One may be from a remote village in the South and another from Punjab, but in a foreign town of ten desis, language, religion and other barriers are forgotten and everyone instantly feels 'bhai-bhai'. After a long time, Sita has finally met someone who is acquainted with her husband. She naturally pleads with Hanuman to stay hidden nearby for one more day, for it will console her to have someone close in the dreaded ashoka vana. In spite of knowing Hanuman's exploits, she then carefully raises a doubt: are Sugriva and his army of monkeys capable of crossing the ocean and taking on the might of Ravana?

A very tricky question indeed. He has just crossed the ocean after a long journey and seen how huge, glorious and intimidating Lanka is. He knows Rama is superior and he believes in his own strength, but he does not yet know the full strength of Ravana and his company. There is no indication that he had given any thought to how the other monkeys would cross the ocean. Hanuman cannot say something like 'Yeah ma'am, we will try our best'. He has to completely reassure Sita of their capabilities.

The sequence of arguments he puts forth is very logical. He first mentions that the leader, Sugriva, is completely committed to the cause (of freeing her). Then he mentions the number of monkeys in the army (thousands of crores of monkeys); that sheer number should give Sita confidence. Then he mentions their overall qualities: power, perseverance and loyalty to Sugriva. He finally puts forward an unbelievably astonishing statement that silences any doubt, not only Sita's but even the reader's.

मत् विशिष्टाः च तुल्याः च सन्ति तत्र वनौकसः ।
मत्तः प्रत्यवरः कश्चिन् न अस्ति सुग्रीव संनिधौ ॥

mat vishiShTaaH ca tulyaaH ca santi tatra vanaukasaH |
mattaH pratyavaraH kashcin na asti sugrIva saMnidhau ||

अहम् तावत् इह प्राप्तः किम् पुनः ते महाबलाः ।
न हि प्रकृष्टाः प्रेष्यन्ते प्रेष्यन्ते हि इतरे जनाः ॥

aham taavat iha praaptaH kim punaH te mahaabalaaH |
na hi prakRuShTaaH preShyante preShyante hi itare janaaH ||

All the other monkeys are either equal to me or above me in valor. There is no one inferior to me in Sugriva's army. If I could come here in one leap, what to say of those mighty ones? Superiors are not sent on errands; only the inferior ones are, isn't it so?

Hanuman was hand-picked for this task by none other than the great Jambavan, and further entrusted personally by Rama, purely on the basis of his superior abilities. Yet Hanuman turns it completely around, saying that only inferior people are sent on errands, which for the moment makes perfect sense and consoles Sita. He does not say he is the most inferior; that would have shown a lack of confidence. He says there is no one inferior to him. He is humble, yet there is absolutely no lack of confidence.

Simply put, there is no other statement that would have quelled the doubt of Sita, than how Hanuman put it.

Thursday, December 15, 2011

The power of dsl

When I started the samskritam-dsl project, one of my goals was to develop a domain-specific language that would let users interact with the system using a natural syntax. Not just a programmer, but a samskritam pundit with basic computer knowledge should be able to interact with the system. While that is still a long way off, I feel some of the foundations are in place. The sandhi-s as described in the ashtaadhyaayI are very arithmetic-like and yield consistent results, so a program model should be able to simulate them.

For example:

assert 'a' + 'a' == 'A' //akaH savarNe dIrghaH sandhi


instead of

assert new Varna('a').add(new Varna('a')).equals(new Varna('A')) == true


The former is obviously easier to understand as it follows mathematical conventions, instead of being bound by rigid programmatic syntaxes.

Java does not support operator overloading, so it just does not allow this kind of flexibility. Groovy and Ruby (dynamic) and Scala (statically typed, but with symbolic method names and implicit conversions) have special features for creating such constructs.
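Note that with plain Groovy Strings, 'a' + 'a' is simply 'aa', so the first assertion needs either a wrapper type or some metaprogramming. Here is a minimal sketch of the wrapper route. The Varna class below is hypothetical (it borrows only the name used in the Java-style line above), and it models akaH savarNe dIrghaH for the simple vowels a, i, u and R in Harvard-Kyoto spelling; anything else is just concatenated.

```groovy
// Hypothetical Varna wrapper. Groovy routes '+' to plus(), so the sandhi can live there.
class Varna {
    // short vowel -> long vowel, for the simple vowels a, i, u, R (Harvard-Kyoto)
    static final Map<String, String> DIRGHA = [a: 'A', i: 'I', u: 'U', R: 'RR']

    final String value
    Varna(String value) { this.value = value }

    // reduce a long vowel to its short form, so that a/A, i/I, u/U, R/RR count as savarNa
    private static String hrasva(String v) {
        DIRGHA.find { shortV, longV -> longV == v }?.key ?: v
    }

    // akaH savarNe dIrghaH: two savarNa simple vowels merge into the long vowel
    Varna plus(Varna other) {
        String h = hrasva(value)
        if (DIRGHA.containsKey(h) && h == hrasva(other.value)) {
            return new Varna(DIRGHA[h])
        }
        new Varna(value + other.value)   // no other rule is modelled: plain concatenation
    }

    boolean equals(Object o) { o instanceof Varna && o.value == value }
    int hashCode() { value.hashCode() }
    String toString() { value }
}

println new Varna('a') + new Varna('a')   // A
```

The equals() override is what lets == compare two Varna objects by their letter, which keeps the assertion as short as the string version.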

Let's go a step further than the above simple sandhi:

assert 'bhU' + 'sap' + 'tiG' == 'bhavati'  //bhU dhatu, kartari sap, prathama, parasmai-padam, eka vacanam, prathama purushaH


Is the above possible?

Before we answer that, does the above assert statement make sense at all? How do you know, or how do you make the computer know, that bhU is the dhaatu, sap is the vikaraNa pratyayaH, and tiG is the padam pratyayaH? They are all plain strings, not objects. How do you tell the difference? Now let's say we rewrite the above assert slightly differently:

assert 'bhU'.dhatu + 'sap'.pratyaya + 'tiG'.pratyaya == 'bhavati'


Now, that reads better! A simple piece of domain knowledge is now converted into a programmatic construct, yet it reads almost like samskritam! As you read the sentence, it seems very natural.

That, mon ami, er, mama mitrANi, is the power of dsl.

Here are the implementation details:

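@Category(String)
class Samskritam {
  // inside a use(Samskritam) block, 'bhU'.dhatu and 'tiG'.pratyaya call these getters
  def getPratyaya() { return new Pratyaya(this) }
  def getDhatu() { return new Dhatu(this) }
}

class Dhatu {
  def name
  
  Dhatu(String _name) { name = _name }
  
  // operator overloading: dhatu + pratyaya calls plus()
  // (no @Override here: Object has no plus() method to override)
  String plus(Pratyaya p) { name + ' ' + p.toString() }
  
  @Override
  String toString() { name }
}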

class Pratyaya {
  String value
  String realValue
  
  Pratyaya(String _value) {
    value = _value
    realValue = value?.tasya_lopaH()
  }
  
  @Override
  String toString() { realValue }
}
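To try the category idea without the rest of the project, here is a small self-contained sketch. It is not the project's code: it uses a classic static-method category (SamskritamCategory, a hypothetical name) instead of @Category, and a hard-coded lookup instead of the real it-removal rules, so it only understands shap and tiG.

```groovy
// Toy pratyaya: the it-removal is a hard-coded lookup (an assumption, not the project's rules)
class Pratyaya {
    final String value
    Pratyaya(String value) { this.value = value }
    String getRealValue() { ['shap': 'a', 'tiG': 'ti'][value] ?: value }
    String toString() { getRealValue() }
}

class Dhatu {
    final String name
    Dhatu(String name) { this.name = name }
    // dhatu + pratyaya is dispatched to this plus() method (operator overloading);
    // returning a Dhatu lets a second + chain onto the result
    Dhatu plus(Pratyaya p) { new Dhatu(name + '+' + p) }
    String toString() { name }
}

// A classic category: static methods whose first parameter is the receiver.
// Inside use(SamskritamCategory) { ... }, 'bhU'.dhatu calls getDhatu('bhU').
class SamskritamCategory {
    static Dhatu getDhatu(String self) { new Dhatu(self) }
    static Pratyaya getPratyaya(String self) { new Pratyaya(self) }
}

def word = use(SamskritamCategory) {
    'bhU'.dhatu + 'shap'.pratyaya + 'tiG'.pratyaya
}
println word   // bhU+a+ti
```

Outside the use block, 'bhU'.dhatu is not defined, which keeps the DSL scoped to the code that asks for it.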

So which sutram-s are involved here? Let's skim through the important ones:

bhU + shap -> p gets dropped due to halantyam; sh gets dropped due to lashaku ataddhite
bhU + a -> bho + a; due to aad guNaH
bho + a -> bhav + a; due to eco ayavaayaavaH
tiG -> the G gets dropped due to halantyam
bhav + a + ti -> bhavati
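The steps above can be traced with a toy Groovy sketch on Harvard-Kyoto strings. The helper names dropIts and derive are hypothetical and not part of the samskritam project; the it-removal is hard-coded for shap and tiG, and only the guNa and ay/av substitutions needed here are modelled.

```groovy
// Hypothetical helper: strips the it-markers of the two suffixes used in this post only.
String dropIts(String pratyaya) {
    (['shap': 'a', 'tiG': 'ti'][pratyaya]) ?: pratyaya
}

// Hypothetical toy derivation: root + vikarana + ending.
String derive(String root, String vikarana, String ending) {
    String a = dropIts(vikarana)                       // shap -> a
    String ti = dropIts(ending)                        // tiG -> ti
    String last = root[-1]                             // root-final vowel, e.g. U
    // guNa of the root-final vowel (only i/I -> e and u/U -> o are modelled)
    String gunaVowel = ['i': 'e', 'I': 'e', 'u': 'o', 'U': 'o'][last] ?: last
    String stem = root[0..-2] + gunaVowel              // bhU -> bho
    // eco 'yavAyAvaH: a final e/o becomes ay/av before a vowel
    String ayav = ['e': 'ay', 'o': 'av'][stem[-1]]
    if (ayav && (a[0] in ['a', 'A', 'i', 'I', 'u', 'U', 'e', 'o'])) {
        stem = stem[0..-2] + ayav                      // bho -> bhav
    }
    stem + a + ti
}

println derive('bhU', 'shap', 'tiG')   // bhavati
```

The same chain also yields nayati from nI and jayati from ji, since their final vowels take guNa e, which then becomes ay.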
 
Here is the complete test case:

use (ch8.lang.Samskritam) {
  def a1 = 'bhU'.dhatu + 'shap'.pratyaya + 'tiG'.pratyaya
  def a2 = sandhi.apply(a1, sandhi.aad_guNaH)
  def padam = sandhi.apply(a2, sandhi.eco_ayavaayaavaH)
  assert padam == 'bhavati'
}

The sandhi methods aad_guNaH and eco_ayavaayaavaH also obediently follow the Paninian technique of replacing varna-s according to the respective rules.

Once again, the full code is in github.com/vasya10/samskritam

Sunday, December 11, 2011

sthAne antaratamaH

A Functional Approach

It is very interesting to observe that Panini's approach to sandhi-s is thoroughly function-oriented and very algebraic in execution. Sandhi-s in Samskritam are remarkably sophisticated. We shall see how naturally functional programming fits Panini's approach. Once again, I'm not describing theories, but real, testable, executable code with assertable outputs.

For the samskritam-dsl project, I was trying to come up with a simple representation of sandhi-s. The famous iko yaN aci sandhi belongs to the purva rupa sandhi-s (the first word gets modified), and Panini's general technique can be written functionally as:

f = sandhi(sthana, adesha, vidhi, purva, para)

This is understood as:  "The adesha (substitute) replaces the sthana (substituted) of the purva shabda when vidhi is true on the para shabda"

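Panini uses a natural linguistic approach: he places the sthana in the 6th vibhakti, the adesha in the 1st vibhakti and the vidhi (condition) in the 7th vibhakti. In iko yaN aci, for instance, ikaH (6th) is the sthana, yaN (1st) is the adesha and aci (7th) is the condition.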

So let's rewrite this in functional programming notation:

  //purva rupa sandhi sutra - closure
  def purvaRupaSandhi = {sthana, adesha, condition, words ->
    def (purva, para) = words.tokenize()
    if (condition(para)) {
      def k = sthana.substitute(adesha, purva.lastVarna())
      // Strings are immutable: use the value returned by replaceLast
      purva.replaceLast(k) + para
    } else {
      words
    }
  }

Here sthana is the list of substituted varna-s (e.g. ik) and adesha is the list of substitute varna-s (e.g. yaN). The condition is not a list but something that evaluates to true or false based on the para shabda, so it's a closure again.

  def aci = { word -> word.varnas()[0].svara() }
  def jshi = { word -> word.varnas()[0] in sivaSutra.jsh }
  

The aci closure evaluates to true if the first varna of the shabda is a svara (vowel); jshi does the same for a varna of the jhash pratyahara.

Remember, we defined purvaRupaSandhi as a completely generic closure. It will take any sthana, adesha and condition. How do we apply it to iko yaN aci or, say, jhalaam jash jhashi?

Groovy provides the "currying" feature --

  def iko_yaN_aci = purvaRupaSandhi.curry(sivaSutra.ik, sivaSutra.yN, aci)
  def JalAm_jash_Jashi = purvaRupaSandhi.curry(sivaSutra.Jl, sivaSutra.js, jshi)

Real functional programming stuff here. aci and jshi are closures (functions) that are passed to purvaRupaSandhi, which evaluates them on the para shabda.
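To see currying at work without the rest of the project, here is a self-contained sketch. It re-implements a generic purvaRupaSandhi closure under simplifying assumptions: varna-s are single Harvard-Kyoto letters, ik is limited to the short vowels, and a naive positional lookup stands in for sthAne antaratamaH. That lookup works for ik and yaN only because the Shiva sutra-s list i, u, R, L and y, v, r, l in matching order.

```groovy
// Generic rule: if purva ends in a sthana varna and condition(para) holds, swap in the adesha.
def purvaRupaSandhi = { List sthana, List adesha, Closure condition, String words ->
    def (purva, para) = words.tokenize()
    String last = purva[-1]
    if (!(last in sthana) || !condition(para)) {
        return words                                  // rule does not apply: leave words as-is
    }
    // Naive stand-in for sthAne antaratamaH: the adesha at the same position (i->y, u->v, ...)
    String substitute = adesha[sthana.indexOf(last)]
    purva.substring(0, purva.length() - 1) + substitute + para
}

def ik  = ['i', 'u', 'R', 'L']                        // ik, short vowels only (simplification)
def yaN = ['y', 'v', 'r', 'l']                        // yaN, in the same order as ik
def aci = { String word -> word[0] in ['a', 'A', 'i', 'I', 'u', 'U', 'R', 'e', 'o'] }

// curry() fixes sthana, adesha and condition; what remains is a one-argument sandhi function
def iko_yaN_aci = purvaRupaSandhi.curry(ik, yaN, aci)

println iko_yaN_aci('iti api')      // ityapi
println iko_yaN_aci('madhu atra')   // madhvatra
```

When the condition fails (for example 'iti tat'), the words come back unchanged, mirroring the else branch of the closure in the post.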

Now for some more beauty. How do we substitute the sthana with the adesha? Panini defines a really brilliant sutra, 'sthAne antaratamaH': the phoneme of the adesha that is closest to the sthana must substitute the sthana. Now how do we find the closest? When pronouncing, it is easy to realize that i is close to y, or u is close to v. But how does that translate into programming terms? This is where it dawns upon us how extremely methodical Panini's approach is.

Recall the definitions of phoneme-sets?

akuhavisarjaniyanaam kanTaH | icuyashaanaam taalu | RturaShaaNaam mUrdhaa | Ltulasaanaam danta | upUpadhmaanIyaanaam oshta |

So if a varna from the sthAna exists in one of the above phoneme-sets, then the varna common to that phoneme-set and the adesha will substitute the sthana!

Illustration: jhalaam_jash_jashi

example: ap + dhi

purva shabda - ap (last varna is p), para shabda - dhi, which satisfies the condition jhashi (dh). Good.

sthAna - p, adesha will be one of jash = [j, b, g, D, d]; now which one to pick?

p exists in the oshta phoneme-set, so we have to pick one from pu = [p, ph, b, bh, m]. But which one?

Remember, the jash adesha is [j, b, g, D, d]. So intersect oshta with the adesha, which results in b; that substitutes 'p', resulting in abdhi!

Can you now appreciate the functional beauty of maheshvara sutra?

So the formula is
  
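  def ku = ['k', 'kh', 'g', 'gh', '~n']
  def kanta = ['a', 'A'] + ku + ['h'] + [':']
  // taalu, murdha, danta, oshta and dantoshtam are built the same way from their sutra-s
  Closure sthaneAntaratamaH = { x, adesha ->
    for (def phonemeSet : [kanta, taalu, murdha, danta, oshta, dantoshtam]) {
      if (x in phonemeSet) return phonemeSet.intersect(adesha)
    }
    return null // x is not in any known phoneme-set
  }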

That's it! These two closures take care of any purva rupa sandhi rule. Only the additional condition closures (aci, jashi etc.) must be provided as required.

Finally we verify

assert iko_yaN_aci("iti api") == "ityapi"
assert JalAm_jash_Jashi("ap dhi") == "abdhi"
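Putting sthAne antaratamaH to work, here is a self-contained sketch in which the substitute is chosen by intersecting the sthana's place of articulation with the adesha, as described above. Assumptions: Harvard-Kyoto spelling with multi-letter varna-s such as dh, hand-written place and pratyahara lists, and a hypothetical generic closure named sandhi; the real project builds these from its own SivaSutra tables.

```groovy
// Places of articulation in Harvard-Kyoto (z = sha, S = Sha, G/J = velar/palatal nasal, H = visarga)
def places = [
    kaNTha : ['a', 'A', 'k', 'kh', 'g', 'gh', 'G', 'h', 'H'],   // akuhavisarjanIyAnAM kaNThaH
    tAlu   : ['i', 'I', 'c', 'ch', 'j', 'jh', 'J', 'y', 'z'],   // icuyazAnAM tAlu
    mUrdhA : ['R', 'RR', 'T', 'Th', 'D', 'Dh', 'N', 'r', 'S'],  // RTuraSANAM mUrdhA
    dantAH : ['L', 't', 'th', 'd', 'dh', 'n', 'l', 's'],        // LtulasAnAM dantAH
    oSThau : ['u', 'U', 'p', 'ph', 'b', 'bh', 'm']              // upUpadhmAnIyAnAm oSThau
]

// sthAne antaratamaH: keep the adesha varna-s that share x's place of articulation
def sthaneAntaratamaH = { String x, List adesha ->
    def place = places.values().find { x in it }
    place ? (place.intersect(adesha) as List) : []
}

def vowels = ['a', 'A', 'i', 'I', 'u', 'U', 'R', 'RR', 'L', 'e', 'ai', 'o', 'au']
// first/last varna of a word; an 'h' right after a consonant belongs to an aspirate (dh, bh, ...)
def firstVarna = { String w -> (w.length() > 1 && w[1] == 'h' && !(w[0] in vowels)) ? w[0..1] : w[0] }
def lastVarna  = { String w -> (w.length() > 1 && w[-1] == 'h' && !(w[-2] in vowels)) ? w[-2..-1] : w[-1] }

// Generic rule: replace the last varna of purva when it is in sthana and para's first varna passes
def sandhi = { List sthana, List adesha, Closure condition, String words ->
    def (purva, para) = words.tokenize()
    String x = lastVarna(purva)
    if (!(x in sthana) || !condition(firstVarna(para))) return words
    List candidates = sthaneAntaratamaH(x, adesha)
    if (candidates.size() != 1) return words   // no unique "closest" substitute in this toy table
    purva.substring(0, purva.length() - x.length()) + candidates[0] + para
}

def ik   = ['i', 'I', 'u', 'U', 'R', 'RR', 'L']
def yaN  = ['y', 'v', 'r', 'l']
def jhaz = ['jh', 'bh', 'gh', 'Dh', 'dh', 'j', 'b', 'g', 'D', 'd']
def jaz  = ['j', 'b', 'g', 'D', 'd']
def jhal = jhaz + ['kh', 'ph', 'ch', 'Th', 'th', 'c', 'T', 't', 'k', 'p', 'z', 'S', 's', 'h']

def iko_yaN_aci      = sandhi.curry(ik, yaN, { v -> v in vowels })
def jhalAm_jaz_jhazi = sandhi.curry(jhal, jaz, { v -> v in jhaz })

println iko_yaN_aci('iti api')        // ityapi
println jhalAm_jaz_jhazi('ap dhi')    // abdhi
```

One limitation shows up right away: v is pronounced with teeth and lips (dantoShTha), so a plain intersection with the oSThau list finds no substitute for u, and 'madhu atra' comes back unchanged. Handling that needs a finer notion of closeness, such as the dantoshtam set in the closure above.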

As usual the code can be found at github.com/vasya10/samskritam.

PS: After writing this post, I tried to verify the purvaRupaSandhi closure by adding another rule, "stoH shcuna shcu:", and it worked just fine. But there is a catch. The shcuna in the rule is not in the 7th vibhakti but in the instrumental case (3rd vibhakti), and there is a reason why Panini uses the 3rd vibhakti. The 7th vibhakti indicates that the rule applies with respect to what follows, while the 3rd vibhakti implies that mere contact between the two varna-s is sufficient to produce the sandhi. The closure does not take care of that yet.

The functional programming approach of Panini makes me believe that universities should include the ashTAdhyAyI as a course in Computer Science, instead of leaving it as a research subject for elite academicians.

Tuesday, November 29, 2011

Chapter 8 DSL in GitHub

The chapter 8 dsl project is now available on GitHub at https://github.com/vasya10/samskritam. It is in a very early stage, and I will update it whenever I get some free time.

Thursday, November 17, 2011

The Chapter 8 DSL

Domain Specific Languages are fast becoming a popular way to describe a problem or a solution for a specific domain. The quality and readability of code written in a good DSL can be far above that of regular "technical" code (in Java/C#, for example). Since information about DSLs can be googled amply, I am not going to spend time writing on what a DSL is.

In many of the previous posts, I used pseudo-code to demonstrate parallels between programming and Panini's techniques. Time to call my own bluff now. Presented below is code that has actually been tested: a DSL that closely models some basic techniques of the ashtaadhyaayI, specifically the maheshvara-sutra-s and those darned "it" rules. I'm using Groovy for the implementation, as I feel that its syntax is more natural to read than that of Scala or Ruby.

Let's define some classes.

[Listing 1: SivaSutra.groovy]
package ch8

import java.util.List

/**
 * Implementation of Maheshvara Sutra using SimpleScript transliteration scheme
 * The table itself can be moved to a groovy configuration file to allow a different scheme like HK, ITRANS or AST
 * 
 * @author vsrinivasan
 */
@Singleton
class SivaSutra implements Iterable {

  //siva-sutraani
  List table =
  [
    ['a', 'e', 'u', 'N'],
    ['r.', 'l.', 'k'],
    ['E.', 'o', 'n'],
    ['i', 'O.', 'c'],
    ['h', 'y', 'v', 'r', 't'],
    ['l', 'N'],
    ['n.', 'm', 'n', 'N', 'N.', 'm'],
    ['J', 'B', 'n.'],
    ['G', 'D', 'D.', 's.'],
    ['j', 'b', 'g', 'd', 'd.', 's'],
    ['K', 'P', 'C', 'T', 'T.', 'c', 't', 't.', 'v'],
    ['k', 'p', 'y'],
    ['s', 's.', 'S', 'r'],
    ['h', 'l']
  ]

  List list = table.flatten()

  int indexOf(String varna) { list.indexOf(varna) }

  @Override
  Iterator iterator() { list.iterator() }

  //eShaam antyaaH it
  List itMarkers = table.collect { it.last() }

  /**
   * is this iT-marker?
   * this only finds the pratyahara iT-s (eShaam antyaaH it); for other it-s see ItRules.groovy
   * 
   * @see ItRules
   */
  boolean isIt(f) { itMarkers.contains(f) }

  /**
   * expands a given pratyahara, including all the iT-s
   * not for practical purposes, but good for testing
   * 
   * @param pratyahara
   * @return
   */
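  List expand(String pratyahara) {
    def (begin, end) = pratyahara.varnas()
    // begin and end are Strings, so look up their positions; search for the closing
    // it-marker only after begin, because markers such as N occur more than once
    int from = indexOf(begin)
    int to = from + list[from..-1].indexOf(end)
    list[from..to]
  }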

  /**
   * returns the real pratyahara varna-s, excluding the intermediate it-markers
   * very procedural implementation, need to make it groovy-like
   * 
   * @param pratyahara
   * @return
   */
  List collect(String pratyahara) {
    def (begin, end) = pratyahara.varnas()
    boolean start = false
    def result = []

    table.each { line ->
      line.each { item ->
        if (item == begin || start) {
          if (item != line.last()) {
            result << item
            start = true
          }
          if (item == end && item == line.last()) {
            start = false
          }
        }
      }
    }
    return result
  }
}
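The collect() method above carries a note that it is very procedural. A more Groovy-like alternative might look like the following sketch. It is hypothetical (not the project's code): it takes the first varna and the closing it-marker as separate arguments instead of tokenizing a String, and it copies the SimpleScript table from Listing 1 so that it runs on its own.

```groovy
// The SimpleScript table from Listing 1, copied so that this snippet runs on its own
List<List<String>> table = [
    ['a', 'e', 'u', 'N'], ['r.', 'l.', 'k'], ['E.', 'o', 'n'], ['i', 'O.', 'c'],
    ['h', 'y', 'v', 'r', 't'], ['l', 'N'], ['n.', 'm', 'n', 'N', 'N.', 'm'],
    ['J', 'B', 'n.'], ['G', 'D', 'D.', 's.'], ['j', 'b', 'g', 'd', 'd.', 's'],
    ['K', 'P', 'C', 'T', 'T.', 'c', 't', 't.', 'v'], ['k', 'p', 'y'],
    ['s', 's.', 'S', 'r'], ['h', 'l']
]

// Hypothetical alternative to collect(): first varna and closing it-marker passed separately
List<String> pratyahara(List<List<String>> table, String begin, String marker) {
    // first sutra whose body (everything except the final it-marker) contains begin
    int first = table.findIndexOf { sutra -> begin in sutra[0..-2] }
    if (first < 0) return []
    // from there on, the first sutra that ends with the requested it-marker
    int last = table.findIndexOf(first) { sutra -> sutra.last() == marker }
    if (last < 0) return []
    // eShaam antyaaH it: join the bodies, dropping each sutra's final it-marker
    List<String> varnas = table[first..last].collectMany { sutra -> sutra[0..-2] }
    varnas[varnas.indexOf(begin)..-1]
}

println pratyahara(table, 'a', 'k')   // [a, e, u, r., l.]
```

Because the search for the it-marker starts at the sutra containing the first varna, repeated markers such as N (which closes both aN and yaN) are resolved correctly.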


[Listing 2: ItRules.groovy]
package ch8

@Singleton
class ItRules {

  //#(1.3.2) upadeshe ajanunaasika iT, anunAsika-s are denoted by a "-" at the end,
  //  maybe M would be a better option?
  def ajanunasika = 'aAeEuUr.R.l.E.IOO.'.varnas().collect { it + "-" }

  //cutU
  def cu = 'cCjJn.'.varnas()
  def tu = 'tTdDN'.varnas()

  //s.asca, denoting as "sha" for convenience
  def sha = ['s.']

  //lasaku (ataddhite)
  def ku = 'kKgGn'.varnas()
  def lasaku = 'ls'.varnas() + ku

  //some more to be defined

  //#(1.3.3) halantyam - check if the last char is hal
  SivaSutra sivaSutra = SivaSutra.instance
  boolean hasHalantyam(String pratyaya) { pratyaya.varnas().last() in sivaSutra.hl }

  //allItMarkers except hal, which is applicable only to last letter
  def allItMarkers = ajanunasika + cu + tu + lasaku

  boolean isAnunasika(String varna) { varna.endsWith('-') }

  boolean isItMarker(String varna) { varna in allItMarkers }

  String tasyaLopah(String pratyaya) { (pratyaya.halantyam().varnas() - allItMarkers).join() }
}

[Listing 3: Main.groovy]
package ch8.tests

import java.util.List

import ch8.ItRules
import ch8.SivaSutra
import ch8.schemes.SimpleScriptScheme
import ch8.Samjna

/*
 * DSL: varnas() closure - tokenize the script into individual varnas (list)
 */
String.metaClass.varnas = {
  new SimpleScriptScheme().tokenize(delegate)
}

/*
 * DSL: halantyam() closure - remove the last hal iT and return the modified String
 */
String.metaClass.halantyam = {
  ItRules itRules = ItRules.instance
  def varnas = delegate.varnas() as List
  if (itRules.hasHalantyam(delegate)) {
    varnas.remove(varnas.size()-1)
  }
  varnas.join()
}

/*
 * DSL: tasyaLopah() closure - remove all the it-markers from a pratyaya
 */
String.metaClass.tasyaLopah = {
  ItRules itRules = ItRules.instance
  itRules.tasyaLopah(delegate)
}

/*
 * DSL: Direct exposition of a pratyaya or a pratyahara!
 */
SivaSutra sivaSutra = SivaSutra.instance
sivaSutra.metaClass.getProperty = { String pratyahara ->
  def metaProperty = SivaSutra.metaClass.getMetaProperty(pratyahara)
  def result
  if (metaProperty) {
    //if there is an existing property invoke that
    result = metaProperty.getProperty(delegate)
  } else {
    //inspect the property and convert it to varnas
    //taparastatkaalasya rule; need to formulate in a better way
    if (pratyahara.endsWith('t.')) {
      result = (pratyahara - 't.').varnas()
    } else {
      result = sivaSutra.collect(pratyahara)
    }
  }
  result
}

void testSivaSutra() {
  SivaSutra sivaSutra = SivaSutra.instance

  sivaSutra.table.each { println it } //print the maheshvara sutrani
  println sivaSutra.list //print a flattened version of the maheshvara sutrani
  sivaSutra.each { println it } //another way to print flattened maheshvara sutrani
  println sivaSutra.itMarkers //print only the it markers

  assert sivaSutra.isIt('n.') //check if n. is an it marker

  assert sivaSutra.expand('ak') == ['a','e','u','N','r.','l.','k'] //expand pratyahara including the it
  assert ['a', 'e', 'u', 'r.', 'l.'] == sivaSutra.collect('ak') //pratyahara excluding iT: ak = a, i, u, R, L
  assert ['a', 'e', 'u', 'r.', 'l.'] == sivaSutra.ak //another way of getting the pratyahara! Meta-programming in play!
}

void testItRules() {
  ItRules itRules = ItRules.instance

  println itRules.ajanunasika //prints all the ac anunasikas
  assert "lyut".varnas() == ['l', 'y', 'u', 't']
}


void testHalantyamRule() {
  //print the pratyahara-s after the halantyam rule applied
  ["kt.va", "Gan.", "kt.vat.", "sap", "lyu-t", "saN", "sat.r."].each { println it + " = " + it.halantyam() }

  assert 'kt.va' == 'kt.va'.halantyam()
  assert 'kt.va'  == 'kt.vat.'.halantyam()
  assert 'Ga' == 'Gan.'.halantyam()
}

void testTasyaLopahRule() {
  ["Gan.", "kt.vat.", "sap", "lyu-t", "saN", "sat.r."].each { println it + " = " + it.tasyaLopah() }

  assert 'a' == 'Gan.'.tasyaLopah()
  assert 't.va' == 'kt.vat.'.tasyaLopah()
}

void testSamjnaSutras() {
  SivaSutra sivaSutra = SivaSutra.instance
  
  def vruddhi = sivaSutra.'At.' + sivaSutra.ic
  def guna = sivaSutra.'at.' + sivaSutra.'E.n'

  assert ['A', 'i', 'O.'] == vruddhi
  assert ['a', 'E.', 'o'] == guna
}


testSivaSutra()
testItRules()
testHalantyamRule()
testTasyaLopahRule()
testSamjnaSutras()

[Listing 4: SimpleScriptScheme.groovy]
package ch8.schemes

/**
 * A simple script tokenizer
 * 
 * @author vsrinivasan
 */
class SimpleScriptScheme implements ScriptScheme {

  // hyphen denotes anunasika
  static List NotationMarkers = ['.', ':', '-']

  /** 
   * split/tokenize a given word into a list of varnas
   * the word could be a pada, shabda, pratyaya or pratyahara
   * needs to handle anunasika properly
   * 
   * @calledby String.metaClass.varnas()
   * @param word
   * @return list of varnas
   */
  @Override
  public List tokenize(String word) {
    def varnas = []
    word.eachWithIndex { c, i ->
      // a notation marker (. : -) is glued to the letter before it
      String next = (i < word.length() - 1) ? word[i + 1] : null
      if (next in NotationMarkers) c = c + next
      if (!(c in NotationMarkers)) varnas << c
    }

    varnas
  }
}


Now some observations and analysis:

  1. Doing this in regular Java/C# would require several objects, wrapper classes and utility methods. Using metaprogramming techniques and a clean DSL instead makes this a very interesting implementation.
  2. The ability to work directly on strings, lists and maps makes a huge difference, as opposed to writing wrappers around strings and creating objects like pratyahara, it, pratyaya etc.
  3. Main.groovy is self-explanatory about what's given and what's expected. This is not pseudo-code anymore! Note the direct method invocations like varnas(), halantyam() and tasyaLopah() on Strings. Also observe the direct reference to a pratyahara (sivaSutra.ac will expand to a list of vowels). Metaprogramming, awesome or what?
  4. Also observe the testSamjnaSutras() definitions. The only reason I have to quote some properties is the dot used in the scheme. A symbol-less scheme like AST would make for very readable code.
  5. The code uses SimpleScript for devanagari transliteration. As I mentioned in a previous post, parsing the script is trivial because of the strict 1:1 mapping between English and Sanskritam letters. It took less than 5 minutes to write.
  6. However, the code allows any transliteration scheme to be used, if one can come up with it, by implementing the ScriptScheme interface. Harvard-Kyoto, ITRANS, AST or even Unicode: as long as the individual varna-s are correctly tokenized, the program will work fine.
  7. Any script scheme can be supplied via a Groovy configuration and read by ConfigSlurper!
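Item 7 could look roughly like this sketch. ConfigSlurper itself is standard Groovy, but the config keys (label, notationMarkers) and the standalone tokenize method are hypothetical; the tokenizer follows the same idea as SimpleScriptScheme.tokenize, only driven by the configured markers instead of a hard-coded list.

```groovy
// A transliteration scheme described as a Groovy config script (keys are hypothetical)
String schemeConfig = '''
scheme {
    label = 'SimpleScript'
    notationMarkers = ['.', ':', '-']
}
'''

def config = new ConfigSlurper().parse(schemeConfig)
List markers = config.scheme.notationMarkers

// Split a word into varna-s; a following notation marker is glued to its letter
List tokenize(String word, List markers) {
    def varnas = []
    word.toList().eachWithIndex { String c, int i ->
        if (c in markers) return                       // markers never stand alone
        String next = (i < word.length() - 1) ? word[i + 1] : null
        varnas << ((next in markers) ? c + next : c)
    }
    varnas
}

println tokenize('kt.vat.', markers)   // [k, t., v, a, t.]
```

Swapping in another scheme would then mean editing the config text rather than the tokenizer code.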

Obviously this is just the very beginning, and some areas are still unpolished. But imagine being able to write code like

assert "bhavati" == bhU + sap + tin //1st gana
assert "kasca" == 'ka:' + sca //scutva sandhi

Imagine being able to work out sandhi-s just by using the plus sign! Wouldn't that be really, really cool? And it's not impossible. It will only take a little more effort to expand the DSL to include anga, guna, operator overloading for sandhi rules etc.!

Imagine similar DSL-s can be implemented for parsing shlokas to determine chandas! The potential for a Samskritam DSL is huge.

Tuesday, November 15, 2011

Read, Restore and so forth

My first sight of a computer was in 1983, in a remote town in India whose presiding deity is a representation of the "Conscious-Ethereal Grand Cosmic Nothingness". Our science teacher somehow got hold of somebody who had a Commodore 64. About 40 students from our class (India was that much less populous 30 years ago) walked about 5 kilometers on a rainy day to that computer guy's house. We were let in, in batches of 10, into a dimly lit room and seated on the floor. A girl sitting on a chair was holding a joystick (or a mouse?) and a keyboard and making noisy typing sounds. On the small monitor, some rectangles and squares of different colors were jumping around. She was playing some game. She said something about BASIC, and that's all we learnt.

Fast-forward almost 10 years. It was the onset of the Russian winter, and I was walking with a senior towards the university. He was a smart guy, respected by everybody and always an A-grader. We were talking about programming language theories. C++ was just getting popular. He said, "Hey, I know Pascal and C. And this year we are learning some AI using Prolog. I've also been learning C++...". He paused, then suddenly said, "You know BASIC, right? Can you teach me that?". I didn't know how to respond, but just said "Sure". I was a bit confused but elated to 'teach' a senior. That opportunity never came, though.

Current times. While studying the Ashtadhyayi's many techniques that are illuminating parallels to programming, I found one intriguing: the word "aadi" used in a given context. When Panini wants to refer to a group of items, he just uses the first item of the group and suffixes it with "aadi" or "aadya:". The reader is expected either to know the list by heart or to refer to it. No big deal, when the average Sanskrit student is expected to know the amarakosha by heart anyway. So the first value of the list itself is used as the "head" to reference the list. In this way Panini passes a pointer to an array of data using a very simple technique.

Some Groovy-flavored pseudo code may clarify:

/* The list of verbs, called the dhaatu paatha */
final Map DHAATU_PAATHA = [bhU: 'sattaayaam' /* , ... */]

/* "Pointer" to the dhaatu paatha. To mimic Panini's naturalness, it intentionally refers to the
   list not via the variable name but via the head value of the list */
def listOfVerbs = 'bhU'

Look at some of the sutra-s -

bhUvAdayo dhaatavaH (1.3.1) | By this statement Panini refers to the 2000 or so verbal roots in Sanskritam, starting with bhU
sanaadyantaa dhaatavaH (3.1.32) | Refers to the derived roots that end in "san" and the other suffixes listed after it
praadayaH | Refers to the 22 prepositions (upasarga-s) that start with "pra"

Obviously this technique of "aadi" reference is pretty common in Sanskritam and other Indian literature. Tyagaraja, in his chittaranjani kriti naadatanumanisham, says "sadyojaataadi pancha-vaktra", referring to the five faces of Shiva starting with sadyojaata. One who is not aware of the details will not know what the rest are, but aadi is just what it is: a pointer to a list of information. If Panini was the one who invented it (let's assume so for the sake of argument, because Panini had predecessor grammarians too and there was obviously other literature before him), it is a brilliant technique. The technique is not perfect, though, because over time somebody could come up with a modified list with the same head value. But still, it's a great way to abstract information, where the uniqueness of the head value serves as an emphatic indicator of the contents following it.
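The "aadi" idea maps neatly onto a lookup table keyed by each list's first element. Here is a small sketch; the register and aadi helpers are hypothetical, and the lists are deliberately truncated samples (the opening roots of the dhaatu paatha and the opening members of praadayaH), not the full texts.

```groovy
// Hypothetical "aadi" registry: every list is stored under its own first element (its head)
Map<String, List<String>> byHead = [:]

// Register a list under its head; a later list with the same head silently replaces
// the earlier one, which is exactly the weakness of the technique noted above
void register(Map<String, List<String>> registry, List<String> items) {
    registry[items.first()] = items
}

// "<head> + aadi": expand a head value back into the whole list
List<String> aadi(Map<String, List<String>> registry, String head) {
    registry[head] ?: []
}

register(byHead, ['bhU', 'edh', 'spardh'])        // opening dhatu-s of the dhaatu paatha (truncated)
register(byHead, ['pra', 'parA', 'apa', 'sam'])   // opening members of praadayaH (truncated)

println aadi(byHead, 'pra')   // [pra, parA, apa, sam]
```

Only heads resolve: asking for aadi(byHead, 'sam') returns an empty list, because sam is a member of a list, not the head of one.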

Back to programming after the detour. Even after several years in programming, BASIC continues to fascinate me. Given all the high-level languages available, there is one feature I sorely miss from BASIC: the READ...DATA statement. READ...DATA allows feeding data to the program in the shortest possible way, without having to assign values one by one.

10 FOR I = 1 TO 10: READ X(I): NEXT I
20 DATA 1,3,5,7,11,13,17,19,23,29
30 RESTORE 20

10 READ NM$, PHONE$, PI, BASERADIX
20 DATA "James Bond", "555-1212", 3.14, 8
30 DATA "11/11/2011", "All the world's a Pre-Production."
50 READ DT$, QUOTE$

The DATA statements can be anywhere in the program, and the READ statement reads off the data sequentially, like taking items from a queue. The RESTORE statement acts just like the "aadi" of the ashtadhyayi: it points to the beginning of the data. The simplicity of this bootstrap data feed is appreciated when you do not care where the DATA is set. Several high-level languages have been invented since, but not many provide such an easy way to feed bootstrap data to program variables. Of course there are enumerators and similar constructs, but somehow the simplicity of the READ statement stands out, just like Panini's aadi technique.
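For fun, READ, DATA and RESTORE can be emulated in a few lines of Groovy. This is only a sketch: the DataStatements class and its methods are made-up names, DATA lines are passed as a map from line number to items, and the error messages merely imitate classic BASIC ones.

```groovy
// Hypothetical emulation of BASIC's READ / DATA / RESTORE
class DataStatements {
    private final List data = []                        // all DATA items, in line-number order
    private final Map<Integer, Integer> offsets = [:]   // DATA line number -> index of its first item
    private int pointer = 0                             // index of the item the next READ returns

    DataStatements(Map<Integer, List> dataLines) {
        dataLines.keySet().sort().each { Integer line ->
            offsets[line] = data.size()
            data.addAll(dataLines[line])
        }
    }

    // READ: hand out the next item, in order
    def read() {
        if (pointer >= data.size()) throw new IllegalStateException('OUT OF DATA')
        data[pointer++]
    }

    // RESTORE [line]: move the pointer back to the start of all DATA, or of one DATA line
    void restore(Integer line = null) {
        if (line == null) { pointer = 0; return }
        Integer offset = offsets[line]
        if (offset == null) throw new IllegalArgumentException("UNDEFINED LINE NUMBER $line")
        pointer = offset
    }
}

// First program: 10 FOR I = 1 TO 10: READ X(I): NEXT I / 20 DATA ... / 30 RESTORE 20
def program = new DataStatements([20: [1, 3, 5, 7, 11, 13, 17, 19, 23, 29]])
def x = (1..10).collect { program.read() }
program.restore(20)

// Second program: READ pulls values across DATA lines 20 and 30 in order
def card = new DataStatements([20: ['James Bond', '555-1212', 3.14, 8],
                               30: ['11/11/2011', "All the world's a Pre-Production."]])
def (name, phone, pi, radix) = (1..4).collect { card.read() }
def (date, quote) = (1..2).collect { card.read() }
println x
```

Like the aadi reference, RESTORE only needs a starting point; everything after it follows in order.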