Tuesday, November 29, 2011

Chapter 8 DSL in GitHub

The Chapter 8 DSL project is now available on GitHub at https://github.com/vasya10/samskritam. It is in its very early stages, and I will update it whenever I get some free time.

Thursday, November 17, 2011

The Chapter 8 DSL

Domain-specific languages (DSLs) are fast becoming a popular way to describe a problem or a solution for a specific domain. Code written in a DSL is far more readable than regular "technical" code (in Java or C#, for example). Since information about DSLs can be googled amply, I am not going to spend time writing about what a DSL is.

In many of the previous posts, I had used pseudo-code to demonstrate parallels between programming and Panini's techniques. Time to call the bluff now. Presented below is seriously tested code. Here is a DSL that closely models some basic techniques of ashtaadhyaayI, specifically the maheshvara-sutra-s and those darned "it" rules. I'm using Groovy for the implementation, as I feel that its syntax is more natural to read than that of Scala or Ruby.

Let's define some classes.

[Listing 1: SivaSutra.groovy]
package ch8

import java.util.List

/**
 * Implementation of Maheshvara Sutra using SimpleScript transliteration scheme
 * The table itself can be moved to a groovy configuration file to allow a different scheme like HK, ITRANS or AST
 * 
 * @author vsrinivasan
 */
@Singleton
class SivaSutra implements Iterable<String> {

  //siva-sutraani
  List table =
  [
    ['a', 'e', 'u', 'N'],
    ['r.', 'l.', 'k'],
    ['E.', 'o', 'n'],
    ['i', 'O.', 'c'],
    ['h', 'y', 'v', 'r', 't'],
    ['l', 'N'],
    ['n.', 'm', 'n', 'N', 'N.', 'm'],
    ['J', 'B', 'n.'],
    ['G', 'D', 'D.', 's.'],
    ['j', 'b', 'g', 'd', 'd.', 's'],
    ['K', 'P', 'C', 'T', 'T.', 'c', 't', 't.', 'v'],
    ['k', 'p', 'y'],
    ['s', 's.', 'S', 'r'],
    ['h', 'l']
  ]

  List list = table.flatten()

  int indexOf(String varna) { list.indexOf(varna) }

  @Override
  Iterator iterator() { list.iterator() }

  //eShaam antyaaH it
  List itMarkers = table.collect { it.last() }

  /**
   * is this an iT-marker?
   * this checks only the pratyahara iT-s; for other it-s see ItRules.groovy
   * 
   * @see ItRules
   */
  boolean isIt(f) { itMarkers.contains(f) }

  /**
   * expands a given pratyahara, including all the iT-s
   * not for practical purposes, but good for testing
   * 
   * @param pratyahara
   * @return
   */
  List expand(String pratyahara) {
    def (begin, end) = pratyahara.varnas()
    //slice by position in the flattened list, not by the varna strings themselves
    list[indexOf(begin)..indexOf(end)]
  }

  /**
   * returns the real pratyahara varna-s, excluding the intermediate it-markers
   * very procedural implementation, need to make it groovy-like
   * 
   * @param pratyahara
   * @return
   */
  List collect(String pratyahara) {
    def (begin, end) = pratyahara.varnas()
    boolean start = false
    def result = []

    table.each { line ->
      line.each { item ->
        if (item == begin || start) {
          if (item != line.last()) {
            result << item
            start = true
          }
          if (item == end && item == line.last()) {
            start = false
          }
        }
      }
    }
    return result
  }
}
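
The comment on collect() asks for a more Groovy-like version. One possible formulation is sketched below, assuming the same table layout as Listing 1. The helper name pratyahara and the four-row table are only for illustration and are not part of the project. It finds the row that contains the starting varna, finds the first row from there on whose it-marker matches, and flattens everything in between with the it-markers dropped.

```groovy
// First four rows of the table from Listing 1, repeated here so the sketch runs on its own
def table = [
  ['a', 'e', 'u', 'N'],
  ['r.', 'l.', 'k'],
  ['E.', 'o', 'n'],
  ['i', 'O.', 'c']
]

// Hypothetical helper (not in the original code): varna-s of a pratyahara, it-markers excluded
List pratyahara(List table, String begin, String end) {
  // row in which the starting varna occurs as a non-marker
  int first = table.findIndexOf { begin in it[0..-2] }
  if (first < 0) return []
  // first row at or after it whose it-marker is the requested end
  int last = table.findIndexOf(first) { it.last() == end }
  if (last < 0) return []
  // drop each row's it-marker, flatten, then skip whatever precedes the starting varna
  table[first..last].collectMany { it[0..-2] }.dropWhile { it != begin }
}

println pratyahara(table, 'a', 'k')
println pratyahara(table, 'i', 'c')
```

Unlike the loop in Listing 1, this version stops at the first matching it-marker and never restarts if the starting varna shows up again later in the table.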


[Listing 2: ItRules.groovy]
package ch8

@Singleton
class ItRules {

  //#(1.3.2) upadeshe ajanunaasika iT, anunAsika-s are denoted by a "-" at the end,
  //  may be M would be a better option?
  def ajanunasika = 'aAeEuUr.R.l.E.IOO.'.varnas().collect { it + "-" }

  //cutU
  def cu = 'cCjJn.'.varnas()
  def tu = 'tTdDN'.varnas()

  //s.asca, denoting as "sha" for convenience
  def sha = ['s.']

  //lasaku (ataddhite)
  def ku = 'kKgGn'.varnas()
  def lasaku = 'ls'.varnas() + ku

  //some more to be defined

  //#(1.3.3) halantyam - check if the last char is hal
  SivaSutra sivaSutra = SivaSutra.instance
  boolean hasHalantyam(String pratyaya) { pratyaya.varnas().last() in sivaSutra.hl }

  //allItMarkers except hal, which is applicable only to last letter
  def allItMarkers = ajanunasika + cu + tu + lasaku

  boolean isAnunasika(String varna) { varna.endsWith('-') }

  boolean isItMarker(String varna) { varna in allItMarkers }

  String tasyaLopah(String pratyaya) { (pratyaya.halantyam().varnas() - allItMarkers).join() }
}

[Listing 3: Main.groovy]
package ch8.tests

import java.util.List

import ch8.ItRules
import ch8.SivaSutra
import ch8.schemes.SimpleScriptScheme
import ch8.Samjna

/*
 * DSL: varnas() closure - tokenize the script into individual varnas (list)
 */
String.metaClass.varnas = {
  new SimpleScriptScheme().tokenize(delegate)
}

/*
 * DSL: halantyam() closure - remove the last hal iT and return the modified String
 */
String.metaClass.halantyam = {
  ItRules itRules = ItRules.instance
  def varnas = delegate.varnas() as List
  if (itRules.hasHalantyam(delegate)) {
    varnas.remove(varnas.size()-1)
  }
  varnas.join()
}

/*
 * DSL: tasyaLopah() closure - remove all the it-markers from a pratyaya
 */
String.metaClass.tasyaLopah = {
  ItRules itRules = ItRules.instance
  itRules.tasyaLopah(delegate)
}

/*
 * DSL: Direct exposition of a pratyaya or a pratyahara!
 */
SivaSutra sivaSutra = SivaSutra.instance
sivaSutra.metaClass.getProperty = { String pratyahara ->
  def metaProperty = SivaSutra.metaClass.getMetaProperty(pratyahara)
  def result
  if(metaProperty) {
    //if there is an existing property invoke that
    result = metaProperty.getProperty(delegate)
  } else {
    //inspect the property and convert it to varnas
    //taparastatkaalasya rule; need to formulate in a better way
    if (pratyahara.endsWith('t.')) {
      result = (pratyahara - 't.').varnas()
    } else {
      result = sivaSutra.collect(pratyahara)
    }
  }
  result
}

void testSivaSutra() {
  SivaSutra sivaSutra = SivaSutra.instance

  sivaSutra.table.each { println it } //print the maheshvara sutrani
  println sivaSutra.list //print a flattened version of the maheshvara sutrani
  sivaSutra.each { println it } //another way to print flattened maheshvara sutrani
  println sivaSutra.itMarkers //print only the it markers

  assert sivaSutra.isIt('n.') //check if n. is an it marker

  assert sivaSutra.expand('ak') == ['a','e','u','N','r.','l.','k'] //expand pratyahara including the it
  assert ['a', 'e', 'u', 'r.', 'l.'] == sivaSutra.collect('ak') //pratyahara excluding iT
  assert ['a', 'e', 'u', 'r.', 'l.'] == sivaSutra.ak //another way of getting the pratyahara! Meta-programming in play!
}

void testItRules() {
  ItRules itRules = ItRules.instance

  println itRules.ajanunasika //prints all the ac anunasikas
  assert "lyut".varnas() == ['l', 'y', 'u', 't']
}


void testHalantyamRule() {
  //print the pratyaya-s after the halantyam rule is applied
  ["kt.va", "Gan.", "kt.vat.", "sap", "lyu-t", "saN", "sat.r."].each { println it + " = " + it.halantyam() }

  assert 'kt.va' == 'kt.va'.halantyam()
  assert 'kt.va'  == 'kt.vat.'.halantyam()
  assert 'Ga' == 'Gan.'.halantyam()
}

void testTasyaLopahRule() {
  ["Gan.", "kt.vat.", "sap", "lyu-t", "saN", "sat.r."].each { println it + " = " + it.tasyaLopah() }

  assert 'a' == 'Gan.'.tasyaLopah()
  assert 't.va' == 'kt.vat.'.tasyaLopah()
}

void testSamjnaSutras() {
  SivaSutra sivaSutra = SivaSutra.instance
  
  def vruddhi = sivaSutra.'At.' + sivaSutra.ic
  def guna = sivaSutra.'at.' + sivaSutra.'E.n'

  assert ['A', 'i', 'O.'] == vruddhi
  assert ['a', 'E.', 'o'] == guna
}


testSivaSutra()
testItRules()
testHalantyamRule()
testTasyaLopahRule()
testSamjnaSutras()

[Listing 4: SimpleScriptScheme.groovy]
package ch8.schemes

/**
 * A simple script tokenizer
 * 
 * @author vsrinivasan
 */
class SimpleScriptScheme implements ScriptScheme {

  // hyphen denotes anunasika
  static List NotationMarkers = ['.', ':', '-']

  /** 
   * split/tokenize a given word into a list of varnas
   * the word could be a pada, shabda, pratyaya or pratyahara
   * needs to handle anunasika properly
   * 
   * @calledby String.metaClass.varnas()
   * @param word
   * @return list of varnas
   */
  @Override
  public List tokenize(String word) {
    def varnas = []
    word.eachWithIndex { c, i ->
      c = ((i < word.length()-1) ? ((word[i+1] in NotationMarkers) ? (c + word[i+1]) : c) : c)
      if (!(c in NotationMarkers)) varnas << c
    }

    varnas
  }
}


Now some observations and analysis:

  1. Doing this in regular Java/C# would require several objects, wrapper classes and utility methods. Using metaprogramming techniques and defining a clean DSL makes for a much more interesting implementation.
  2. The ability to work directly on strings, lists and maps makes a huge difference, as opposed to writing wrappers around strings and creating objects like pratyahara, it, pratyaya etc.
  3. Main.groovy is self-explanatory in what's given and what's expected. This is not pseudo-code anymore! Note the direct method invocations like varnas(), halantyam() and tasyaLopah() on Strings. Also observe the direct reference to a pratyahara (sivaSutra.ac will expand to a list of vowels). Metaprogramming, awesome or what?
  4. Also observe the testSamjnaSutras() definitions. The only reason I have to quote the properties is the use of the dot in the scheme. A symbol-less scheme like AST would make the code very readable.
  5. The code uses SimpleScript for devanagari transliteration. As I had mentioned in a previous post, parsing the script is trivial because of a strict 1:1 mapping between English and Sanskritam letters. It took less than 5 minutes to write.
  6. However, the code allows any transliteration scheme to be used, if one can come up with it, by implementing the ScriptScheme interface. Harvard-Kyoto, ITRANS, AST or even Unicode - as long as the individual varna-s are correctly tokenized, the program will work fine.
  7. Any script scheme can be supplied via a groovy configuration and read by ConfigSlurper!

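
Points 6 and 7 can be sketched roughly as follows. The ScriptScheme interface is not shown in the listings, so its single tokenize method here is an assumption inferred from Listing 4. PlainScheme, the schemes registry and the config key scheme.name are all made up for illustration.

```groovy
// Assumed shape of the interface, inferred from SimpleScriptScheme.tokenize in Listing 4
interface ScriptScheme {
  List tokenize(String word)
}

// Hypothetical symbol-less scheme: every character is one varna
class PlainScheme implements ScriptScheme {
  List tokenize(String word) { word.toList() }
}

// Hypothetical registry of the schemes known to the program
def schemes = [plain: PlainScheme]

// The configuration is inlined here; it could just as well live in a separate groovy config file
def config = new ConfigSlurper().parse('''
scheme {
  name = 'plain'
}
''')

// Pick the scheme named in the configuration and use it to tokenize
ScriptScheme scheme = schemes[config.scheme.name].getDeclaredConstructor().newInstance()
println scheme.tokenize('bhavati')
```

With something like this in place, String.metaClass.varnas could delegate to whichever scheme the configuration names instead of hard-coding SimpleScriptScheme.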
Obviously this is just the very beginning and some areas are still unpolished. But imagine being able to write code like

assert "bhavati" == bhU + sap + tin //1st gana
assert "kasca" == 'ka:' + sca //scutva sandhi

Imagine being able to work out sandhis just by using the plus sign! Wouldn't that be really, really cool? And it's not impossible. It will only take a little more effort to expand the DSL to include anga, guna, operator overloading for sandhi rules etc.!

Imagine similar DSLs being implemented for parsing shlokas to determine chandas! The potential for a Samskritam DSL is huge.
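
As a rough sketch of how the plus sign could be wired up: Groovy maps the + operator to a method named plus, so a small wrapper type is enough. The Pada class below is hypothetical and not part of the chapter 8 code, and its single rule (drop a final visarga before joining) is a toy stand-in for real sandhi rules.

```groovy
// Hypothetical wrapper type; only meant to show the operator mechanics
class Pada {
  final String text

  Pada(String text) { this.text = text }

  // Groovy calls this method for the expression: pada + otherPada
  Pada plus(Pada next) {
    // toy rule, NOT real sandhi: a final visarga (':') is dropped before joining
    String left = text.endsWith(':') ? text.substring(0, text.length() - 1) : text
    new Pada(left + next.text)
  }

  String toString() { text }
}

def ka = new Pada('ka:')
def sca = new Pada('sca')
println ka + sca
```

A real implementation would have to pick the applicable rule from the final varna of the left operand and the initial varna of the right one, instead of hard-coding one case.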

Tuesday, November 15, 2011

Read, Restore and so forth

My first sight of a computer was in 1983 in a remote town in India, whose presiding deity is a representation of "Conscious-Ethereal Grand Cosmic Nothingness". Our science teacher somehow got hold of somebody who had a Commodore 64. About 40 students from our class (India was that much less populous 30 years ago) walked about 5 kilometers on a rainy day to that computer guy's house. We were allowed in batches of 10 into a dimly lit room and were seated on the floor. A girl, sitting on the chair, was holding a joystick (or a mouse?) and a keyboard and making a noisy typing sound. On the small monitor some rectangles and squares of different colors were jumping around. She was playing some game. She said something about BASIC and that's all we learnt.

Almost 10 years forward. It was the onset of the Russian winter, and I was walking with a senior towards the university. He was a smart guy and always an A-grader; everybody respected him. We were talking about programming language theories. C++ was just getting popular. He said "Hey, I know Pascal and C. And this year we are learning some AI using Prolog. I've also been learning C++...". He paused. Then suddenly said, "You know BASIC right? Can you teach me that?". I didn't know how to respond, but just said "Sure". I was a bit confused but elated to 'teach' a senior. That opportunity never came though.

Current times. Among the Ashtadhyayi's several techniques, which are an illuminating parallel to programming, there is one that is intriguing. It is the word "aadi" given in a context. When Panini wants to mention a group of information, he just uses the first value of the group and suffixes it with "aadi" or "aadya:". The reader is obviously expected either to know the list by heart or to refer to it. No big deal, when the average Sanskrit student is expected to know the amarakosha by heart anyway. So the first value of the list itself is used as the "head" to reference the list. This way Panini passes around a pointer to an array of data using a very simple technique.

A pseudo code may clarify:

/* The list of verbs called as dhaatu paatha */
static Map DHAATU_PAATHA = [bhU:sattaayaam, ... ]

/* pointer to the dhaatu paatha list; trying to mimic naturalness - intentionally referring to it not via the static variable but via the head-value of the list */
char *list_of_verbs = ["bhU"]

Look at some of the sutra-s -

bhUvAdayo dhaatavaH (1.3.1) | By this statement Panini refers to about 2000+ verbal roots in Sanskritam, starting with bhU
sanaadyantaa dhaatavaH (3.1.32) | Refers to the list of derivational roots, the list starts with a verb that ends with suffix "-san"
praadayaH | Refers to the 22 prepositions that start with "pra"

Obviously this technique of "aadi" reference is pretty common in Sanskritam and other Indian literature. Tyagaraja in his siddharanjani kriti naadatanumanisham says "sadyojaataadi pancha-vaktra", referring to the five faces of Shiva starting from sadyojaata. Obviously one who is not aware of the details will not know what the rest are, but aadi is just what it is - a pointer to a list of information. If Panini was the one who invented it (let's assume so for the sake of argument, because Panini had predecessor grammarians too and there was obviously other literature before him), it is a brilliant technique. The technique is not perfect though, because over time somebody could come up with a modified list with the same head-value. But still it's a great way to abstract information, where the uniqueness of the head-value serves as an emphasised indicator of the contents following it.
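
The "aadi" lookup can be mimicked in Groovy with a map keyed by head-value. This is a minimal sketch: the names lists, byHead and aadi are made up for illustration, and the second list is deliberately cut short.

```groovy
// Illustrative lists; each one is identified only by its first member
def lists = [
  ['sadyojaata', 'vaamadeva', 'aghora', 'tatpurusha', 'iishaana'], // the five faces of Shiva
  ['pra', 'paraa', 'apa']                                          // only the first few of the praadi list
]

// index every list by its head-value
def byHead = lists.collectEntries { [(it.first()): it] }

// hypothetical helper: resolve "X-aadi" to the list that starts with X
def aadi = { String head -> byHead[head] }

println aadi('sadyojaata')
```

The weakness mentioned above shows up here too: if two lists share the same head-value, the later one silently replaces the earlier one in the map.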

Back to programming after the detour. Even after several years in programming, BASIC continues to fascinate me. Given all kinds of high level languages, there is one feature I think I sorely miss from BASIC. It is the "READ...DATA" statement. The READ...DATA statement allows for feeding data to the program in the shortest possible way without having to assign random values individually.

10 FOR I = 1 TO 10: READ X(I): NEXT I
20 DATA 1,3,5,7,11,13,17,19,23,29
30 RESTORE 20

10 READ NAME$, PHONE$, PI, BASERADIX
20 DATA "James Bond", "555-1212", 3.14, 8
30 DATA "11/11/2011", "All the world's a Pre-Production."
50 READ DATE$, WS_QUOTE$

The DATA statement could be anywhere in the program and the READ statement would sequentially read off the data, like taking items off the front of a queue. The RESTORE statement acts just like the "aadi" of ashtadhyayi - it points to the beginning of the data. The simplicity of the bootstrap data feed is appreciated when you do not care where the DATA is set. Several high level languages have been invented since then, but not many provide such an easy way to feed bootstrap data to the program variables. Of course there are enumerations and similar stuff, but somehow the simplicity of the READ statement stands out. Just like Panini's aadi technique.
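
For comparison, READ, DATA and RESTORE can be emulated in Groovy with a list and a cursor. The DataFeed class below is a hypothetical helper, not something Groovy provides, and its restore() takes a position in the list rather than a BASIC line number.

```groovy
// Hypothetical helper emulating BASIC's READ/DATA/RESTORE
class DataFeed {
  private final List data
  private int cursor = 0

  DataFeed(List data) { this.data = data }

  // READ: return the next value and advance the cursor
  def read() {
    if (cursor >= data.size()) throw new IllegalStateException('Out of DATA')
    data[cursor++]
  }

  // READ several values in one statement
  List read(int count) { (1..count).collect { read() } }

  // RESTORE: move the cursor back to the start (or to a given position)
  void restore(int position = 0) { cursor = position }
}

// DATA from the second BASIC example above
def feed = new DataFeed(['James Bond', '555-1212', 3.14, 8,
                         '11/11/2011', "All the world's a Pre-Production."])

def (name, phone, pi, baseRadix) = feed.read(4)
def (date, quote) = feed.read(2)

feed.restore()
println feed.read()
```

Groovy's multiple assignment gets close to the READ line, but the data still has to be handed to an object explicitly, which is exactly the ceremony BASIC avoided.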