Thursday, November 17, 2011

The Chapter 8 DSL

Domain Specific Languages (DSLs) are fast becoming a popular way to describe a problem or a solution for a specific domain. Code written in a DSL can be far more readable than regular "technical" code (in Java or C#, for example). Since plenty of information about DSLs can be found online, I am not going to spend time explaining what a DSL is.

In many of the previous posts, I used pseudo-code to demonstrate parallels between programming and Panini's techniques. Time to call the bluff now. Presented below is seriously tested code: a DSL that closely models some basic techniques of ashtaadhyaayI, specifically the maheshvara-sutra-s and those darned "it" rules. I'm using Groovy for the implementation, as I feel that its syntax is more natural to read than that of Scala or Ruby.

Let's define some classes.

[Listing 1: SivaSutra.groovy]
package ch8

import java.util.List

/**
 * Implementation of Maheshvara Sutra using SimpleScript transliteration scheme
 * The table itself can be moved to a groovy configuration file to allow a different scheme like HK, ITRANS or IAST
 * 
 * @author vsrinivasan
 */
@Singleton
class SivaSutra implements Iterable {

  //siva-sutraani
  List table =
  [
    ['a', 'e', 'u', 'N'],
    ['r.', 'l.', 'k'],
    ['E.', 'o', 'n'],
    ['i', 'O.', 'c'],
    ['h', 'y', 'v', 'r', 't'],
    ['l', 'N'],
    ['n.', 'm', 'n', 'N', 'N.', 'm'],
    ['J', 'B', 'n.'],
    ['G', 'D', 'D.', 's.'],
    ['j', 'b', 'g', 'd', 'd.', 's'],
    ['K', 'P', 'C', 'T', 'T.', 'c', 't', 't.', 'v'],
    ['k', 'p', 'y'],
    ['s', 's.', 'S', 'r'],
    ['h', 'l']
  ]

  List list = table.flatten()

  int indexOf(String varna) { list.indexOf(varna) }

  @Override
  Iterator iterator() { list.iterator() }

  //eShaam antyaaH it
  List itMarkers = table.collect { it.last() }

  /**
   * is this iT-marker?
   * this only recognises the 'pratyahara iT'; for other it-s see ItRules.groovy
   * 
   * @see ItRules
   */
  boolean isIt(f) { itMarkers.contains(f) }

  /**
   * expands a given pratyahara, including all the iT-s
   * not for practical purposes, but good for testing
   * 
   * @param pratyahara
   * @return
   */
  List expand(String pratyahara) {
    def (begin, end) = pratyahara.varnas()
    //slice by position: begin and end are varna Strings, not list indexes
    list[list.indexOf(begin)..list.indexOf(end)]
  }

  /**
   * returns the real pratyahara varna-s, excluding the intermediate it-markers
   * very procedural implementation, need to make it groovy-like
   * 
   * @param pratyahara
   * @return
   */
  List collect(String pratyahara) {
    def (begin, end) = pratyahara.varnas()
    boolean start = false
    def result = []

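    table.each { line ->
      line.eachWithIndex { item, i ->
        //the it-marker is the last slot of each sutra; test by position, since a varna
        //such as 'm' can also occur earlier in the same sutra
        boolean isMarker = (i == line.size() - 1)
        if (item == begin || start) {
          if (!isMarker) {
            result << item
            start = true
          }
          if (item == end && isMarker) {
            start = false
          }
        }
      }
    }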
    return result
  }
}
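
The comment on collect() asks for a more Groovy-like version. One possible shape is sketched below as a standalone function, so it does not depend on the varnas() DSL method. The name collectVarnas and its (table, begin, end) signature are hypothetical, the table holds only the first four sutras from Listing 1, and the sketch assumes a valid pratyahara (begin is a real varna and end is a marker of the same or a later sutra).

```groovy
// First four sutras only, copied from Listing 1, to keep the sketch short
def table = [
  ['a', 'e', 'u', 'N'],
  ['r.', 'l.', 'k'],
  ['E.', 'o', 'n'],
  ['i', 'O.', 'c']
]

// Hypothetical helper: the varnas of a pratyahara, without any it-markers
List collectVarnas(List table, String begin, String end) {
  // first sutra that holds `begin` as a real varna (every slot except the last)
  int first = table.findIndexOf { begin in it[0..-2] }
  // first sutra at or after it whose closing it-marker is `end`
  int last = (first..<table.size()).find { table[it].last() == end }
  // drop the marker of every sutra in between, then start from `begin`
  def flat = table[first..last].collect { it[0..-2] }.flatten()
  flat[flat.indexOf(begin)..-1]
}

assert collectVarnas(table, 'a', 'k') == ['a', 'e', 'u', 'r.', 'l.']
assert collectVarnas(table, 'i', 'c') == ['i', 'O.']
```

Treating the marker as "the last slot of a sutra" rather than comparing values avoids confusing a marker with the same letter used as a real varna elsewhere.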


[Listing 2: ItRules.groovy]
package ch8

@Singleton
class ItRules {

  //#(1.3.2) upadeshe ajanunaasika iT; anunAsika-s are denoted by a "-" at the end,
  //  maybe M would be a better option?
  def ajanunasika = 'aAeEuUr.R.l.E.IOO.'.varnas().collect { it + "-" }

  //cutU
  def cu = 'cCjJn.'.varnas()
  def tu = 'tTdDN'.varnas()

  //s.asca, denoting as "sha" for convenience
  def sha = ['s.']

  //lasaku (ataddhite)
  def ku = 'kKgGn'.varnas()
  def lasaku = 'ls'.varnas() + ku

  //some more to be defined

  //#(1.3.3) halantyam - check if the last char is hal
  SivaSutra sivaSutra = SivaSutra.instance
  boolean hasHalantyam(String pratyaya) { pratyaya.varnas().last() in sivaSutra.hl }

  //allItMarkers except hal, which is applicable only to last letter
  def allItMarkers = ajanunasika + cu + tu + lasaku

  boolean isAnunasika(String varna) { varna.endsWith('-') }

  boolean isItMarker(String varna) { varna in allItMarkers }

  String tasyaLopah(String pratyaya) { (pratyaya.halantyam().varnas() - allItMarkers).join() }
}

[Listing 3: Main.groovy]
package ch8.tests

import java.util.List

import ch8.ItRules
import ch8.SivaSutra
import ch8.schemes.SimpleScriptScheme
import ch8.Samjna

/*
 * DSL: varnas() closure - tokenize the script into individual varnas (list)
 */
String.metaClass.varnas = {
  new SimpleScriptScheme().tokenize(delegate)
}

/*
 * DSL: halantyam() closure - remove the last hal iT and return the modified String
 */
String.metaClass.halantyam = {
  ItRules itRules = ItRules.instance
  def varnas = delegate.varnas() as List
  if (itRules.hasHalantyam(delegate)) {
    varnas.remove(varnas.size()-1)
  }
  varnas.join()
}

/*
 * DSL: tasyaLopah() closure - remove all the it-markers from a pratyaya
 */
String.metaClass.tasyaLopah = {
  ItRules itRules = ItRules.instance
  itRules.tasyaLopah(delegate)
}

/*
 * DSL: Direct exposition of a pratyaya or a pratyahara!
 */
SivaSutra sivaSutra = SivaSutra.instance
sivaSutra.metaClass.getProperty = { String pratyahara ->
  def metaProperty = SivaSutra.metaClass.getMetaProperty(pratyahara)
  def result
  if(metaProperty) {
    //if there is an existing property invoke that
    result = metaProperty.getProperty(delegate)
  } else {
    //inspect the property and convert it to varnas
    //taparastatkaalasya rule; need to formulate in a better way
    if (pratyahara.endsWith('t.')) {
      result = (pratyahara - 't.').varnas()
    } else {
      result = sivaSutra.collect(pratyahara)
    }
  }
  result
}

void testSivaSutra() {
  SivaSutra sivaSutra = SivaSutra.instance

  sivaSutra.table.each { println it } //print the maheshvara sutrani
  println sivaSutra.list //print a flattened version of the maheshvara sutrani
  sivaSutra.each { println it } //another way to print flattened maheshvara sutrani
  println sivaSutra.itMarkers //print only the it markers

  assert sivaSutra.isIt('n.') //check if n. is an it marker

  assert sivaSutra.expand('ak') == ['a','e','u','N','r.','l.','k'] //expand pratyahara including the it
  assert ['a', 'e', 'u', 'r.', 'l.'] == sivaSutra.collect('ak') //pratyahara excluding iT
  assert ['a', 'e', 'u', 'r.', 'l.'] == sivaSutra.ak //another way of getting the pratyahara! Meta-programming in play!
}

void testItRules() {
  ItRules itRules = ItRules.instance

  println itRules.ajanunasika //prints all the ac anunasikas
  assert "lyut".varnas() == ['l', 'y', 'u', 't']
}


void testHalantyamRule() {
  //print the pratyaya-s after the halantyam rule is applied
  ["kt.va", "Gan.", "kt.vat.", "sap", "lyu-t", "saN", "sat.r."].each { println it + " = " + it.halantyam() }

  assert 'kt.va' == 'kt.va'.halantyam()
  assert 'kt.va'  == 'kt.vat.'.halantyam()
  assert 'Ga' == 'Gan.'.halantyam()
}

void testTasyaLopahRule() {
  ["Gan.", "kt.vat.", "sap", "lyu-t", "saN", "sat.r."].each { println it + " = " + it.tasyaLopah() }

  assert 'a' == 'Gan.'.tasyaLopah()
  assert 't.va' == 'kt.vat.'.tasyaLopah()
}

void testSamjnaSutras() {
  SivaSutra sivaSutra = SivaSutra.instance
  
  def vruddhi = sivaSutra.'At.' + sivaSutra.ic
  def guna = sivaSutra.'at.' + sivaSutra.'E.n'

  assert ['A', 'i', 'O.'] == vruddhi
  assert ['a', 'E.', 'o'] == guna
}


testSivaSutra()
testItRules()
testHalantyamRule()
testTasyaLopahRule()
testSamjnaSutras()

[Listing 4: SimpleScriptScheme.groovy]
package ch8.schemes

/**
 * A simple script tokenizer
 * 
 * @author vsrinivasan
 */
class SimpleScriptScheme implements ScriptScheme {

  // hyphen denotes anunasika
  static List NotationMarkers = ['.', ':', '-']

  /** 
   * split/tokenize a given word into a list of varnas
   * the word could be a pada, shabda, pratyaya or pratyahara
   * needs to handle anunasika properly
   * 
   * @calledby String.metaClass.varnas()
   * @param word
   * @return list of varnas
   */
  @Override
  public List tokenize(String word) {
    def varnas = []
    word.eachWithIndex { c, i ->
      c = ((i < word.length()-1) ? ((word[i+1] in NotationMarkers) ? (c + word[i+1]) : c) : c)
      if (!(c in NotationMarkers)) varnas << c
    }

    varnas
  }
}
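
For comparison, the same tokenization can be sketched as a single regular expression. This is an alternative, not the implementation used above. It assumes well-formed input, meaning every notation marker directly follows the letter it modifies. The standalone function below is a hypothetical stand-in for the tokenize() method of the class.

```groovy
// A varna is one non-marker character, optionally followed by one notation marker (. : -)
List tokenize(String word) {
  word.findAll(/[^.:\-][.:\-]?/)
}

assert tokenize('kt.vat.') == ['k', 't.', 'v', 'a', 't.']
assert tokenize('lyu-t') == ['l', 'y', 'u-', 't']
```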


Now some observations and analysis:

  1. Doing this in regular Java/C# would require creating several objects, wrapper classes and utility methods. Metaprogramming and a clean DSL keep the implementation short and make it far more interesting.
  2. The ability to work directly on strings, lists and maps makes a huge difference, as opposed to writing wrappers around strings and creating objects like pratyahara, it, pratyaya etc.
  3. Main.groovy is self-explanatory about what is given and what is expected. This is not pseudo-code anymore! Note the direct method invocations like varnas(), halantyam() and tasyaLopah() on Strings. Also observe the direct reference to a pratyahara (sivaSutra.ac expands to a list of vowels). Metaprogramming, awesome or what?
  4. Also observe the definitions in testSamjnaSutras(). The only reason I have to quote the properties is the dot used in the scheme. A symbol-less scheme like IAST would make for very readable code.
  5. The code uses SimpleScript for devanagari transliteration. As I mentioned in a previous post, parsing the script is trivial because of a strict 1:1 mapping between English and Sanskritam letters. It took less than 5 minutes to write.
  6. However, the code allows any transliteration scheme to be used by implementing the ScriptScheme interface. Harvard-Kyoto, ITRANS, IAST or even Unicode - as long as the individual varna-s are correctly tokenized, the program will work fine.
  7. Any script scheme can be supplied via a groovy configuration and read by ConfigSlurper!
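
Points 6 and 7 can be sketched together as follows. The ScriptScheme interface is referenced in the listings but not shown, so its shape here is an assumption. ConfiguredScheme and the scheme.markers config key are hypothetical names, and the config text is inlined only to keep the example self-contained (it would normally live in its own .groovy file).

```groovy
// Assumed shape of the ScriptScheme interface (not listed in the post)
interface ScriptScheme {
  List tokenize(String word)
}

// Hypothetical scheme whose notation markers come from configuration
class ConfiguredScheme implements ScriptScheme {
  List markers

  List tokenize(String word) {
    def varnas = []
    word.eachWithIndex { c, i ->
      boolean nextIsMarker = (i < word.length() - 1) && (word[i + 1] in markers)
      // skip the markers themselves; glue a following marker onto its letter
      if (!(c in markers)) varnas << (nextIsMarker ? c + word[i + 1] : c)
    }
    varnas
  }
}

// ConfigSlurper parses Groovy-syntax configuration into a nested ConfigObject
def config = new ConfigSlurper().parse('''
  scheme {
    name = 'SimpleScript'
    markers = ['.', ':', '-']
  }
''')

def scheme = new ConfiguredScheme(markers: config.scheme.markers)
assert scheme.tokenize('Gan.') == ['G', 'a', 'n.']
```
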

Obviously this is just the very beginning, and some areas are still unpolished. But imagine being able to write code like

assert "bhavati" == bhU + sap + tin //1st gana
assert "kasca" == 'ka:' + sca //scutva sandhi

Imagine being able to work out sandhis just by using the plus sign! Wouldn't that be really, really cool? And it is not impossible. It will only take a little more effort to expand the DSL to include anga, guna, operator overloading for sandhi rules etc.!
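
As a rough illustration of the operator-overloading idea, Groovy maps the + operator to a method named plus, so a small wrapper type can apply a rule at the join. Everything below is hypothetical: the Pada class is not part of the listings, and the single hard-coded rule (a final visarga ':' before an initial 'c' becomes 's') is only a toy stand-in for real sandhi rules.

```groovy
// Hypothetical wrapper type; the listings above work on plain Strings instead
class Pada {
  String text

  // called by Groovy for the expression `left + right`
  Pada plus(Pada next) {
    // toy rule only: final ':' followed by initial 'c' turns the ':' into 's'
    if (text.endsWith(':') && next.text.startsWith('c')) {
      return new Pada(text: text[0..-2] + 's' + next.text)
    }
    // no rule matched: plain concatenation
    new Pada(text: text + next.text)
  }

  String toString() { text }
}

def joined = new Pada(text: 'ka:') + new Pada(text: 'ca')
assert joined.toString() == 'kasca'
```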

Similar DSLs could be implemented for parsing shlokas to determine chandas! The potential for a Samskritam DSL is huge.

4 comments:

vishvAs vAsuki said...

['a', 'e', 'u', 'N'] -> ['a', 'i', 'u', 'N'],

Shouldn't it be this?

vishvAs vAsuki said...

Oh, understood: a different script is being used.

Vasu said...

@vishvas, yes, it is a different script. As you can see in the SimpleScriptScheme, it took just a one-liner to tokenize. I didn't want to spend time on tokenizing other schemes, which someone has already done. Having said that, it's not very difficult to substitute this with a standard scheme like HK, ITRANS or IAST.

ssriram said...

nUtanalipi - http://xyzcompany.in/2011/12/15/nutanalipi-new-script/
----------

OM

a A i I u U R RR L LL E Y O W M H

k k' g g' q
c c' j j' Q
T T' D D' N
t t' d d' n
p p' b b' m
y r l v
S S' s h
l; x

ka kA ki kI ku kU kR kRR kL kLL kE kY kO kW k[]M k[]H


akArO muk'aH sarvad'armANAm AdyanutpannatvAt ||

d'armO raxati raxitaH |

agnimILE purOhitaM yajQasya dEvam Rtvijam
hOtAraM ratnad'Atamam

sahasraSIrS'A puruS'aH sahasrAxaH sahasrapAt
sab'UmiM viSvatO vRtvAtyatiS'T'ad daSAqgulam
puruS'a EvEdaM sarvaM yad b'UtaM yacca b'avyam

at'AtO d'Atust'adOS'agatyavikArahEtub'UtArt'avArd'akadravyANyadyAt