The Chapter 8 DSL project is now available on GitHub at https://github.com/vasya10/samskritam. It is in a very early stage, and I will be updating it whenever I get some free time.
Tuesday, November 29, 2011
Thursday, November 17, 2011
The Chapter 8 DSL
A Domain Specific Language (DSL) is fast becoming a popular way to describe a problem or a solution for a specific domain. Code written in a DSL is far more readable than the regular "technical" code (in Java or C#, for example). Since information about DSLs can be googled amply, I am not going to spend time writing on what a DSL is.
In many of the previous posts, I had used pseudo-code to demonstrate parallels between programming and Panini's techniques. Time to call the bluff now. Presented below is seriously tested code. Here is a DSL that closely models some basic techniques of ashtaadhyaayI, specifically the maheshvara-sutra-s and those darned "it" rules. I'm using Groovy for the implementation, as I feel that its syntax is more natural to read than that of Scala or Ruby.
Let's define some classes.
[Listing 1: SivaSutra.groovy]
package ch8

import java.util.List

/**
 * Implementation of Maheshvara Sutra using SimpleScript transliteration scheme
 * The table itself can be moved to a groovy configuration file to allow a different scheme like HK, ITRANS or AST
 *
 * @author vsrinivasan
 */
@Singleton
class SivaSutra implements Iterable {

    //siva-sutraani
    List table = [
        ['a', 'e', 'u', 'N'],
        ['r.', 'l.', 'k'],
        ['E.', 'o', 'n'],
        ['i', 'O.', 'c'],
        ['h', 'y', 'v', 'r', 't'],
        ['l', 'N'],
        ['n.', 'm', 'n', 'N', 'N.', 'm'],
        ['J', 'B', 'n.'],
        ['G', 'D', 'D.', 's.'],
        ['j', 'b', 'g', 'd', 'd.', 's'],
        ['K', 'P', 'C', 'T', 'T.', 'c', 't', 't.', 'v'],
        ['k', 'p', 'y'],
        ['s', 's.', 'S', 'r'],
        ['h', 'l']
    ]

    List list = table.flatten()

    int indexOf(String varna) { list.indexOf(varna) }

    @Override
    Iterator iterator() { list.iterator() }

    //eShaam antyaaH it
    List itMarkers = table.collect { it.last() }

    /**
     * is this iT-marker?
     * this finds only the 'pratyahara iT'; for other it-s see ItRules.groovy
     *
     * @see ItRules
     */
    boolean isIt(f) { itMarkers.contains(f) }

    /**
     * expands a given pratyahara, including all the iT-s
     * not for practical purposes, but good for testing
     *
     * @param pratyahara
     * @return
     */
    List expand(String pratyahara) {
        def (begin, end) = pratyahara.varnas()
        //slice by position; a List cannot be indexed by a range of Strings
        list[indexOf(begin)..indexOf(end)]
    }

    /**
     * returns the real pratyahara varna-s, excluding the intermediate it-markers
     * very procedural implementation, need to make it groovy-like
     *
     * @param pratyahara
     * @return
     */
    List collect(String pratyahara) {
        def (begin, end) = pratyahara.varnas()
        boolean start = false
        def result = []
        table.each { line ->
            line.each { item ->
                if (item == begin || start) {
                    if (item != line.last()) {
                        result << item
                        start = true
                    }
                    if (item == end && item == line.last()) {
                        start = false
                    }
                }
            }
        }
        return result
    }
}
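The comment on collect() asks for something more groovy-like. One possible shape is sketched below, under some loud assumptions: it uses only the first four rows of the table, the closure name pratyahara is made up for this sketch, and it takes the starting varna and the it-marker as two separate arguments so that no tokenizer is needed. It also ignores the fact that some markers (like 'N') occur twice in the full table.

```groovy
// First four rows of the table from Listing 1; the last item of each row is its it-marker.
def table = [
    ['a', 'e', 'u', 'N'],
    ['r.', 'l.', 'k'],
    ['E.', 'o', 'n'],
    ['i', 'O.', 'c']
]

// Hypothetical helper (not in the original project): the varnas from `begin`
// up to the row closed by `marker`, with every it-marker left out.
def pratyahara = { String begin, String marker ->
    int lastRow = table.findIndexOf { it.last() == marker }
    assert lastRow >= 0 : "unknown it-marker: $marker"
    // drop the trailing it-marker of each row, then join the rows
    def varnas = table[0..lastRow].collectMany { it[0..-2] }
    int from = varnas.indexOf(begin)
    assert from >= 0 : "unknown varna: $begin"
    varnas.drop(from)
}

assert pratyahara('a', 'N') == ['a', 'e', 'u']
assert pratyahara('a', 'k') == ['a', 'e', 'u', 'r.', 'l.']
```

The idea is to strip the markers first and slice afterwards, instead of tracking a start flag while looping.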
[Listing 2: ItRules.groovy]
package ch8

@Singleton
class ItRules {

    //#(1.3.2) upadeshe ajanunaasika iT, anunAsika-s are denoted by a "-" at the end,
    // may be M would be a better option?
    def ajanunasika = 'aAeEuUr.R.l.E.IOO.'.varnas().collect { it + "-" }

    //cutU
    def cu = 'cCjJn.'.varnas()
    def tu = 'tTdDN'.varnas()

    //s.asca, denoting as "sha" for convenience
    def sha = ['s.']

    //lasaku (ataddhite)
    def ku = 'kKgGn'.varnas()
    def lasaku = 'ls'.varnas() + ku

    //some more to be defined

    //#(1.3.3) halantyam - check if the last char is hal
    SivaSutra sivaSutra = SivaSutra.instance

    boolean hasHalantyam(String pratyaya) { pratyaya.varnas().last() in sivaSutra.hl }

    //allItMarkers except hal, which is applicable only to last letter
    //(uses lasaku as defined above; the name was misspelt here before)
    def allItMarkers = ajanunasika + cu + tu + lasaku

    boolean isAnunasika(String varna) { varna.endsWith('-') }

    boolean isItMarker(String varna) { varna in allItMarkers }

    String tasyaLopah(String pratyaya) { (pratyaya.halantyam().varnas() - allItMarkers).join() }
}
[Listing 3: Main.groovy]
package ch8.tests

import java.util.List
import ch8.ItRules
import ch8.SivaSutra
import ch8.schemes.SimpleScriptScheme
import ch8.Samjna

/*
 * DSL: varnas() closure - tokenize the script into individual varnas (list)
 */
String.metaClass.varnas = { new SimpleScriptScheme().tokenize(delegate) }

/*
 * DSL: halantyam() closure - remove the last hal iT and return the modified String
 */
String.metaClass.halantyam = {
    ItRules itRules = ItRules.instance
    def varnas = delegate.varnas() as List
    if (itRules.hasHalantyam(delegate)) {
        varnas.remove(varnas.size()-1)
    }
    varnas.join()
}

/*
 * DSL: tasyaLopah() closure - remove all the it-markers from a pratyaya
 */
String.metaClass.tasyaLopah = {
    ItRules itRules = ItRules.instance
    itRules.tasyaLopah(delegate)
}

/*
 * DSL: Direct exposition of a pratyaya or a pratyahara!
 */
SivaSutra sivaSutra = SivaSutra.instance
sivaSutra.metaClass.getProperty = { String pratyahara ->
    def metaProperty = SivaSutra.metaClass.getMetaProperty(pratyahara)
    def result
    if (metaProperty) {
        //if there is an existing property invoke that
        result = metaProperty.getProperty(delegate)
    } else {
        //inspect the property and convert it to varnas
        //taparastatkaalasya rule; need to formulate in a better way
        if (pratyahara.endsWith('t.')) {
            result = (pratyahara - 't.').varnas()
        } else {
            result = sivaSutra.collect(pratyahara)
        }
    }
    result
}

void testSivaSutra() {
    SivaSutra sivaSutra = SivaSutra.instance
    sivaSutra.table.each { println it } //print the maheshvara sutrani
    println sivaSutra.list //print a flattened version of the maheshvara sutrani
    sivaSutra.each { println it } //another way to print flattened maheshvara sutrani
    println sivaSutra.itMarkers //print only the it markers
    assert sivaSutra.isIt('n.') //check if n. is an it marker
    assert sivaSutra.expand('ak') == ['a','e','u','N','r.','l.','k'] //expand pratyahara including the it
    //'ak' runs up to the marker k of the second sutra, so it also covers r. and l.
    assert ['a', 'e', 'u', 'r.', 'l.'] == sivaSutra.collect('ak') //pratyahara excluding iT
    assert ['a', 'e', 'u', 'r.', 'l.'] == sivaSutra.ak //another way of getting the pratyahara! Meta-programming in play!
}

void testItRules() {
    ItRules itRules = ItRules.instance
    println itRules.ajanunasika //prints all the ac anunasikas
    assert "lyut".varnas() == ['l', 'y', 'u', 't']
}

void testHalantyamRule() {
    //print the pratyaya-s after the halantyam rule is applied
    ["kt.va", "Gan.", "kt.vat.", "sap", "lyu-t", "saN", "sat.r."].each { println it + " = " + it.halantyam() }
    assert 'kt.va' == 'kt.va'.halantyam()
    assert 'kt.va' == 'kt.vat.'.halantyam()
    assert 'Ga' == 'Gan.'.halantyam()
}

void testTasyaLopahRule() {
    ["Gan.", "kt.vat.", "sap", "lyu-t", "saN", "satr."].each { println it + " = " + it.tasyaLopah() }
    assert 'a' == 'Gan.'.tasyaLopah()
    assert 't.va' == 'kt.vat.'.tasyaLopah()
}

void testSamjnaSutras() {
    SivaSutra sivaSutra = SivaSutra.instance
    def vruddhi = sivaSutra.'At.' + sivaSutra.ic
    def guna = sivaSutra.'at.' + sivaSutra.'E.n'
    assert ['A', 'i', 'O.'] == vruddhi
    assert ['a', 'E.', 'o'] == guna
}

testSivaSutra()
testItRules()
testHalantyamRule()
testTasyaLopahRule()
testSamjnaSutras()
[Listing 4: SimpleScriptScheme.groovy]
package ch8.schemes

/**
 * A simple script tokenizer
 *
 * @author vsrinivasan
 */
class SimpleScriptScheme implements ScriptScheme {

    // hyphen denotes anunasika
    static List NotationMarkers = ['.', ':', '-']

    /**
     * split/tokenize a given word into a list of varnas
     * the word could be a pada, shabda, pratyaya or pratyahara
     * needs to handle anunasika properly
     *
     * @calledby String.metaClass.varnas()
     * @param word
     * @return list of varnas
     */
    @Override
    public List tokenize(String word) {
        def varnas = []
        word.eachWithIndex { c, i ->
            c = ((i < word.length()-1) ? ((word[i+1] in NotationMarkers) ? (c + word[i+1]) : c) : c)
            if (!(c in NotationMarkers)) varnas << c
        }
        varnas
    }
}
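To try the tokenizer idea without the rest of the project, here is a minimal standalone sketch. It assumes the same three notation markers as Listing 4, but the tokenize closure is written afresh for this example (it is not the project's code), and it glues a marker onto the preceding letter rather than looking ahead. It behaves differently from Listing 4 on odd input such as two markers in a row.

```groovy
// Same notation markers as Listing 4: a marker attaches to the letter before it.
def markers = ['.', ':', '-']

// Hypothetical standalone tokenizer, so no ScriptScheme interface is needed here.
def tokenize = { String word ->
    def varnas = []
    word.each { ch ->
        if (!(ch in markers)) {
            varnas << ch                        // a plain letter starts a new varna
        } else if (varnas) {
            int last = varnas.size() - 1
            varnas[last] = varnas[last] + ch    // glue the marker onto the previous varna
        }
    }
    varnas
}

// Expose it on every String, the way Main.groovy does with varnas().
String.metaClass.varnas = { -> tokenize(delegate) }

assert 'kt.vat.'.varnas() == ['k', 't.', 'v', 'a', 't.']
assert 'lyu-t'.varnas() == ['l', 'y', 'u-', 't']
```

Once varnas() hangs off String, the rest of the DSL can stay free of wrapper objects.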
Now some observations and analysis:
- Doing this in regular Java/C# would require several objects, wrapper classes and utility methods. Using metaprogramming techniques and defining a clean DSL makes for a much more interesting implementation.
- The ability to work directly on strings, lists and maps makes a huge difference, as opposed to wrapping strings in objects like pratyahara, it, pratyaya etc.
- Main.groovy is self-explanatory in what's given and what's expected. This is not pseudo-code anymore! Note the direct method invocations like varnas(), halantyam() and tasyaLopah() on Strings. Also observe the direct reference to a pratyahara (sivaSutra.ac will expand to a list of vowels). Metaprogramming, awesome or what?
- Also observe the definitions in testSamjnaSutras(). The only reason I have to quote the properties is the use of the dot in the scheme. A symbol-less scheme like AST would make for very readable code.
- The code uses SimpleScript for devanagari transliteration. As I had mentioned in a previous post, parsing the script is trivial because of the strict 1:1 mapping between English and Sanskritam letters. It took less than 5 minutes to write.
- However, the code allows any transliteration scheme to be used, if one can come up with it, by implementing the ScriptScheme interface. Harvard-Kyoto, ITRANS, AST or even Unicode: as long as the individual varna-s are correctly tokenized, the program will work fine.
- Any script scheme can be supplied via a groovy configuration and read by ConfigSlurper!
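As a rough illustration of the ConfigSlurper point, the sketch below parses a small configuration from a string. The key names (scheme, name, notationMarkers) are made up for this example and are not taken from the project; in practice the same text could live in its own .groovy file.

```groovy
// Hypothetical configuration; the key names are illustrative, not taken from the project.
def script = '''
scheme {
    name = 'SimpleScript'
    notationMarkers = ['.', ':', '-']
}
'''

// ConfigSlurper evaluates the script and returns a nested ConfigObject.
def config = new ConfigSlurper().parse(script)

assert config.scheme.name == 'SimpleScript'
assert config.scheme.notationMarkers == ['.', ':', '-']
```

A tokenizer could then read its markers from config.scheme.notationMarkers instead of a hard-coded list.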
Imagine being able to work out sandhis just by using the plus sign! Wouldn't that be really, really cool? And that's not really impossible. It will only take a little more effort to expand the DSL to include anga, guna, operator overloading for sandhi rules etc.! For example:
assert "bhavati" == bhU + sap + tin //1st gana
assert "kasca" == 'ka:' + sca //scutva sandhi
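The plus-sign idea rests on the fact that Groovy maps the + operator to a plus() method. Below is a minimal sketch of how that could look. Everything in it is an assumption for illustration: the Pada class is hypothetical, and the single rule it knows (a visarga ':' before 'c' or 'C' becomes 's', in the SimpleScript spelling) is nowhere near a real sandhi engine.

```groovy
// Hypothetical wrapper class; the original project works directly on Strings.
class Pada {
    final String text

    Pada(String text) { this.text = text }

    // Groovy maps the + operator to plus().
    // Only one illustrative rule: visarga (':') before 'c' or 'C' becomes 's'.
    Pada plus(Pada other) {
        boolean visargaBeforeC = text.endsWith(':') && other.text &&
                (other.text[0] in ['c', 'C'])
        if (visargaBeforeC) {
            return new Pada(text.substring(0, text.length() - 1) + 's' + other.text)
        }
        new Pada(text + other.text)   // otherwise plain concatenation
    }

    String toString() { text }
}

def ka = new Pada('ka:')
def ca = new Pada('ca')
assert (ka + ca).toString() == 'kasca'
```

A fuller DSL would look up the applicable sutra instead of hard-coding one case, but the operator plumbing stays the same.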
Imagine similar DSLs being implemented for parsing shlokas to determine chandas! The potential for a Samskritam DSL is huge.
Labels: ashtadhyayi, chapter8, dsl, groovy, panini, samskritam
Tuesday, November 15, 2011
Read, Restore and so forth
My first sight of a computer was in 1983, in a remote town in India whose presiding deity is a representation of "Conscious-Ethereal Grand Cosmic Nothingness". Our science teacher somehow got hold of somebody who had a Commodore 64. About 40 students from our class (India was that much less populous 30 years ago) walked about 5 kilometers on a rainy day to that computer guy's house. We were let in, in batches of 10, to a dimly lit room and were seated on the floor. A girl sitting on a chair was holding a joystick (or a mouse?) and a keyboard, typing noisily. On the small monitor, rectangles and squares of different colors were jumping around. She was playing some game. She said something about BASIC, and that's all we learnt.
Almost 10 years forward. It was the onset of the Russian winter, and I was walking with a senior towards the university. He was a smart guy, always an A-grader, and everybody respected him. We were talking about programming language theories. C++ was just getting popular. He said, "Hey, I know Pascal and C. And this year we are learning some AI using Prolog. I've also been learning C++...". He paused. Then he suddenly said, "You know BASIC, right? Can you teach me that?". I didn't know how to respond, but just said "Sure". I was a bit confused but elated to 'teach' a senior. That opportunity never came though.
Current times. Ashtadhyayi has several techniques that are an illuminating parallel to programming, and one of them is intriguing: the word "aadi" given in a context. When Panini wants to mention a group of information, he just uses the first value of the group and suffixes it with "aadi" or "aadya:". The reader is expected either to know the list by heart or to look it up. No big deal, when the average Sanskrit student is expected to know amarakosha by heart anyway. So the first value of the list itself is used as the "head" to reference the list. This way Panini passes a pointer to an array of data using a very simple technique.
Some pseudo-code may clarify:
/* The list of verbs, called the dhaatu paatha */
static Map DHAATU_PAATHA = [bhU: "sattaayaam", ... ]
/* pointer to the list of the dhaatu paatha; trying to mimic naturalness - intentionally not referring via the static variable but via the head-value of the list */
char *list_of_verbs = "bhU"
Look at some of the sutra-s -
bhUvAdayo dhaatavaH (1.3.1) | By this statement Panini refers to about 2000+ verbal roots in Sanskritam, starting with bhU
sanaadyantaa dhaatavaH (3.1.32) | Refers to the list of derivational roots, the list starts with a verb that ends with suffix "-san"
praadayaH | Refers to the 22 prepositions that start with "pra"
This technique of "aadi" reference is pretty common in Sanskritam and other Indian literature. Tyagaraja, in his chittaranjani kriti naadatanumanisham, says "sadyojaataadi pancha-vaktra", referring to the five faces of Shiva starting from sadyojaata. One who is not aware of the details will not know what the rest are, but aadi is just what it is: a pointer to a list of information. If Panini was the one who invented it (let's assume so for the sake of argument, because Panini had predecessor grammarians too and there was obviously other literature before him), it is a brilliant technique. The technique is not perfect though, because over time somebody could come up with a modified list with the same head-value. But still, it's a great way to abstract information, where the uniqueness of the head-value serves as an emphasised indicator of the contents following it.
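The head-value idea can be sketched in a few lines of Groovy. This is only an illustration under stated assumptions: the gana map and the register and aadi closures are made-up names, and the lists are short, loosely transliterated samples rather than the full traditional lists.

```groovy
// Hypothetical registry: each list is filed under its own first element.
// The entries are truncated samples, not the full traditional lists.
def gana = [:]
def register = { List items -> gana[items.first()] = items }

register(['bhU', 'edh', 'spardh'])   // first few verbal roots
register(['pra', 'paraa', 'apa'])    // first few prepositions

// "X-aadi" resolves to the list that starts with X.
def aadi = { String head -> gana[head] }

assert aadi('bhU') == ['bhU', 'edh', 'spardh']
assert aadi('pra').first() == 'pra'
```

The weakness mentioned above shows up here too: registering a different list that also starts with 'bhU' would silently replace the earlier one.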
Back to programming after the detour. Even after several years in programming, BASIC continues to fascinate me. With all kinds of high-level languages around, there is one feature I sorely miss from BASIC: the "READ...DATA" statement. It feeds data to the program in the shortest possible way, without having to assign each value individually.
10 FOR I = 1 TO 10: READ X(I): NEXT I
20 DATA 1,3,5,7,11,13,17,19,23,29
30 RESTORE 20
10 READ NAME$, PHONE$, PI, BASERADIX
20 DATA "James Bond", "555-1212", 3.14, 8
30 DATA "11/11/2011", "All the world's a Pre-Production."
50 READ DATE$, WS_QUOTE$
The DATA statement could be anywhere in the program, and the READ statement would read off the data sequentially, like taking items from the front of a queue. The RESTORE statement acts just like the "aadi" of ashtadhyayi: it points to the beginning of the data. The simplicity of the bootstrap data feed is appreciated when you do not care where the DATA is set. Several high-level languages have been invented since then, but not many provide such an easy way to feed bootstrap data to the program variables. Of course there are enumerations and similar constructs, but somehow the simplicity of the READ statement stands out. Just like Panini's aadi technique.
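For comparison, here is roughly what it takes to imitate READ, DATA and RESTORE in Groovy. The DataTape class is a made-up name for this sketch, and its restore() always rewinds to the very first value, whereas BASIC's RESTORE can also target a particular DATA line.

```groovy
// Hypothetical DataTape class that mimics BASIC's DATA / READ / RESTORE.
class DataTape {
    private final List data
    private int cursor = 0

    DataTape(List data) { this.data = data }

    // READ: hand out the next value and advance the cursor.
    def read() {
        if (cursor >= data.size()) throw new IllegalStateException('OUT OF DATA')
        data[cursor++]
    }

    // RESTORE: point the cursor back at the head of the data.
    void restore() { cursor = 0 }
}

def tape = new DataTape([1, 3, 5, 7, 11, 13, 17, 19, 23, 29])
def x = (1..10).collect { tape.read() }   // FOR I = 1 TO 10: READ X(I): NEXT I
assert x == [1, 3, 5, 7, 11, 13, 17, 19, 23, 29]

tape.restore()
assert tape.read() == 1
```

It works, but it is a class plus a cursor where BASIC needed three keywords.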
Labels: ashtadhyayi, basic, panini, programming, samskritam, sanskrit