3/04/2006

 

Suggesting a New Language for Inventing Domain-Specific Languages

Inspired by Roy's recent talk to TWU&TWI and Vincent's nice introduction to Smalltalk, I have been thinking about inventing a language that makes creating DSLs easier.

Inventing a DSL (an External DSL, in Martin's definition) is not an easy job. We have to write the grammar in EBNF and then write a compiler or interpreter for it. It takes a long time and a major effort before we see it work. So the decision of whether to build a DSL has to be made very carefully, because it may not be worth the effort. And once the DSL is born, changing its syntax or grammar is another big problem.

Thinking about the process of making a DSL, we find we are given too many options. Writing a compiler from scratch, we have thousands of choices for the grammar, syntax and semantics of the language. Do we really need that much flexibility, at the cost of leaving out so many basic and nice language elements (like OOP or GC)?

Why do we need a new language to cover a specific domain? I think the reasons fall into two categories:

1. Improving productivity (better encapsulation and reuse)
2. Improving expressiveness (we can communicate with clients in a better way)

For the first reason, I don't think a DSL is very useful. How to organize code is a hot research area, but I haven't seen any new technology improve productivity much since the birth of OOP. I believe OOP will remain a reliable technology for building large business applications for a long time, especially when complemented with Agile.

For the second reason, I found that we can achieve the same goal without inventing a new language from scratch: build the DSL on top of an extremely flexible language (an Internal DSL, in Martin's definition).

Assume, then, that the main reason to invent a DSL is improving expressiveness. What hurts the expressiveness of code? I think it is two things:

1. The concepts used in the code don't fit the real world very well.
2. The grammar or syntax of the programming language makes the code look weird.
The first problem can be solved with current techniques such as Object-Oriented Programming and Domain-Driven Design. For the second problem, we have few choices in languages like Java or C#. In Ruby, the language provides a bunch of nice features, so some clever design can make the code look better, but we are still quite limited. In Lisp or Smalltalk, we are given maximum flexibility to do nice things.

The features of Smalltalk that make its code look so natural are:

1. Minimal built-in keywords; few things are special (even if/else and while are implemented using OOP or recursion)
2. A more straightforward grammar (no commas, braces or curly braces)
3. Keyword messages

Take if/else as an example: "ifTrue:" is just a message understood by class True and class False. The only difference between the two implementations is that one executes the block that follows and the other just ignores it. So we can easily introduce new control structures and other things that used to be implemented at the level of the language.

Keyword messages are another cool feature. In Java, we can only write: text.addAttributes(attr, start, stop). But in Smalltalk, we can give the message addAttribute a better name that carries information about its parameters:

text addAttribute: attr from: start to: stop

The signature of the message consists of three parts: addAttribute: from: to:. So the code reads more like English.

But Smalltalk is not good enough:

1. It still has some grammar symbols, like [:] and [^].
2. The arrangement of sentence elements is limited to the order "object message".
3. Names cannot contain spaces.
4. The left side of = cannot be worked into the sentence nicely (we cannot write "Create new person, assign it to Michael").

So I am suggesting a new language whose only purpose is to make creating a new DSL easier. It starts from the good job done by Smalltalk and improves on it by solving the problems above.
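To make the two Smalltalk features above a bit more concrete, here is a minimal sketch in Python rather than Smalltalk. All names in it (TrueObject, FalseObject, as_object, unless, Text, add_attribute) are made up for illustration. It only assumes that conditionals can be ordinary methods on two boolean classes and that keyword-only arguments can roughly stand in for keyword messages.

```python
class TrueObject:
    """Plays the role of Smalltalk's class True (hypothetical name)."""

    def if_true(self, block):
        # True answers the message by running the block.
        return block()

    def if_true_if_false(self, true_block, false_block):
        return true_block()


class FalseObject:
    """Plays the role of Smalltalk's class False (hypothetical name)."""

    def if_true(self, block):
        # False answers the same message by ignoring the block.
        return None

    def if_true_if_false(self, true_block, false_block):
        return false_block()


def as_object(condition):
    """Wrap a Python bool in one of the two classes above."""
    return TrueObject() if condition else FalseObject()


def unless(condition, block):
    """A new control structure written as plain code, not as syntax."""
    return as_object(condition).if_true_if_false(lambda: None, block)


class Text:
    """Hypothetical stand-in for the text object in the example above."""

    def __init__(self):
        self.attributes = []

    def add_attribute(self, attr, *, from_, to):
        # Keyword-only parameters force the caller to name each argument,
        # approximating "text addAttribute: attr from: start to: stop".
        self.attributes.append((attr, from_, to))
        return self
```

A call then looks like Text().add_attribute("bold", from_=0, to=5), and unless(False, lambda: "ran") runs its block even though the language has no "unless" keyword. The commas, parentheses and lambda that remain are exactly the kind of grammar noise complained about above.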
The final goal is to let code read like English, although writing it may still require much more careful design, naming and coding than writing casually in English. My initial thoughts are:

1. Use XML to structure the source code (it is actually the abstract syntax tree), and program against a GUI representation of the source code instead of writing in a text file directly. (That way I don't need to invent any fancy grammar to balance exactness and expressiveness. The XML source file handles exactness by documenting the parse tree, while the IDE maintains high expressiveness.)
2. Decouple order from invocation. (With the technique above, the underlying XML representation can specify which object a message is sent to, so the object no longer has to be followed by the message sent to it. We can finally say "update window" instead of "window update".)
3. Out parameters. (x = y calcSomething; becomes y calcSomething assign to x. The variable that should store the return value is passed to the object as a special message parameter, so the semantics of "=" can be shown in the sentence.)
4. The message then becomes the skeleton of a sentence, and we fill in the object we want to operate on, the arguments of the operation, and the result variable to form part of a sentence. (If "move … to … , …" is a message, we can say "move pointA to 1 , 2". The object being operated on is pointA and the arguments are 1 and 2.)

I think this is the most economical way to create a DSL. Either writing a new compiler or trying to use MDA to save the effort of writing one will cost a lot of time, money and rework. But when a DSL is built upon a flexible OOP language, after some interfacing work the DSL is just a natural extension of the Domain Model we already build today.

BTW: Is "SmallRocks!" a nice name? :)
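As a rough illustration of ideas 2 to 4 above, here is a small Python sketch of a message used as a sentence skeleton. It is not the proposed language, only a toy. The Skeleton class, the "_" slot marker, the receiver_slot and result_slot fields, and the Point methods are all assumptions made for this example. The slot indices play the part the post gives to the underlying XML: they record which hole holds the receiver and which one names the result variable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    x: int = 0
    y: int = 0

    def move_to(self, x, y):
        self.x, self.y = x, y
        return self

    def manhattan_length(self):
        return abs(self.x) + abs(self.y)


def _value(word, env):
    # A slot word is either a known variable name or an integer literal.
    return env[word] if word in env else int(word)


@dataclass
class Skeleton:
    """A message written as a sentence with holes, e.g. 'move _ to _ , _'."""
    pattern: str                       # '_' marks a slot to be filled in
    receiver_slot: int                 # which slot holds the receiving object
    method: str                        # method invoked on the receiver
    result_slot: Optional[int] = None  # slot naming the out parameter, if any

    def run(self, sentence, env):
        parts, words = self.pattern.split(), sentence.split()
        if len(parts) != len(words):
            raise ValueError("sentence does not fit the skeleton")
        slots = []
        for expected, actual in zip(parts, words):
            if expected == "_":
                slots.append(actual)
            elif expected != actual:
                raise ValueError(f"expected {expected!r}, got {actual!r}")
        receiver = env[slots[self.receiver_slot]]
        args = [_value(word, env) for i, word in enumerate(slots)
                if i not in (self.receiver_slot, self.result_slot)]
        result = getattr(receiver, self.method)(*args)
        if self.result_slot is not None:
            # Out parameter: store the return value under the given name.
            env[slots[self.result_slot]] = result
        return result


env = {"pointA": Point()}
move = Skeleton("move _ to _ , _", receiver_slot=0, method="move_to")
move.run("move pointA to 1 , 2", env)

measure = Skeleton("measure _ and assign to _", receiver_slot=0,
                   method="manhattan_length", result_slot=1)
measure.run("measure pointA and assign to d", env)
```

Because the receiver is identified by slot index rather than by position in the sentence, a skeleton such as "to _ , _ move _" with receiver_slot=2 works just as well. That is the "decouple order from invocation" point in miniature.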

Comments:
Hmmm.... Why XML? Why not use a Lisp variant? S-expressions can express the same structures as XML trees but are cleaner, and you get evaluation semantics for free. Scheme would be the obvious choice because it is a small language with formally defined semantics, and there are lots of implementations that run on different platforms or virtual machines.
 
You should be clear about whether you mean abstract syntax tree or concrete syntax tree.
 
The underlying representation is not important. My point is separating the source code from what you see in the IDE, to avoid any weird programming notation in the visual form. Using S-expressions is an option, but I would only use them as a way to structure code. I prefer OOP to FP, so my starting point is Smalltalk instead of Scheme.
I don't know exactly the difference between an AST and a parse tree (sorry...). But I think this problem only matters in languages like Java, because the syntax of a pure OO language is very, very simple: every part of the program is finally reduced to message passing. I found the following material on Wikipedia:
"An AST differs from a parse tree by omitting nodes and edges for syntax rules that do not affect the semantics of the program. The classic example of such an omission is grouping parentheses, since in an AST the grouping of operands is explicit in the tree structure." - from http://en.wikipedia.org/wiki/Abstract_syntax_tree
So, I think "concrete syntax tree" is the more exact term for what I meant. Thank you for your comment.
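The grouping-parentheses example in the Wikipedia quote can be checked with Python's standard ast module. This is only a small sketch to make the distinction concrete. The helper name tree is made up here, and Python is used simply because its AST is easy to inspect.

```python
import ast


def tree(source):
    """Return a textual dump of the AST for a Python expression."""
    return ast.dump(ast.parse(source, mode="eval"))


# Redundant grouping parentheses leave no trace in the AST...
same = tree("a + b") == tree("((a + b))")

# ...while parentheses that change the grouping produce a different tree.
different = tree("(a + b) * c") != tree("a + (b * c)")
```

Both same and different come out True. A concrete syntax tree (parse tree) would instead keep a node for every pair of parentheses that appears in the source text.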
 