<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-19829485</id><updated>2012-01-24T20:27:11.508+08:00</updated><title type='text'>taowen</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>28</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-19829485.post-3973231604975184632</id><published>2011-01-04T23:20:00.003+08:00</published><updated>2011-01-04T23:41:07.367+08:00</updated><title type='text'>Retrospection: the mistakes I have made these years</title><content type='html'>&lt;p&gt;Someone once told me that it takes more than 10,000 hours of repeated practice to become a mature professional. I am still far from that standard, but after 5 years of programming professionally, I realize I have already made so many mistakes that they deserve a conscious retrospection.
&lt;/p&gt;
&lt;p&gt;
There is one presentation I did not watch, but whose slides I really liked: http://www.infoq.com/presentations/LMAX. The slides say:
&lt;/p&gt;
&lt;p&gt;
On a single thread you have ~3 billion instructions per second to play with: to get 10K+ TPS if you don't do anything too stupid. 
&lt;/p&gt;
&lt;p&gt;
I have to admit that I did many things that seemed smart at first but turned out to be stupid, leaving hardware capable of ~3 billion instructions per second unable to help the project. And it is not just about performance: many of these mistakes led to other symptoms as well.
&lt;/p&gt;
&lt;p&gt;
Often the "I" here could be replaced with "we". I am sure of it, because I have seen other people make the same mistakes I did. As humble software developers, we do not control many things, but we do control the code in our hands. It is not surprising that people spend a lot of time making their code "smart", and a lot of lessons can be learned from that smartness.
&lt;/p&gt;
&lt;p&gt;
I do not have a full list yet, but here are some to start with. As this blog is primarily technical, I will keep the items mostly technical. If I find the time, I will write about them one by one:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How build tools reinvent scripting languages and the command line, especially MSBuild.&lt;/li&gt;
&lt;li&gt;The evil of lazy loading&lt;/li&gt;
&lt;li&gt;Other evils of ORM&lt;/li&gt;
&lt;li&gt;How to hack your dependency injection tool into rocket science&lt;/li&gt;
&lt;li&gt;Building a castle on sand, aka outlook and isolation&lt;/li&gt;
&lt;li&gt;Anything related to Microsoft is evil, especially COM&lt;/li&gt;
&lt;li&gt;Encapsulation might help initially, but it is not as helpful as you expect, and is sometimes even harmful&lt;/li&gt;
&lt;li&gt;Reinventing the wheel in many ways, and how to invent reasons to make it look good&lt;/li&gt;
&lt;li&gt;Abandoned architecture is even worse than wrong architecture&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-3973231604975184632?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/3973231604975184632/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=3973231604975184632' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/3973231604975184632'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/3973231604975184632'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2011/01/retrospection-mistakes-i-have-made.html' title='Retrospection: the mistakes I have made these years'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-1859455033725185986</id><published>2010-11-27T17:39:00.000+08:00</published><updated>2010-11-27T18:15:53.040+08:00</updated><title type='text'>Package, the missing language feature - Part II</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Problems&lt;/span&gt;
&lt;p&gt;
In the previous post, we talked about how packages work in the Python language. Essentially, the problem is that a package is a good box, but its color is not black. We want a package to expose all of its API at the package level and seal up any internal details: from A import * should give you everything you need, without any need to import A.B or A.C.
&lt;/p&gt;
&lt;span style="font-weight: bold;"&gt;Unimportable&lt;/span&gt;
&lt;p&gt;
So, how do we make a module unimportable? There are two things to do. First, remove the B attribute from the package object A. import A.B first imports A, then imports A.B, and the caller then reaches B as an attribute of A; once B is deleted from A, that lookup fails. Second, remove A.B from sys.modules by setting sys.modules['A.B'] = None. A None entry tells the import system to treat the module as missing, so import A.B and from A.B import * fail as well.
&lt;/p&gt;
&lt;pre class="brush: python"&gt;
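import sys
package_A = sys.modules['A']   # the already-imported package object A
delattr(package_A, 'B')        # A.B is no longer reachable as an attribute of A
sys.modules['A.B'] = None      # any further import of A.B now fails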
&lt;/pre&gt;
&lt;p&gt;
This way, we completely hide the existence of A.B, which is the behavior we want when another package tries to import this private internal. The drawback of this mechanism is that the error message users get is not friendly: they are told the module does not exist, although it actually does if they look in the file browser.
&lt;/p&gt;
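&lt;p&gt;To see both steps at work, here is a small self-contained sketch (Python 3) that writes a throwaway package A with one internal module B into a temporary directory, imports it, and then seals it. The layout and the secret variable are made up for illustration; on Python 3.6+ a None entry in sys.modules makes the import raise ModuleNotFoundError, a subclass of ImportError.&lt;/p&gt;

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package A with one internal module B (illustrative layout).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'A'))
open(os.path.join(root, 'A', '__init__.py'), 'w').close()
with open(os.path.join(root, 'A', 'B.py'), 'w') as f:
    f.write('secret = 42\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

import A.B
print(A.B.secret)  # 42: the glass box, anyone can reach the internal module

# Seal A up with the same two steps as above.
delattr(sys.modules['A'], 'B')
sys.modules['A.B'] = None

try:
    importlib.import_module('A.B')
    sealed = False
except ImportError as error:  # ModuleNotFoundError on Python 3.6+
    print('sealed:', error)
    sealed = True
print(hasattr(A, 'B'), sealed)  # False True
```

&lt;p&gt;Depending on the Python version, the error either claims there is no such module or says the import was halted, while B.py is still sitting on disk.&lt;/p&gt;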
&lt;span style="font-weight: bold;"&gt;When?&lt;/span&gt;
&lt;p&gt;
By making internal packages and modules unimportable, we can turn the parent package into a black box. But when should we do this, that is, delete all the internal packages and modules?
&lt;/p&gt;&lt;p&gt;
The best place is the __init__.py of the parent package. But once we delete the internal packages and modules, they are gone. What if A.B references A.C in its code? We need to make sure A.B is initialized (imported) before sealing up A. If A.B uses from A.C import xxx (or import A.C as C), the referenced object is copied into A.B's own namespace, so A.B can still use it even after A.C no longer exists. A plain import A.C, by contrast, only binds the name A, so a later lookup of A.C.xxx would fail once C is removed from A.
&lt;/p&gt;
&lt;pre class="brush: python"&gt;
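# in A/__init__.py
import sys
from .B import *                  # expose A.B's public names at the A level (also loads A.B eagerly)
package_A = sys.modules['A']      # the package object being initialized
delattr(package_A, 'B')           # seal: B is no longer an attribute of A
sys.modules['A.B'] = None         # seal: importing A.B from outside now fails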
&lt;/pre&gt;
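&lt;p&gt;The following Python 3 sketch checks the ordering argument with a hypothetical package named sealed_pkg: __init__.py eagerly imports B, B binds helper from C with a from-import, and only then are B and C deleted. The exported API keeps working while both internal modules become unimportable.&lt;/p&gt;

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: sealed_pkg/C.py is an internal helper, sealed_pkg/B.py
# uses it through a from-import, and __init__.py exposes B's api() and then
# deletes both B and C.
FILES = {
    'C.py': 'def helper():\n    return "helper in C"\n',
    'B.py': ('from .C import helper\n'
             '\n'
             'def api():\n'
             '    return "api uses " + helper()\n'),
    '__init__.py': ('import sys\n'
                    'from .B import api\n'   # eager load: imports B, and C through B
                    'for _name in ("B", "C"):\n'
                    '    delattr(sys.modules[__name__], _name)\n'
                    '    sys.modules[__name__ + "." + _name] = None\n'),
}

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sealed_pkg'))
for filename, source in FILES.items():
    with open(os.path.join(root, 'sealed_pkg', filename), 'w') as f:
        f.write(source)
sys.path.insert(0, root)
importlib.invalidate_caches()

import sealed_pkg
print(sealed_pkg.api())  # api uses helper in C: B kept its own reference to helper

for internal in ('sealed_pkg.B', 'sealed_pkg.C'):
    try:
        importlib.import_module(internal)
        print(internal, 'is still importable')
    except ImportError:
        print(internal, 'is sealed')
```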
&lt;span style="font-weight: bold;"&gt;Where?&lt;/span&gt;
&lt;p&gt;
Do I need to write this kind of ugly delattr code in every __init__.py file? Isn't that a cross-cutting concern that should not be repeated everywhere?
&lt;/p&gt;
&lt;p&gt;
Yes, so let's find a way to inject that code into every __init__.py file automatically. The code actually has three parts: part 1 exposes the API, part 2 eagerly loads the submodules, and part 3 deletes them. The API still needs to be defined manually in __init__.py, but parts 2 and 3 can be moved into a "post-import hook".
&lt;/p&gt;
&lt;p&gt;
What is a post-import hook? It is code executed after a module has been imported. After A is imported, we can eagerly load all its submodules by scanning its folder, and then delete them. Python does not support post-import hooks directly, but they can be built on the more powerful meta import hooks (sys.meta_path).
&lt;/p&gt; 
&lt;pre class="brush: python"&gt;
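def register_meta_import_hook(should_apply, post_import_hooks):
    import sys
    import imp  # Python 2 era API; removed in Python 3.12

    class Importer(object):
        """Loader that runs the post-import hooks right after loading a module."""
        def __init__(self, file, pathname, description):
            self.file = file
            self.pathname = pathname
            self.description = description

        def load_module(self, name):
            try:
                module = imp.load_module(name, self.file, self.pathname, self.description)
                for post_import_hook in post_import_hooks:
                    post_import_hook(module)
                return module
            finally:
                if self.file:
                    self.file.close()

    class Finder(object):
        """Meta path finder that only claims the modules selected by should_apply."""
        def find_module(self, qualified_module_name, path=None):
            if not should_apply(qualified_module_name):
                return None  # let the other finders handle this module
            if not path:
                path = sys.path
            module_name = qualified_module_name.rpartition('.')[2]
            try:
                file, pathname, description = imp.find_module(module_name, path)
            except ImportError:
                return None  # not found here; fall back to the normal import machinery
            return Importer(file, pathname, description)

    # insert at the front so this finder runs before the default ones
    sys.meta_path.insert(0, Finder())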
&lt;/pre&gt;
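&lt;p&gt;The imp module used above only exists in older Pythons (it was removed in Python 3.12), and Python 3.12 also dropped support for finders that only implement find_module. As a rough modern equivalent, here is a sketch of the same post-import hook on the Python 3 find_spec/exec_module protocol, plus an example hook that performs parts 2 and 3 (eager load, then delete). The names register_post_import_hooks, seal_package and the boxed package are hypothetical, chosen for this illustration.&lt;/p&gt;

```python
import importlib
import importlib.abc
import os
import pkgutil
import sys
import tempfile


class _HookedLoader(importlib.abc.Loader):
    """Delegates to the real loader, then runs the post-import hooks."""

    def __init__(self, loader, hooks):
        self._loader = loader
        self._hooks = hooks

    def create_module(self, spec):
        return self._loader.create_module(spec)

    def exec_module(self, module):
        self._loader.exec_module(module)   # run the module body (e.g. __init__.py)
        for hook in self._hooks:
            hook(module)                   # post-import: the body has finished running


class _PostImportFinder(importlib.abc.MetaPathFinder):
    def __init__(self, should_apply, hooks):
        self._should_apply = should_apply
        self._hooks = hooks

    def find_spec(self, fullname, path, target=None):
        if not self._should_apply(fullname):
            return None                    # leave other modules to the normal machinery
        for finder in sys.meta_path:       # ask the other finders for the real spec
            if finder is self or not hasattr(finder, 'find_spec'):
                continue
            spec = finder.find_spec(fullname, path, target)
            if spec is not None and spec.loader is not None:
                spec.loader = _HookedLoader(spec.loader, self._hooks)
                return spec
        return None


def register_post_import_hooks(should_apply, hooks):
    sys.meta_path.insert(0, _PostImportFinder(should_apply, hooks))


def seal_package(module):
    """Example hook: eagerly import every submodule, then hide it."""
    if not hasattr(module, '__path__'):
        return                             # a plain module has nothing to seal
    for info in pkgutil.iter_modules(module.__path__):
        child = module.__name__ + '.' + info.name
        importlib.import_module(child)     # part 2: eager load
        delattr(module, info.name)         # part 3: delete
        sys.modules[child] = None


# Usage with a throwaway package "boxed" whose __init__.py defines the API (part 1).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'boxed'))
with open(os.path.join(root, 'boxed', '__init__.py'), 'w') as f:
    f.write('from .engine import run as start\n')
with open(os.path.join(root, 'boxed', 'engine.py'), 'w') as f:
    f.write('def run():\n    return "engine running"\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

register_post_import_hooks(lambda name: name.split('.')[0] == 'boxed', [seal_package])

import boxed
print(boxed.start())                # engine running
print(hasattr(boxed, 'engine'))     # False
```

&lt;p&gt;Because the hook deletes the submodules right after __init__.py runs, the package API has to be bound at import time (from .engine import run as start); a function in __init__.py that imported .engine lazily on each call would fail once the package is sealed.&lt;/p&gt;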
&lt;span style="font-weight: bold;"&gt;Conclusion&lt;/span&gt;
&lt;p&gt;
By eagerly loading submodules and deleting them in a post-import hook, we can seal up a package and force people to define the package API in __init__.py, because that is the only way to let outsiders use the internals.
&lt;/p&gt;&lt;p&gt;
Another interesting side effect is that circular dependencies between packages are no longer possible. In Python, a circular dependency between modules typically breaks at import time, but because the modules in a package are loaded lazily, circular dependencies between packages used to be possible. Eagerly loading submodules rules them out. That is a good thing, although it can be very strict.
&lt;/p&gt;&lt;p&gt;
Finally, we have the box, and we seal it up automatically in a post-import hook. If the box's author wants to make the box usable from outside, they need to define its API in __init__.py. Plus, no circular dependencies, ever.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-1859455033725185986?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/1859455033725185986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=1859455033725185986' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/1859455033725185986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/1859455033725185986'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/11/package-missing-language-feature-part.html' title='Package, the missing language feature - Part II'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-3428977871244078420</id><published>2010-11-25T23:17:00.008+08:00</published><updated>2010-11-27T17:38:53.318+08:00</updated><title type='text'>Package, the missing language feature - Part I</title><content type='html'>&lt;span style="font-weight:bold;"&gt;Introduction&lt;/span&gt;
&lt;p&gt;
We have spent way too much time on functions and classes. We put a lot of energy into maintaining clean and concise class interfaces. We care about the dependencies between classes, encouraging dependency injection and wiring up objects via interfaces.
&lt;/p&gt;&lt;p&gt;
Besides functions and classes, we do have higher-level constructs. We have hacked Java class loading to hell, and Satan gave us back his OSGi. We have invented a dedicated job just to maintain EJB manifests. When dependency injection is not enough, people find a concept called Module emerging in modern tools like Guice and Autofac.
&lt;/p&gt;&lt;p&gt;
What is a package? By itself it is merely a name: some annoying leading dots before the thing you actually want. It is not even a thing, just a prefix that gets ignored. People might say: a package does not do anything, so why should I care? A function does something, a class does something too, but a package is just a dummy folder to put those valuable things in.
&lt;/p&gt;&lt;p&gt;
True, very true... and the language designers seem to agree. I cannot say all of them do, but at least some of them do. Stroustrup ignored packages. Gosling ignored packages. Even Hejlsberg ignored packages (though an assembly is better than nothing). What a huge mistake!
&lt;/p&gt;&lt;p&gt;
The problem we normally need to solve when writing business software is not a scientific one. In my mind, the only problem we need to solve is managing complexity. As we learned long ago, the only way to control complexity is to break it down, and break it down further: one thing containing many other things. We need black boxes that encapsulate the internal complexity monster and give the outside a clean and simple illusion. But when using Java or C#, I constantly find I need to reinvent all kinds of black boxes to meet my needs. And none of them feels natural to newcomers, simply because they are not part of the original language, not known to most people, and not supported by many tools. People say many design patterns exist because the language itself is flawed. There are also many component platforms and frameworks, and I say they exist for the same reason: the language does not give us the black box, so we need to invent one ourselves.
&lt;/p&gt;&lt;p&gt;
The package has been a missing language feature for a long time. Luckily, Java and C# are not the only choices we have. In another open wonderland, with less money but more happiness, we have our lovely Python. There, we finally see something that deserves to be called a package.
&lt;/p&gt;
&lt;span style="font-weight:bold;"&gt;Package in Python&lt;/span&gt;
&lt;p&gt;
Packages in Python are simple. If you have a folder called some_package containing an __init__.py file, it becomes a package called some_package. If some_package contains another folder called another_package that also has its own __init__.py file, that becomes some_package.another_package.
&lt;/p&gt;&lt;p&gt;
The key difference between packages in Java and in Python is that in Java, a package is essentially just a name, not an object you can work with at runtime. In Python, a package is a living object, and you can get and set attributes on it at any time: some_package.another_package.abc = 'def' is a valid Python statement.
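&lt;/p&gt;&lt;p&gt;A quick way to see that a Python package is a runtime object is to recreate the some_package/another_package layout in a temporary directory and poke at it. This is a Python 3 sketch; the temporary directory is the only thing added beyond the layout described above.&lt;/p&gt;

```python
import importlib
import os
import sys
import tempfile

# Recreate the folder layout described above in a throwaway directory.
root = tempfile.mkdtemp()
outer = os.path.join(root, 'some_package')
inner = os.path.join(outer, 'another_package')
os.makedirs(inner)
for folder in (outer, inner):
    open(os.path.join(folder, '__init__.py'), 'w').close()  # empty __init__.py
sys.path.insert(0, root)
importlib.invalidate_caches()

import some_package.another_package

pkg = some_package.another_package
print(type(pkg).__name__)                 # module: the package is an ordinary runtime object
pkg.abc = 'def'                           # set an attribute on it, as in the text
print(some_package.another_package.abc)   # def
print(pkg.__path__)                       # the folder the package was loaded from
```

&lt;p&gt;The same attribute access is also what lets any outside code reach into a package's internals, which is the glass-box problem discussed below.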
&lt;/p&gt;&lt;p&gt;
This gives us the box we want. We can use it to define our interface and hide our internal complexity. In a package structure like A.B.C, A should hide B and C, and B should hide C. At the A level, you might say "start the car". At the B level, you might say "start the engine", then "start the radio" and "start the air conditioner". The hierarchical structure of package naming is the best fit for natural encapsulation.
&lt;/p&gt;&lt;p&gt;
The box can also initialize itself: it has an __init__.py file that can execute any bootstrapping code. Sometimes we need to sort out some internal stuff before being ready to serve the outside. Sometimes we need to register ourselves as a subscriber to events published somewhere else. A simple __init__ solves a lot of problems. Is it powerful enough? Not really: it does not support a thousand other features, like full lifecycle management or a standard remote control interface. As a public, user-facing component, the package construct exposed by the language is very limited. But we can build on top of what the language provides.
&lt;/p&gt;&lt;p&gt;
The real problem with Python packages is not that they lack support for things like JMX. The real problem is that the box is not really black. Actually, everything in Python is made of glass: you can see right through nearly everything, which in Java terms means everything is public. We can use _ as a convention, and __ triggers name mangling inside classes, but here the ugly underscore does not help. You can reference A.B.C.xxx any time you want, and that is dangerous: it breaks encapsulation and introduces tangled dependencies without anyone noticing. No one likes that. We want to make sure Y.X.Z references A.B.C.xxx only through A.yyy; it should not know about internals like A.B.C.xxx.
&lt;/p&gt;&lt;p&gt;
The classic "pythonic" response would be: that is just a convention, and when a convention exists, people should follow it. The problem is that this is not an easy convention, and it can be broken at any minute; there is no simple rule people can follow. The real difficulty is that you cannot always reference A.B.C.xxx as A.yyy. If your code lives inside A.B, you should not reference A.yyy, because an inner package should not reference the outer one: it sits lower in the dependency pyramid. In this case you do need to reference A.B.C.xxx as A.B.C.xxx, because it is something you have to deal with directly. It is no longer a hidden internal; you are living inside the internals. In other words, A.B.C.xxx is neither always public nor always private. Whether it is accessible depends on where you are, and that is exactly what encapsulation is about.
&lt;/p&gt;&lt;p&gt;
How can we make the box a real black box? Let's continue in Part II.
&lt;/p&gt;
&lt;h4&gt;Update 11.27&lt;/h4&gt;
&lt;p&gt;
Alan (https://alanfranz.pip.verisignlabs.com/) commented:

I'm not 100% sure about what you'd like to say in the next part, however:

&lt;ul&gt;
&lt;li&gt;beware about "bootstrapping code". Many times such "static initializer" is known to provoke unpredictable problems, and will prevent package breakup via pkg_resources namespace_package , if ever needed.
&lt;br/&gt;
Most of the times initialization should be performed by the client code or should be performed at first request; import-time initialization is absolutely abused in python coding.
&lt;/li&gt;
&lt;li&gt;convention is fine. If anybody imports a module or package that starts with underscore, it's their business - after all, if they've got the source code, they can modify all the names and make them public, can't they? Would you prefer java-like things where you can set methods and attributes private, and then you can access them via other means through some common.util.lang tool?
&lt;/li&gt;
&lt;li&gt;
nesting too much might just be unneeded, and if you want a leaf (a.b.c) not to depend on its parent you can use relative imports. But remember that two modules importing one the other trigger an error in Python, you simply can't do that.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;&lt;p&gt;
Thanks, Alan! I am certainly aware of the issues with "bootstrapping code". The major problem with import-time initialization is that it is implicit, and it gets even worse if the bootstrapping is I/O intensive or causes other side effects. But when there are many clients, such as a lot of unit tests, pushing the responsibility onto them is also inconvenient. I use __import__('x.y.z') in the main function to state explicitly that I want to use those packages and initialize them now.
&lt;/p&gt;&lt;p&gt; 
Starting a name with _ to mean private is fine. But a package is not always private: it is public to its siblings, but private to other packages. Traditional visibility only lets you specify whether something is public or not, and I think that is not enough. How many details you are allowed to know, or should depend on, can be contextual.
&lt;/p&gt;&lt;p&gt;
At the module level, Python tends to surface circular dependencies as import errors, but it does not check for them at the package level. For example, A.B cannot circularly depend on C.D, but if A.E depends on C.D and C.D depends on A.B, that is allowed. Yet that actually means package A depends on C, and package C depends on A as well.
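&lt;/p&gt;&lt;p&gt;Since nothing checks this at the package level, one option is to check it ourselves. The sketch below is not something Python provides: it assumes you already have the module-level import edges as (importer, imported) pairs (how to collect them is left open), collapses them into top-level package edges, and runs a depth-first search for a cycle.&lt;/p&gt;

```python
from collections import defaultdict


def top_level(module_name):
    """Return the top-level package name, e.g. 'A' for 'A.E'."""
    return module_name.split('.')[0]


def find_package_cycle(module_deps):
    """Return one cycle between top-level packages as a list, or None."""
    graph = defaultdict(set)
    for importer, imported in module_deps:
        src, dst = top_level(importer), top_level(imported)
        if src != dst:  # dependencies inside one package are fine
            graph[src].add(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph[node]):
            if color[nxt] == GREY:  # back edge: the cycle is on the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_package_cycle([('A.E', 'C.D'), ('C.D', 'A.B')]))  # ['A', 'C', 'A']
print(find_package_cycle([('A.E', 'C.D'), ('A.B', 'A.E')]))  # None
```

&lt;p&gt;With the example from the text, A.E importing C.D and C.D importing A.B yields the package cycle A, C, A, even though no single module imports itself back.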
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-3428977871244078420?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/3428977871244078420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=3428977871244078420' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/3428977871244078420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/3428977871244078420'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/11/package-missing-language-feature-part-i.html' title='Package, the missing language feature - Part I'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-8839140953789795009</id><published>2010-02-22T17:48:00.003+08:00</published><updated>2010-02-22T18:15:28.692+08:00</updated><title type='text'>Data Migration (3)</title><content type='html'>&lt;p&gt;The final question about data migration: how do we write it? We already know it is just a function that transforms one dictionary into another. We also know there will be questions around dependencies between entities. So what we need is several functions, each of which upgrades the data by one version. A migration function is per version delta, not per entity. Each function needs to do several things:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Find the entities that need to be migrated.&lt;/li&gt;
&lt;li&gt;Load the entity state as a dictionary.&lt;/li&gt;
&lt;li&gt;Apply the migration logic on the dictionary.&lt;/li&gt;
&lt;li&gt;Save the entity state back.&lt;/li&gt;
&lt;/ol&gt;
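&lt;p&gt;Those four steps can be sketched as follows. This is a Python illustration rather than the C# code the post talks about: a plain dict stands in for the EntityState table, a CLR_TYPE filter stands in for the XQuery, and the Domain.Person type with its name-to-firstName change is a made-up example.&lt;/p&gt;

```python
import copy

# Hypothetical stand-in for the EntityState table: id to stored state dictionary.
ENTITY_STATE = {
    '1': {'CLR_TYPE': 'Domain.Person', 'name': 'Ada'},
    '2': {'CLR_TYPE': 'Domain.Calendar.Location', 'Country': 'China'},
}


def find_entities(table, clr_type):
    """Step 1: select the target entities (the post does this with XQuery)."""
    return [entity_id for entity_id, state in table.items()
            if state.get('CLR_TYPE') == clr_type]


def migrate_to_v2(table):
    """One migration function per version delta (here: v1 to v2)."""
    for entity_id in find_entities(table, 'Domain.Person'):
        state = copy.deepcopy(table[entity_id])   # step 2: load the state as a dictionary
        state['firstName'] = state.pop('name')    # step 3: apply the migration logic
        table[entity_id] = state                  # step 4: save the state back


migrate_to_v2(ENTITY_STATE)
print(ENTITY_STATE['1'])  # {'CLR_TYPE': 'Domain.Person', 'firstName': 'Ada'}
```

&lt;p&gt;Upgrading across several versions then means running functions like migrate_to_v2 in order, one per version delta.&lt;/p&gt;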
&lt;p&gt;
Finding the entities to be migrated is easy. We already know SQLServer has XQuery support, so we can write a custom XQuery to find the target entities; most of the time, it will be based on CLR_TYPE. The only remaining problem is how to write a function that transforms one in-memory dictionary into another.&lt;/p&gt;
&lt;p&gt;It might seem easy: we just write a C# function that takes a dictionary as input and returns a new dictionary. Yes, this would work, but the code would be very detailed and would not reveal its intent. It would need a lot of casting to turn an entry into a list, a string or another dictionary, based on your knowledge of the object graph. It would also need a lot of detailed operations, like copying a field to another and deleting the old field to do a rename. The plumbing problem might be solved by introducing a dynamically typed language, like Ruby, as the migration scripting language. But the more essential problem is how to raise the abstraction level, so that the migration script looks more intentional and reveals the original requirements.&lt;/p&gt;
&lt;p&gt;
One naive approach is to write a function for each well-known refactoring, such as Rename, MoveType and ExtractEntities, and that is exactly what we tried before. The problem with these small functions is that they are not really that reusable. Say we have a rename function that changes a field directly on the root from one name to another. What if the renamed field is deeper inside the object graph, not directly on the root? Then the rename function can no longer help us. We might think we can abstract the "locating" part of the function: instead of passing in two strings that identify the fields by name, we pass in two locators.
&lt;/p&gt;
&lt;p&gt;The locator is not easy to implement. Say we are renaming field x.y.z to x.y.k, where x is a branch of the root, y is a branch of x, z is a field on y, and k is the new name of z. The rename function needs to take "x.y.z" and "x.y.k" as input and know how to apply them: it uses "x.y.z" to "get" the value, then "x.y.k" to "set" the value, then "x.y.z" again to delete the old field. The logic of getting a value is very different from the logic of setting one.&lt;/p&gt;
&lt;p&gt;In general, this is the functional programming approach: by decomposing a big function into smaller ones, and then composing them back together to cope with different situations, we can maximize reusability.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-8839140953789795009?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/8839140953789795009/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=8839140953789795009' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/8839140953789795009'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/8839140953789795009'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/02/data-migration-3.html' title='Data Migration (3)'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-3825168706594355533</id><published>2010-02-21T11:33:00.003+08:00</published><updated>2010-02-21T11:53:58.899+08:00</updated><title type='text'>Flattening &amp; Rebuilding</title><content type='html'>&lt;p&gt;Using SQLServer as your nosql database to persist object state causes problems with data migration. I mentioned three of them in the previous post. There are two more, and today we are going to talk about one of them: you cannot deserialize object state back into a class whose fields have been changed.
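&lt;p&gt;A minimal sketch of the locator idea in Python, assuming the state is nested dictionaries addressed by dotted paths with no lists along the way. get_at, set_at, delete_at, rename and compose are hypothetical helpers: rename is built from the three locator operations, and a whole migration is built by composing steps.&lt;/p&gt;

```python
import copy


def _walk(state, path):
    """Follow every segment but the last; return (parent dictionary, last key)."""
    *branches, leaf = path.split('.')
    node = state
    for key in branches:
        node = node[key]
    return node, leaf


def get_at(state, path):
    parent, leaf = _walk(state, path)
    return parent[leaf]


def set_at(state, path, value):
    parent, leaf = _walk(state, path)
    parent[leaf] = value


def delete_at(state, path):
    parent, leaf = _walk(state, path)
    del parent[leaf]


def rename(old_path, new_path):
    """A migration step that moves the value at old_path to new_path."""
    def step(state):
        set_at(state, new_path, get_at(state, old_path))
        delete_at(state, old_path)
        return state
    return step


def compose(*steps):
    """Chain steps into one migration; the input dictionary is left untouched."""
    def migrate(state):
        state = copy.deepcopy(state)
        for step in steps:
            state = step(state)
        return state
    return migrate


migrate_v3 = compose(rename('x.y.z', 'x.y.k'), rename('name', 'firstName'))
print(migrate_v3({'name': 'a', 'x': {'y': {'z': 1}}}))
# {'x': {'y': {'k': 1}}, 'firstName': 'a'}
```

&lt;p&gt;Here get and set happen to share the same walk because the path only crosses dictionaries. Once a path crosses a collection, setting has to fan out over every element, and that is where the logic of getting and setting really starts to differ.&lt;/p&gt;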
Think about a class that used to have a field called name, which has now been changed to firstName. When deserializing, where should the value of name be assigned? If we cannot get the raw value, how can we apply the data migration logic? We already said that the data we store in SQLServer is XML; are we going to parse the XML and manipulate the XML elements directly? Yes, I think we have to.
&lt;/p&gt;
&lt;p&gt;
So the data migration logic is not applied to the same object model your application logic deals with; it has to work at a lower level. We could migrate the XML elements directly, but that couples the migration too tightly to the data format. Before we used XML, we actually tried JSON for a month, until we found that XQuery was a real killer feature. Also, an XML element carries many things we do not care about. So what we need is a model that captures the state of the objects but is simple enough. This model is also directly related to how serialization and deserialization work. It works like this:&lt;/p&gt;
&lt;pre&gt;
object ==Flatten==&gt; many dictionaries(with dict/list/string inside) ==Serialize==&gt; XML
&lt;/pre&gt;
&lt;pre&gt;
XML ==Deserialize==&gt; dictionary(load referenced entity state on demand) ==Rebuild==&gt; objects
&lt;/pre&gt;
&lt;p&gt;
The XML looks like this:
&lt;/p&gt;
&lt;pre&gt;
&amp;lt;Entity CLR_TYPE="Domain.Calendar.Location" Country="China" 
CountryAbbreviation="CN" LId="43-123" TaxUnit="Jiangxi" TaxUnitAbbreviation="JX" /&amp;gt;
&lt;/pre&gt;
&lt;p&gt;
It will be deserialized into a dictionary with one entry per attribute: five fields plus CLR_TYPE, which is used in the rebuilding process to turn the dictionary back into an object. Besides dictionaries, strings are also valid: a string is used to store field names as well as simple field values, and the persistence layer needs to define how to translate a date time (and so on) into a string. Collections are also valid, although in theory a collection is just a special case of a dictionary.&lt;/p&gt;
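&lt;p&gt;For a flat entity like the one above, the XML step can be sketched with Python's xml.etree.ElementTree: every string field becomes an attribute of an Entity element, and deserializing reads the attributes back into a dictionary. This covers only flat string fields; how nested dictionaries and collections map to child elements is left out of the sketch.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# The entity state from the example above, as a dictionary of strings.
state = {
    'CLR_TYPE': 'Domain.Calendar.Location',
    'Country': 'China',
    'CountryAbbreviation': 'CN',
    'LId': '43-123',
    'TaxUnit': 'Jiangxi',
    'TaxUnitAbbreviation': 'JX',
}


def serialize(state):
    """Dictionary to XML text: every simple string field becomes an attribute."""
    return ET.tostring(ET.Element('Entity', state), encoding='unicode')


def deserialize(xml_text):
    """XML text to a dictionary of the Entity element's attributes."""
    return dict(ET.fromstring(xml_text).attrib)


restored = deserialize(serialize(state))
print(restored == state)  # True
print(len(restored))      # 6: five fields plus CLR_TYPE
```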
&lt;p&gt;
The XML stores the state of a single entity only. Entities can relate to each other, but we do not store the state of another entity in the same XML: other entities are referenced by ID and stored separately, in different rows of the EntityState table.
&lt;/p&gt;
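&lt;p&gt;Flatten and Rebuild can be sketched the same way. In this Python illustration the entity classes, the id attribute and the {'REF': id} marker for references are assumptions, not the post's actual format: each entity gets its own state dictionary (its own row), and referenced entities are stored by id. Unlike the design described above, this rebuild follows references eagerly rather than on demand, and it does not handle cycles.&lt;/p&gt;

```python
class Location(object):
    """Hypothetical entity class; every entity is assumed to carry an id."""
    def __init__(self, entity_id, country):
        self.id = entity_id
        self.Country = country


class Meeting(object):
    def __init__(self, entity_id, title, location):
        self.id = entity_id
        self.title = title
        self.location = location  # another entity: stored by id, not inline


TYPES = {'Domain.Location': Location, 'Domain.Meeting': Meeting}
TYPE_NAMES = {cls: name for name, cls in TYPES.items()}


def to_simple(value, rows):
    """Field value to a string, list or dictionary."""
    if isinstance(value, str):
        return value
    if isinstance(value, list):
        return [to_simple(item, rows) for item in value]
    flatten(value, rows)          # a referenced entity gets its own row
    return {'REF': value.id}


def flatten(entity, rows):
    """Object to one state dictionary per entity, collected in rows by id."""
    state = {'CLR_TYPE': TYPE_NAMES[type(entity)]}
    for field, value in vars(entity).items():
        state[field] = to_simple(value, rows)
    rows[entity.id] = state
    return rows


def from_simple(value, rows):
    if isinstance(value, list):
        return [from_simple(item, rows) for item in value]
    if isinstance(value, dict) and 'REF' in value:
        return rebuild(value['REF'], rows)  # follow the reference
    return value


def rebuild(entity_id, rows):
    """State dictionary to object, using CLR_TYPE to pick the class."""
    state = rows[entity_id]
    cls = TYPES[state['CLR_TYPE']]
    obj = cls.__new__(cls)        # skip __init__; fields are restored below
    for field, value in state.items():
        if field != 'CLR_TYPE':
            setattr(obj, field, from_simple(value, rows))
    return obj


rows = flatten(Meeting('m-1', 'Retro', Location('43-123', 'China')), {})
print(rows['m-1'])
# {'CLR_TYPE': 'Domain.Meeting', 'id': 'm-1', 'title': 'Retro', 'location': {'REF': '43-123'}}
print(rebuild('m-1', rows).location.Country)  # China
```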
&lt;p&gt;Separating serialization into two phases is what makes data migration possible. A data migration is just a function that takes a dictionary as input and produces another dictionary. Now the only problem is: how do we write such a function? It might be trivial for a single version, but if you change the model very frequently, as in agile software development, it becomes a big issue. We are going to talk about the "reusability" of data migration rules in the next post.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-3825168706594355533?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/3825168706594355533/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=3825168706594355533' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/3825168706594355533'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/3825168706594355533'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/02/flattening-rebuilding.html' title='Flattening &amp; Rebuilding'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-8187124041777854873</id><published>2010-02-20T14:14:00.002+08:00</published><updated>2010-02-20T14:35:34.337+08:00</updated><title type='text'>Data Migration (2)</title><content type='html'>&lt;p&gt;There are three difficulties with
nosql data migration, particularly if the data is serialized object state:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Relationships between entities cause dependencies between data migration rules.&lt;/li&gt;
&lt;li&gt;Lack of ad-hoc query support.&lt;/li&gt;
&lt;li&gt;Hard to migrate in batch.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The two approaches we talked about: Migrate on load, Migrate in one time. Both of them pros and cons, and they are very much related to the three difficulties mentioned above.&lt;/p&gt;
&lt;p&gt;The pros of Migrate on load: It do not need to shutdown your database or application. So, in theory, live migration is possible this way. Another related big benefit is you spread the cost of data migration over the period. So, if the data set is huge, it is very economical to do it this way. Especially a lot of the data are not frequently being used.&lt;/p&gt;
&lt;p&gt;The cons of Migrate on load: it is very difficult to deal with the dependencies between data migration rules. It cannot fail fast if there are flaws in the data migration code itself. The design is also more sophisticated, so it is more likely to run into problems.&lt;/p&gt;
&lt;p&gt;The pros and cons of Migrate in one time are exactly the opposite. Both approaches suffer from the lack of ad-hoc query support. For example, if you want to change a reference from an id to a business id, you need to translate each particular id to its business id. It is unlikely that an index table was designed for this kind of query, so without ad-hoc query support the data migration code is very hard to write. You might need to build a special index table just for data migration. Luckily, if we use SQLServer as the nosql database, we can leverage its XQuery capability.&lt;/p&gt;
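&lt;p&gt;A minimal sketch of such a throwaway index for the id to business id example, assuming entity states have already been flattened into dictionaries (Java Maps here). The field names id, businessId, customerRef and customerBusinessId, and the ReferenceMigration class, are hypothetical. The first pass builds the lookup; the second pass uses it to rewrite each reference.&lt;/p&gt;
&lt;pre class="brush: java"&gt;
import java.util.*;

// Hypothetical two-pass migration: replace an id reference with a business id.
class ReferenceMigration {
    // Pass 1: throwaway index from customer id to business id, built only for this migration.
    static Map&amp;lt;String, String&amp;gt; buildIndex(List&amp;lt;Map&amp;lt;String, Object&amp;gt;&amp;gt; customers) {
        Map&amp;lt;String, String&amp;gt; index = new HashMap&amp;lt;&amp;gt;();
        for (Map&amp;lt;String, Object&amp;gt; customer : customers) {
            index.put((String) customer.get("id"), (String) customer.get("businessId"));
        }
        return index;
    }

    // Pass 2: rewrite one order, replacing "customerRef" with "customerBusinessId".
    static Map&amp;lt;String, Object&amp;gt; migrateOrder(Map&amp;lt;String, Object&amp;gt; order, Map&amp;lt;String, String&amp;gt; index) {
        Map&amp;lt;String, Object&amp;gt; migrated = new HashMap&amp;lt;&amp;gt;(order);
        String oldRef = (String) migrated.remove("customerRef");
        String businessId = index.get(oldRef);
        if (businessId == null) {
            // Fail fast instead of silently writing a dangling reference.
            throw new IllegalStateException("No business id for " + oldRef);
        }
        migrated.put("customerBusinessId", businessId);
        return migrated;
    }
}
&lt;/pre&gt;
&lt;p&gt;Note that every customer has to be read before the first order can be migrated, which is exactly the extra cost of not having ad-hoc queries.&lt;/p&gt;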
&lt;p&gt;Batching is not a problem if you migrate on load. But if you migrate in one time, it can be very time-consuming. It is now the No. 1 data migration concern in my team, and I have no good solution yet. Previously we wrote data migrations in SQL, which is batch processing by nature. Now that we have no schema, SQL is no longer applicable, which means more RPC round trips during data migration: we literally need to load the whole database out. The long-term mitigation is to introduce Map-reduce.&lt;/p&gt;
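&lt;p&gt;A minimal sketch of Migrate in one time done in fixed-size batches, so each round trip loads and saves many entities instead of one. The EntityStore interface and its loadIds, batchLoad and batchSave methods are hypothetical stand-ins for whatever data access layer is in use; the in-memory store exists only to make the sketch runnable and to count round trips.&lt;/p&gt;
&lt;pre class="brush: java"&gt;
import java.util.*;
import java.util.function.UnaryOperator;

// Hypothetical data access interface; not an existing library API.
interface EntityStore {
    List&amp;lt;String&amp;gt; loadIds();
    Map&amp;lt;String, Map&amp;lt;String, Object&amp;gt;&amp;gt; batchLoad(List&amp;lt;String&amp;gt; ids);
    void batchSave(Map&amp;lt;String, Map&amp;lt;String, Object&amp;gt;&amp;gt; states);
}

// In-memory stand-in that counts round trips, for illustration only.
class InMemoryStore implements EntityStore {
    final Map&amp;lt;String, Map&amp;lt;String, Object&amp;gt;&amp;gt; rows = new TreeMap&amp;lt;&amp;gt;();
    int roundTrips = 0;

    public List&amp;lt;String&amp;gt; loadIds() {
        roundTrips++;
        return new ArrayList&amp;lt;&amp;gt;(rows.keySet());
    }

    public Map&amp;lt;String, Map&amp;lt;String, Object&amp;gt;&amp;gt; batchLoad(List&amp;lt;String&amp;gt; ids) {
        roundTrips++;
        Map&amp;lt;String, Map&amp;lt;String, Object&amp;gt;&amp;gt; result = new HashMap&amp;lt;&amp;gt;();
        for (String id : ids) result.put(id, rows.get(id));
        return result;
    }

    public void batchSave(Map&amp;lt;String, Map&amp;lt;String, Object&amp;gt;&amp;gt; states) {
        roundTrips++;
        rows.putAll(states);
    }
}

class BatchMigration {
    // Migrates the whole store, batchSize entities per load/save round trip.
    static void migrateAll(EntityStore store, UnaryOperator&amp;lt;Map&amp;lt;String, Object&amp;gt;&amp;gt; rule, int batchSize) {
        List&amp;lt;String&amp;gt; ids = store.loadIds();
        for (int from = 0; from &amp;lt; ids.size(); from += batchSize) {
            List&amp;lt;String&amp;gt; batch = ids.subList(from, Math.min(from + batchSize, ids.size()));
            Map&amp;lt;String, Map&amp;lt;String, Object&amp;gt;&amp;gt; migrated = new HashMap&amp;lt;&amp;gt;();
            store.batchLoad(batch).forEach((id, state) -&amp;gt; migrated.put(id, rule.apply(state)));
            store.batchSave(migrated);
        }
    }
}
&lt;/pre&gt;
&lt;p&gt;With 5 entities and a batch size of 2, this takes 7 round trips (1 to list the ids, then 3 batches of load plus save) instead of 11 when migrating one entity at a time. For a really large database the id listing itself would also need paging.&lt;/p&gt;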
&lt;p&gt;The less well-known problem is dependencies. A simple question: if we move a field from class A to class B, then when we load an object of class A, should we migrate both that object and the class B object it references? When we load an object of class B, should we migrate both that object and the class A object it references? Are we then running into a circular reference problem? This is just an obvious example; there are many less obvious ones. For example, if we delete a class, then all data migrations referencing that class must be executed against the whole database, otherwise we may no longer be able to load objects of that class. How can we avoid that? Let's talk about it in the next post.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-8187124041777854873?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/8187124041777854873/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=8187124041777854873' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/8187124041777854873'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/8187124041777854873'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/02/data-migration-2.html' title='Data Migration (2)'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-5304130740652203529</id><published>2010-02-19T21:23:00.003+08:00</published><updated>2010-02-19T21:38:32.815+08:00</updated><title type='text'>Data Migration</title><content type='html'>&lt;p&gt;For nosql data storage, one thing is rarely talked about: migration. The typical answer is that a nosql database has no schema, so there is no need to migrate the data. But this is a lie. Although the database does not need a schema to store data, that does not mean the data itself has no structure.
&lt;/p&gt;
&lt;p&gt;Nosql claims to be very flexible, so that semi-structured data can be stored. But if we use SQLServer as a nosql database to persist objects, the data itself must be structured. If we change a class definition, the data can no longer be deserialized unless we migrate it to match the new class definition.
&lt;/p&gt;
&lt;p&gt;
The first typical response, "no schema, no migration", does not work. The second typical response is backward compatibility: let the application code handle data in older versions. This might work if the data structure (especially the references between objects) is simple and the format the application code consumes is not strict (deserialization according to a class definition is extremely strict). Also, the more old versions need to be supported, the more complex the application code becomes. So this approach cannot be applied broadly and does not scale.
&lt;/p&gt;
&lt;p&gt;
The third response is to add a hook to the persistence layer. Before loading data for use, check its version. If the version is older, execute a special transformation to upgrade the data to the newer version, so that the application code stays orthogonal to the data migration logic. It seems optimal, but it is actually not that easy to implement in practice.
&lt;/p&gt;
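&lt;p&gt;A minimal sketch of such a hook, assuming each stored state carries a hypothetical integer "version" entry and that migration rule N upgrades a state from version N to N+1. The loader applies every missing rule in order before the state is handed to deserialization, so application code only ever sees the current version. MigratingLoader is an illustrative name, not an existing API.&lt;/p&gt;
&lt;pre class="brush: java"&gt;
import java.util.*;
import java.util.function.UnaryOperator;

// Hypothetical version-chained loader hook; not an existing library API.
class MigratingLoader {
    // rules.get(n) upgrades a state from version n to version n + 1.
    private final List&amp;lt;UnaryOperator&amp;lt;Map&amp;lt;String, Object&amp;gt;&amp;gt;&amp;gt; rules;

    MigratingLoader(List&amp;lt;UnaryOperator&amp;lt;Map&amp;lt;String, Object&amp;gt;&amp;gt;&amp;gt; rules) {
        this.rules = rules;
    }

    int currentVersion() {
        return rules.size();
    }

    // Called by the persistence layer after reading the raw state, before deserializing it.
    Map&amp;lt;String, Object&amp;gt; upgrade(Map&amp;lt;String, Object&amp;gt; raw) {
        Map&amp;lt;String, Object&amp;gt; state = new HashMap&amp;lt;&amp;gt;(raw);
        int version = (Integer) state.getOrDefault("version", 0); // missing means version 0
        while (version &amp;lt; currentVersion()) {
            state = new HashMap&amp;lt;&amp;gt;(rules.get(version).apply(state));
            version++;
            state.put("version", version);
        }
        return state;
    }
}
&lt;/pre&gt;
&lt;p&gt;Each rule here only sees one entity's own state. That keeps the hook simple, but it is also where the difficulties show up once a change spans several classes.&lt;/p&gt;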
&lt;p&gt;
The fourth response is to do a full data migration, just like a traditional RDBMS data migration: load all the data out, transform it and save it back.
&lt;/p&gt;
&lt;p&gt;
For object persistence, only approaches 3 and 4 work. We tried them both and had lots of surprising experiences. In the next post, we will look into them in depth and see what works, what works well, and which seemingly good solutions actually cost a lot.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-5304130740652203529?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/5304130740652203529/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=5304130740652203529' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/5304130740652203529'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/5304130740652203529'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/02/data-migration.html' title='Data Migration'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-7391140290136036752</id><published>2010-02-18T17:00:00.002+08:00</published><updated>2010-02-18T17:03:45.196+08:00</updated><title type='text'>Avoiding N+1 Problem (2)</title><content type='html'>&lt;p&gt;The previous post talked about using batch loading to avoid loading references one by one, which causes too many queries. We started by optimizing the loading of collections, then found that all references can be optimized in the same way. But indirect references seem very tedious to optimize.&lt;/p&gt;
&lt;p&gt;How can we batch these loads without making the code a mess? For example, object A references objects B and C; object B references D and E; object C references F and G. How can we load D, E, F and G in one batch? With a traditional ORM this question does not even arise: because B and C have different schemas, their SQL is different, so there is no way to batch them together. But now that all entities are stored in the same schema in the EntityState table, this optimization is logically possible.&lt;/p&gt;
&lt;p&gt;The difficulty is not loading or batching the entities; the loading is always the same SQL. What differs between loading D, E, F and G is the post-processing: each object is loaded and then assigned to a different field. So it is essential to know what the post-processing steps are. An ideal way to represent them in C# is: &lt;/p&gt;
&lt;pre class="brush: csharp"&gt;
private readonly IDictionary&amp;lt;Guid, Action&amp;lt;EntityState&amp;gt;&amp;gt; accumulatedCallbacks = new Dictionary&amp;lt;Guid, Action&amp;lt;EntityState&amp;gt;&amp;gt;();
&lt;/pre&gt;
&lt;p&gt;If we store the post-processing steps in a dictionary called accumulatedCallbacks, we can decide when to run them. So, instead of doing&lt;/p&gt;
&lt;pre class="brush: csharp"&gt;
var entityState = entityStateLoader.Load("xxxx");
DoMyPostProcessing(entityState);
&lt;/pre&gt;
&lt;p&gt;we pass the post-processing as an Action&amp;lt;EntityState&amp;gt; and store it in the dictionary. Then, when the batch is "big enough", we call those callbacks, passing in the loaded entity states.&lt;/p&gt;
&lt;pre class="brush: csharp"&gt;
entityStateLoader.LoadLater("xxxx", DoMyPostProcessing);
&lt;/pre&gt;
&lt;p&gt;This seems to work, except for one question: when do we call these accumulated callbacks, that is, when do we load the entities with those ids? To answer this, it is best to look at the code:&lt;/p&gt;
&lt;pre class="brush: csharp"&gt;
public void Flush()
{
  // Keep going until no callback has registered another id.
  while (accumulatedCallbacks.Count &amp;gt; 0)
  {
    // Snapshot, then clear, so callbacks can register the next batch.
    var callbacks = new Dictionary&amp;lt;Guid, Action&amp;lt;EntityState&amp;gt;&amp;gt;(accumulatedCallbacks);
    accumulatedCallbacks.Clear();
    ApplyCallbackOnEntities(callbacks.Keys.ToArray(), callbacks);
  }
}

private void ApplyCallbackOnEntities(IEnumerable&amp;lt;Guid&amp;gt; ids, Dictionary&amp;lt;Guid, Action&amp;lt;EntityState&amp;gt;&amp;gt; callbacks)
{
  // One query for the whole batch.
  var loadedStates = states.BatchLoad(ids);
  foreach (var state in loadedStates)
  {
    callbacks[state.Id](state);
    callbacks.Remove(state.Id);
  }
  // Ids that were not found in the database get null.
  foreach (var callback in callbacks.Values)
  {
    callback(null);
  }
}
&lt;/pre&gt;
&lt;p&gt;The important part is the while loop. The steps are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Copy accumulatedCallbacks to a local variable&lt;/li&gt;
&lt;li&gt;Clear the accumulatedCallbacks&lt;/li&gt;
&lt;li&gt;Loading happens: states.BatchLoad(ids)&lt;/li&gt;
&lt;li&gt;Each callback is called&lt;/li&gt;
&lt;li&gt;A tricky point: while the callbacks are being called, accumulatedCallbacks accumulates more callbacks, because each callback calls LoadLater to load its own references.&lt;/li&gt;
&lt;li&gt;If accumulatedCallbacks is not empty, repeat the steps&lt;/li&gt;
&lt;/ol&gt;
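&lt;p&gt;For comparison, here is a minimal, self-contained Java sketch of the same LoadLater/Flush loop, using the A to G example from the start of this post. The EntityState class, the in-memory database map and the batches list are hypothetical, added only so the sketch runs; a real loader would issue one SELECT ... WHERE id IN (...) per batch. Unlike the C# code, it keeps a list of callbacks per id, so two objects referencing the same id do not collide.&lt;/p&gt;
&lt;pre class="brush: java"&gt;
import java.util.*;
import java.util.function.Consumer;

// Hypothetical stored state: an id plus the ids of the entities it references.
class EntityState {
    final String id;
    final List&amp;lt;String&amp;gt; references;

    EntityState(String id, List&amp;lt;String&amp;gt; references) {
        this.id = id;
        this.references = references;
    }
}

class BatchingLoader {
    private final Map&amp;lt;String, EntityState&amp;gt; database; // stands in for the EntityState table
    private Map&amp;lt;String, List&amp;lt;Consumer&amp;lt;EntityState&amp;gt;&amp;gt;&amp;gt; pending = new LinkedHashMap&amp;lt;&amp;gt;();
    final List&amp;lt;List&amp;lt;String&amp;gt;&amp;gt; batches = new ArrayList&amp;lt;&amp;gt;(); // ids loaded per round trip

    BatchingLoader(Map&amp;lt;String, EntityState&amp;gt; database) {
        this.database = database;
    }

    // Like LoadLater: remember the post-processing, load nothing yet.
    void loadLater(String id, Consumer&amp;lt;EntityState&amp;gt; callback) {
        pending.computeIfAbsent(id, k -&amp;gt; new ArrayList&amp;lt;&amp;gt;()).add(callback);
    }

    // Like Flush: one batch per round; callbacks may queue ids for the next round.
    void flush() {
        while (!pending.isEmpty()) {
            Map&amp;lt;String, List&amp;lt;Consumer&amp;lt;EntityState&amp;gt;&amp;gt;&amp;gt; current = pending;
            pending = new LinkedHashMap&amp;lt;&amp;gt;();
            batches.add(new ArrayList&amp;lt;&amp;gt;(current.keySet())); // one SELECT ... IN (...)
            for (Map.Entry&amp;lt;String, List&amp;lt;Consumer&amp;lt;EntityState&amp;gt;&amp;gt;&amp;gt; entry : current.entrySet()) {
                EntityState state = database.get(entry.getKey()); // null if the id is missing
                for (Consumer&amp;lt;EntityState&amp;gt; callback : entry.getValue()) callback.accept(state);
            }
        }
    }

    // Example post-processing: record the entity, then schedule its references.
    void loadGraph(String id, List&amp;lt;String&amp;gt; visited) {
        loadLater(id, state -&amp;gt; {
            if (state == null) return;
            visited.add(state.id);
            for (String ref : state.references) loadGraph(ref, visited);
        });
    }
}
&lt;/pre&gt;
&lt;p&gt;Loading A this way issues three batches: [A], then [B, C], then [D, E, F, G], so D, E, F and G arrive in a single query. The sketch does not remember ids it has already loaded, so a reference cycle would keep it looping; a real loader would check an identity map before queuing an id.&lt;/p&gt;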
&lt;p&gt;Essentially, we turned a sequential process into a series of asynchronously connected steps, a style also known as continuation. We get better runtime performance, while the main logic (load and assign back to fields) stays unaware of the performance optimization. Another example of separation of concerns.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-7391140290136036752?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/7391140290136036752/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=7391140290136036752' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/7391140290136036752'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/7391140290136036752'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/02/avoiding-n1-problem-2.html' title='Avoiding N+1 Problem (2)'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-8894951117718441378</id><published>2010-02-17T11:45:00.003+08:00</published><updated>2010-02-17T12:08:00.857+08:00</updated><title type='text'>Avoiding N+1 Problem</title><content type='html'>&lt;p&gt;
In the previous post, I talked about how to use SQLServer to store objects in a nosql way. But that leaves an open question: "How can we avoid the N+1 problem?" What is the N+1 problem? Let's do a quick recap.
&lt;/p&gt;
&lt;p&gt;
The N+1 problem is also called the 1+N problem or ripple loading. To load "1" object, we also need to load the "N" objects it references, one by one, in "N" separate queries. That is why it is called 1+N. Why is it a problem? Because every one of those queries pays the overhead of a network round trip and SQL execution.

&lt;pre&gt;
Database &amp;lt;---Overhead--- Application
Database &amp;lt;---Overhead--- Application
Database &amp;lt;---Overhead--- Application
Database &amp;lt;---Overhead--- Application
&lt;/pre&gt;

The more SQL statements we issue, the more overhead we pay. So a natural solution is to batch the operations: if "1" SQL statement can load all "N" references, the problem is no longer a problem.
&lt;/p&gt;
&lt;p&gt;
A quick solution is to load all the objects inside a collection with one SQL statement. For example, an object "User" has a field "messages" holding a collection of "Message". If the user kayla has messages with ids 2, 3, 4 and 5, we need just one SQL statement to load all her messages.
&lt;/p&gt;
&lt;pre class="brush: sql"&gt;
SELECT * FROM EntityState WHERE id IN (2, 3, 4, 5)
&lt;/pre&gt;
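&lt;p&gt;A sketch of building that statement for any number of ids, assuming JDBC-style ? placeholders so the ids are bound as parameters rather than pasted into the SQL text. The InClause helper is hypothetical.&lt;/p&gt;
&lt;pre class="brush: java"&gt;
import java.util.Collections;

class InClause {
    // "SELECT * FROM EntityState WHERE id IN (?, ?, ...)" with one placeholder per id.
    static String selectByIds(int idCount) {
        if (idCount &amp;lt;= 0) throw new IllegalArgumentException("need at least one id");
        return "SELECT * FROM EntityState WHERE id IN ("
                + String.join(", ", Collections.nCopies(idCount, "?")) + ")";
    }
}
&lt;/pre&gt;
&lt;p&gt;The ids are then bound one by one, for example with PreparedStatement.setObject(i + 1, ids.get(i)). SQL Server accepts at most 2,100 parameters per request, so a very long id list still has to be split into several such statements.&lt;/p&gt;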
&lt;p&gt;
However, this solution only works for collection references. If the "User" class also references "Department", "Manager", "Calendar" and so on, we still need a separate SQL statement for each of those references, because they are not in a collection.
&lt;/p&gt;
&lt;p&gt;
Going further, we can iterate over all the fields of an object and collect all of its references, including every member of collection-valued fields. Combining them, we can again load everything with one SQL statement.
&lt;/p&gt;
&lt;p&gt;
For example, kayla has a Department with id 6, a Manager with id 7 and a Calendar with id 8, and her messages reference 2, 3, 4 and 5. Then we need just
&lt;/p&gt;
&lt;pre class="brush: sql"&gt;
SELECT * FROM EntityState WHERE id IN (2, 3, 4, 5, 6, 7, 8)
&lt;/pre&gt;
&lt;p&gt;
From the returned result set, we assign 2, 3, 4 and 5 back to the field messages, 6 to the field department, 7 to the field manager and 8 to the field calendar.
&lt;/p&gt;
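&lt;p&gt;A minimal sketch of that collect, load once, assign back idea, assuming the object's references have already been extracted into a map from field name to referenced ids (a single reference is a one-element list). ReferenceResolver is a hypothetical name, and batchLoad is a hypothetical function standing in for the single IN query; the point is that every field shares one round trip and the results are routed back by id.&lt;/p&gt;
&lt;pre class="brush: java"&gt;
import java.util.*;
import java.util.function.Function;

class ReferenceResolver {
    // fieldRefs maps a field name to the ids it references (a single reference is a one-element list).
    // batchLoad is a hypothetical function that runs one "SELECT ... WHERE id IN (...)".
    static Map&amp;lt;String, List&amp;lt;Object&amp;gt;&amp;gt; resolve(Map&amp;lt;String, List&amp;lt;Integer&amp;gt;&amp;gt; fieldRefs,
                                             Function&amp;lt;Set&amp;lt;Integer&amp;gt;, Map&amp;lt;Integer, Object&amp;gt;&amp;gt; batchLoad) {
        // 1. Union of every referenced id, across all fields.
        Set&amp;lt;Integer&amp;gt; allIds = new TreeSet&amp;lt;&amp;gt;();
        fieldRefs.values().forEach(allIds::addAll);
        // 2. A single query for all of them.
        Map&amp;lt;Integer, Object&amp;gt; loaded = batchLoad.apply(allIds);
        // 3. Route each loaded object back to the field that referenced it.
        Map&amp;lt;String, List&amp;lt;Object&amp;gt;&amp;gt; result = new LinkedHashMap&amp;lt;&amp;gt;();
        fieldRefs.forEach((field, ids) -&amp;gt; {
            List&amp;lt;Object&amp;gt; values = new ArrayList&amp;lt;&amp;gt;();
            for (Integer id : ids) values.add(loaded.get(id));
            result.put(field, values);
        });
        return result;
    }
}
&lt;/pre&gt;
&lt;p&gt;For kayla, fieldRefs would map messages to [2, 3, 4, 5], department to [6], manager to [7] and calendar to [8], and batchLoad is called exactly once, with ids 2 to 8.&lt;/p&gt;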
&lt;p&gt;
Does this solve all of the problems? Not yet... What if the manager also references other objects, 9 and 10, and the department references 11 and 12? Should we load 9, 10, 11 and 12 in one SQL statement? How can we do that?
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-8894951117718441378?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/8894951117718441378/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=8894951117718441378' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/8894951117718441378'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/8894951117718441378'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/02/avoiding-n1-problem.html' title='Avoiding N+1 Problem'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-1512767662818942634</id><published>2010-02-16T22:11:00.009+08:00</published><updated>2010-02-22T18:16:10.994+08:00</updated><title type='text'>Use SQLServer as your nosql database</title><content type='html'>&lt;p&gt;You might be wondering when you will be able to use those cool nosql databases in your project. "But why?" your manager might ask. You'd better be prepared. As I see it, a nosql database provides two benefits:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;b&gt;Scalability&lt;/b&gt;
By removing the schema, a data entry can be replicated very easily. By removing foreign key constraints, a data entry can be replicated without replicating all the entries it references. Then we can build sharding around user boundaries.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Productivity&lt;/b&gt;
The object in memory and the persisted object state in the database are really the same thing. If we need significant mapping between these two models, something must be wrong, and both productivity and performance may suffer. If the two really are the same thing, why do we have to store the object state as relational data? With a nosql database, persisting objects can be as easy as serialization.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
If the second one is more likely to attract you and your manager, then SQLServer might serve as the nosql database for your project. SQLServer? Yes, actually any RDBMS could be used as a nosql database. The problem we are trying to solve is how to persist objects to an RDBMS, optimizing for developer productivity and, optionally, runtime performance and scalability.
&lt;/p&gt;
&lt;b&gt;Difficulty of RDBMS object persistence&lt;/b&gt;
&lt;ol&gt;
&lt;li&gt;OR-Mapping constrains the object design, and it is an overhead, just like memory management in C++ programming. It becomes very annoying when you refactor often.&lt;/li&gt;
&lt;li&gt;Query design. Complex queries require a highly skilled SQL writer. Even if you get a query running correctly, you might have trouble making it performant. An ORM solution adds another level of complexity by requiring you to specify the loading strategy.&lt;/li&gt;
&lt;li&gt;N+1 problem. Loading a deeply nested object graph very often leads to the N+1 problem. This so-called ripple loading is probably the most common performance problem when using an ORM.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;
By using the ideas behind nosql databases, we can overcome these problems and let your developers focus only on using object technology to implement the business requirements. The first thing we need to do to claim being nosql is not "not using SQL" but "not using a schema".
&lt;/p&gt;
&lt;b&gt;No schema&lt;/b&gt;
&lt;p&gt;
We create a table called "EntityState" with two columns: id and xml. The id is the id of the persisted entity, and the xml is the content of the entity state. So the "EntityState" table is essentially a key/value database, but with more features.
&lt;/p&gt;
&lt;p&gt;But how do we get an object into xml? There are thousands of ways, I have to say. The most important thing is to classify your objects into two categories: Entity or Aggregated. Being an entity means the object has an id, and every reference to it is stored as that id instead of serializing its content into the xml. Being aggregated means its state is part of its owner's xml. Once we have done this, the circular reference problem we might otherwise encounter is also prevented.
&lt;/p&gt;
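&lt;p&gt;A minimal sketch of the Entity/Aggregated split, with hypothetical User, Address and Department classes and hand-written XML (values are not escaped; a real project would use a proper XML serializer). Department is an entity, so only its id is written; Address is aggregated, so its fields go inside the user's xml.&lt;/p&gt;
&lt;pre class="brush: java"&gt;
// Entity: has an id and gets its own row in EntityState.
class Department {
    final String id;
    final String name;

    Department(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Aggregated: no id, stored inside its owner's xml.
class Address {
    final String city;

    Address(String city) {
        this.city = city;
    }
}

class User {
    final String id;
    final Address address;       // aggregated: serialized inline
    final Department department; // entity: serialized as a reference by id

    User(String id, Address address, Department department) {
        this.id = id;
        this.address = address;
        this.department = department;
    }

    // Content of the xml column for this user's EntityState row (values are not escaped).
    String toXml() {
        return "&amp;lt;user id=\"" + id + "\"&amp;gt;"
                + "&amp;lt;address&amp;gt;&amp;lt;city&amp;gt;" + address.city + "&amp;lt;/city&amp;gt;&amp;lt;/address&amp;gt;"
                + "&amp;lt;department ref=\"" + department.id + "\"/&amp;gt;"
                + "&amp;lt;/user&amp;gt;";
    }
}
&lt;/pre&gt;
&lt;p&gt;The department's name never appears in the user's xml; it lives in the department's own row. That is also why cycles are harmless here: serialization stops at every entity reference.&lt;/p&gt;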
&lt;b&gt;N+1 Problem&lt;/b&gt;
&lt;p&gt;Loading objects is even more likely to run into the N+1 problem if we do not take it into consideration. Say we load the object with id 1, which references objects with ids 2 and 3, and the object with id 2 references objects with ids 4, 5, 6... The number of SQL statements issued to load a single object can then reach a thousand. This is obviously a problem. The rough idea is to use callbacks or a continuation-like structure. The detailed solution will be described in a later article.
&lt;/p&gt;
&lt;b&gt;Index Tables&lt;/b&gt;
&lt;p&gt;The complex queries you used to write actually do two things. They query, of course. But they also build the model to query against on the fly. If the model you persist happens to have the column you want to query, the query can be as easy as one line. If the model is far from what you want to query, you might need to join several tables and do some SUM calculations in the SQL. If we create index tables according to the queries we expect, the problem becomes trivial. The only remaining problem is ensuring the index tables get updated whenever the "EntityState" table is updated.
&lt;/p&gt;
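&lt;p&gt;A minimal sketch of keeping an index table in step with EntityState, assuming every write goes through a single save method (in a real database the two writes would ideally share one transaction). UserRepository, the in-memory maps and the message-count index are hypothetical; the point is that the aggregation happens at write time, so the later query is a plain lookup.&lt;/p&gt;
&lt;pre class="brush: java"&gt;
import java.util.*;

// Hypothetical repository: the EntityState "table" plus one index "table" derived from it.
class UserRepository {
    final Map&amp;lt;String, String&amp;gt; entityState = new HashMap&amp;lt;&amp;gt;();        // id to serialized xml
    final Map&amp;lt;String, Integer&amp;gt; messageCountIndex = new HashMap&amp;lt;&amp;gt;(); // id to precomputed count

    // Every write updates the index in the same call, so the two never drift apart.
    void save(String userId, String xml, int messageCount) {
        entityState.put(userId, xml);
        messageCountIndex.put(userId, messageCount);
    }

    void delete(String userId) {
        entityState.remove(userId);
        messageCountIndex.remove(userId);
    }

    // The query the index was designed for: no join, no SUM at read time.
    List&amp;lt;String&amp;gt; usersWithMoreThan(int count) {
        List&amp;lt;String&amp;gt; result = new ArrayList&amp;lt;&amp;gt;();
        messageCountIndex.forEach((id, n) -&amp;gt; { if (n &amp;gt; count) result.add(id); });
        Collections.sort(result);
        return result;
    }
}
&lt;/pre&gt;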
&lt;p&gt;
By doing this, your SQLServer database is no longer the RDBMS you and your DBA are familiar with. It might sound scary, but it might be worth trying if you have started to think that NHibernate/Hibernate might not be the best solution. I will write more articles addressing:
&lt;ol&gt;
&lt;li&gt;&lt;a href="/2010/02/avoiding-n1-problem.html"&gt;Avoiding N+1 problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/2010/02/avoiding-n1-problem-2.html"&gt;Avoiding N+1 problem (2)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/2010/02/data-migration.html"&gt;Data Migration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/2010/02/data-migration-2.html"&gt;Data Migration (2)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/2010/02/flattening-rebuilding.html"&gt;Flatting &amp; Rebuilding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/2010/02/data-migration-3.html"&gt;Data Migration (3)&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-1512767662818942634?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/1512767662818942634/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=1512767662818942634' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/1512767662818942634'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/1512767662818942634'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2010/02/use-sqlserver-as-your-nosql-database.html' title='Use SQLServer as your nosql database'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-2146748476415833818</id><published>2009-06-16T14:57:00.004+08:00</published><updated>2009-06-16T16:01:43.994+08:00</updated><title type='text'>Multi bind in Guice 2.0</title><content type='html'>Guice is a great tool for dependency injection. But when you need to bind more than one implementation, more than one instance, or a collection, things become tricky. After fighting with Guice for a long time, I think it is worthwhile to document the tricks I've found.

&lt;h3&gt;Bind multiple instances&lt;/h3&gt;
Say my project needs to connect to two databases. This code will not work:
&lt;pre class="brush: java"&gt;
bind(SqlMapClient.class).toInstance(createSqlMapClientForDB1());
bind(SqlMapClient.class).toInstance(createSqlMapClientForDB2());&lt;/pre&gt;
It will not work, not only because Guice does not allow you to bind the same key (SqlMapClient.class is the key in this case) twice, but also because of how the dependency is used:
&lt;pre class="brush: java"&gt;
public class Service1 {
  @Inject
  SqlMapClient sqlMapClient;
}&lt;/pre&gt;
How can we know which database connection we've got? This is a well-known problem, and it has been addressed since 1.0. In 1.0, we have three choices to make it work:
&lt;h4&gt;Choice 1: Different Type&lt;/h4&gt;
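We can give each database its own type, for example a wrapper that implements SqlMapClient and delegates to the real client.
&lt;pre class="brush: java"&gt;
public class Db1SqlMapClient implements SqlMapClient {
  private final SqlMapClient delegate;
  public Db1SqlMapClient(SqlMapClient delegate) {
    this.delegate = delegate;
  }
  // delegate all methods of SqlMapClient to the wrapped instance
}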
public class Service1 {
  @Inject
  Db1SqlMapClient sqlMapClient;
}&lt;/pre&gt;
&lt;h4&gt;Choice 2: Binding Annotation&lt;/h4&gt;
A key in Guice is not necessarily just the type; it can be a type plus a binding annotation. With binding annotations we can bind the same type multiple times, because each binding still uses a different key (a different binding annotation).
&lt;pre class="brush: java"&gt;
@BindingAnnotation
@Target({ElementType.FIELD, ElementType.PARAMETER, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface DB1 {
}

bind(SqlMapClient.class)
.annotatedWith(DB1.class)
.toInstance(createSqlMapClientForDB1());

public class Service1 {
  @Inject @DB1
  SqlMapClient sqlMapClient;
}&lt;/pre&gt;  
&lt;h4&gt;Choice 3: Named Binding&lt;/h4&gt;
Guice predefines a binding annotation called "Named". We can use "Names.named()" to create an instance of it.
&lt;pre class="brush: java"&gt;
bind(SqlMapClient.class)
.annotatedWith(Names.named("DB1"))
.toInstance(createSqlMapClientForDB1());

public class Service1 {
  @Inject @Named("DB1")
  SqlMapClient sqlMapClient;
}&lt;/pre&gt;

Although we have three choices, none of them is perfect. The first one is very tedious, and the user of the SqlMapClient needs to know the concrete type. The second one is better, but the user still needs to know which database it depends on by annotating its dependency, which somewhat violates the principle of "Inversion of Control". Also, we need to define one more class for the binding annotation. The third choice does not need a new class, but it is not refactoring friendly, and its usages cannot be found with "find references". So the recommended way is a strongly typed binding annotation.

&lt;h4&gt;Choice 4: Guice 2.0 Child Injector&lt;/h4&gt;
In Guice 2.0, we can use child injector to define different binding to the same type.
&lt;pre class="brush: java"&gt;
db1Injector = injector.createChildInjector(new AbstractModule() {
  public void configure() {
    bind(SqlMapClient.class).toInstance(createSqlMapClientForDB1());
  }
});
db2Injector = injector.createChildInjector(new AbstractModule() {
  public void configure() {
    bind(SqlMapClient.class).toInstance(createSqlMapClientForDB2());
  }
});&lt;/pre&gt;
Different database connections have to be injected by different injectors. To use this style, your system has to be partitioned so that it is managed by several different containers. That is not practical in the real world.
&lt;h3&gt;Bind Set&lt;/h3&gt;
It seems easy, doesn't it?
&lt;pre class="brush: java"&gt;
bind(Set.class).toInstance(new HashSet());&lt;/pre&gt;
But what if we need to bind two sets, one a set of integers and the other a set of strings? How do we do that?
&lt;pre class="brush: java"&gt;
bind(new TypeLiteral&amp;lt;Set&amp;lt;String&amp;gt;&amp;gt;(){}).toInstance(new HashSet&amp;lt;String&amp;gt;(){{
  add(&amp;quot;Hello&amp;quot;);
  add(&amp;quot;World&amp;quot;);
}});&lt;/pre&gt;
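Or we can use the Types utility class introduced in Guice 2.0. Types.setOf() returns a java.lang.reflect.Type rather than a TypeLiteral, so it has to be wrapped (with an unchecked cast) before it can be passed to bind():
&lt;pre class="brush: java"&gt;
@SuppressWarnings("unchecked")
TypeLiteral&amp;lt;Set&amp;lt;String&amp;gt;&amp;gt; setOfString = (TypeLiteral&amp;lt;Set&amp;lt;String&amp;gt;&amp;gt;) TypeLiteral.get(Types.setOf(String.class));
bind(setOfString).toInstance(new HashSet&amp;lt;String&amp;gt;(){{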
  add(&amp;quot;Hello&amp;quot;);
  add(&amp;quot;World&amp;quot;);
}});&lt;/pre&gt;
This also seems easy. But what if the elements of the set are not simple Strings? What if we have an interface called OrderProcessor:
&lt;pre class="brush: java"&gt;
public interface OrderProcessor {
  void processOrder(Order order);
}&lt;/pre&gt;
Then we can have different OrderProcessors that process the order differently (send an email, save the order into the database):
&lt;pre class="brush: java"&gt;
public class MailOrderProcessor implements OrderProcessor {
  @Inject
  EmailSender emailSender;
  // send mail
}&lt;/pre&gt;
&lt;pre class="brush: java"&gt;
public class DbOrderProcessor implements OrderProcessor {
  @Inject
  SqlMapClient sqlMapClient;
  // save order to database
}&lt;/pre&gt; 
OK, now how do we bind a set of order processors? Can we do this?
&lt;pre class="brush: java"&gt;
bind(new TypeLiteral&amp;lt;Set&amp;lt;OrderProcessor&amp;gt;&amp;gt;(){}).toInstance(new HashSet&amp;lt;OrderProcessor&amp;gt;(){{
  add(new MailOrderProcessor());
  add(new DbOrderProcessor());
}});&lt;/pre&gt;
No, we can't, because both of them have their own dependencies, and manually created instances do not get those dependencies injected. To make it work, we have four choices:
&lt;h4&gt;Choice 1: Manually call injectMembers&lt;/h4&gt;
&lt;pre class="brush: java"&gt;
@Inject
Injector injector;
for (OrderProcessor orderProcessor : orderProcessors) {
  injector.injectMembers(orderProcessor);
}&lt;/pre&gt;
&lt;h4&gt;Choice 2: Wrapping the set&lt;/h4&gt;
&lt;pre class="brush: java"&gt;
public class OrderProcessors {
  private final Set&amp;lt;OrderProcessor&amp;gt; processors = new HashSet&amp;lt;OrderProcessor&amp;gt;();
  @Inject
  public void setDbOrderProcessor(DbOrderProcessor processor) {
    processors.add(processor);
  }
  @Inject
  public void setMailOrderProcessor(MailOrderProcessor processor) {
    processors.add(processor);
  }
}&lt;/pre&gt;
&lt;h4&gt;Choice 3: using getProvider&lt;/h4&gt;
&lt;pre class="brush: java"&gt;
bind(new TypeLiteral&amp;lt;Set&amp;lt;Provider&amp;lt;? extends OrderProcessor&amp;gt;&amp;gt;&amp;gt;(){}).toInstance(new HashSet&amp;lt;Provider&amp;lt;? extends OrderProcessor&amp;gt;&amp;gt;(){{
  add(getProvider(DbOrderProcessor.class));
  add(getProvider(MailOrderProcessor.class));
}});&lt;/pre&gt; 
Here we use getProvider(), a feature of AbstractModule. Although we cannot call injector.getInstance() inside a module, we can get a provider of the instance. This way, what we get is a set of providers of the processors instead of a set of order processors, which might not be what you want.
&lt;h4&gt;Choice 4: getProvider + ProvidedOrderProcessor&lt;/h4&gt;
&lt;pre class="brush: java"&gt;
public class ProvidedOrderProcessor implements OrderProcessor {
  private final Provider&amp;lt;? extends OrderProcessor&amp;gt; provider;
  public ProvidedOrderProcessor(Provider&amp;lt;? extends OrderProcessor&amp;gt; provider) {
    this.provider = provider;
  }
  public void processOrder(Order order) {
    provider.get().processOrder(order);
  }
}&lt;/pre&gt;
Now we get an order processor instead of its provider.
&lt;pre class="brush: java"&gt;
bind(new TypeLiteral&amp;lt;Set&amp;lt;OrderProcessor&amp;gt;&amp;gt;(){}).toInstance(new HashSet&amp;lt;OrderProcessor&amp;gt;(){{
  add(getLazyInstance(DbOrderProcessor.class));
  add(getLazyInstance(MailOrderProcessor.class));
}});
OrderProcessor getLazyInstance(Class&amp;lt;? extends OrderProcessor&amp;gt; clazz) {
  return new ProvidedOrderProcessor(getProvider(clazz));
}&lt;/pre&gt;
Not as easy as we thought, right?
&lt;h3&gt;Bind one collection by multiple modules&lt;/h3&gt;
What if we want to contribute to one set from multiple modules? There is an extension to Guice (multibindings) that allows us to do that.
&lt;pre class="brush: java"&gt;
public class Module1 extends AbstractModule {
  public void configure() {
    Multibinder&amp;lt;OrderProcessor&amp;gt; multibinder
         = Multibinder.newSetBinder(binder(), OrderProcessor.class);
    multibinder.addBinding().to(MailOrderProcessor.class);
  }
}
public class Module2 extends AbstractModule {
  public void configure() {
    Multibinder&amp;lt;OrderProcessor&amp;gt; multibinder
         = Multibinder.newSetBinder(binder(), OrderProcessor.class);
    multibinder.addBinding().to(DbOrderProcessor.class);
  }
}&lt;/pre&gt;
Seems perfect? By the way, the multibindings extension also supports Map (MapBinder).

&lt;h4&gt;Limitation&lt;/h4&gt;
But what about a List? There is no official support for binding one list from several modules. And how do we bind a chain of responsibility (a.k.a. decorators)?
&lt;pre class="brush: java"&gt;
public class DecoratedOrderProcessor implements OrderProcessor {
  private final OrderProcessor decorated;
  public DecoratedOrderProcessor(OrderProcessor decorated) {
    this.decorated = decorated;
  }
  public void processOrder(Order order) {
    try {
      decorated.processOrder(order);
    } finally {
      // do something;
    }
  }
}&lt;/pre&gt;
With multiple decorators forming a chain of responsibility, the scenario becomes complex. If only one module is involved, we can bind the chain using techniques similar to "ProvidedOrderProcessor". But if more than one module needs to contribute an element of the chain, there is no official way to do it.
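To make the ordering problem concrete, here is a plain-Java sketch (no Guice) of how a decorator chain is assembled. Each decorator wraps the processor built so far, so the construction order fixes the call order. LoggingDecorator, RecordingProcessor and Chains are hypothetical names. The decorators only record "before" and "after" markers in place of the post's "// do something".
&lt;pre class="brush: java"&gt;
class Order {
    final String id;

    Order(String id) {
        this.id = id;
    }
}

interface OrderProcessor {
    void processOrder(Order order);
}

// Innermost element of the chain: does the actual work.
class RecordingProcessor implements OrderProcessor {
    private final StringBuilder trace;

    RecordingProcessor(StringBuilder trace) {
        this.trace = trace;
    }

    public void processOrder(Order order) {
        trace.append("process(").append(order.id).append(") ");
    }
}

// Hypothetical decorator in the style of DecoratedOrderProcessor.
class LoggingDecorator implements OrderProcessor {
    private final String name;
    private final OrderProcessor decorated;
    private final StringBuilder trace;

    LoggingDecorator(String name, OrderProcessor decorated, StringBuilder trace) {
        this.name = name;
        this.decorated = decorated;
        this.trace = trace;
    }

    public void processOrder(Order order) {
        trace.append(name).append(":before ");
        try {
            decorated.processOrder(order);
        } finally {
            trace.append(name).append(":after ");
        }
    }
}

final class Chains {
    // Wraps core with one decorator per name; the last name ends up outermost.
    static OrderProcessor build(OrderProcessor core, String[] names, StringBuilder trace) {
        OrderProcessor current = core;
        for (String name : names) {
            current = new LoggingDecorator(name, current, trace);
        }
        return current;
    }
}
&lt;/pre&gt;
With names {"audit", "tx"}, "tx" is wrapped last and therefore runs outermost: the trace reads tx:before audit:before process(A1) audit:after tx:after. Letting several modules contribute elements means they must all agree on positions in this sequence.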
   
&lt;h3&gt;Use Guice to build extension point&lt;/h3&gt;
Comparing Guice and Spring, one advantage I see is that Guice promotes modular design. By grouping functionality into modules, we can plug in and unplug implementations depending on the environment and requirements (for example, test versus production). To be fair, this is also possible in Spring, but it is easier and more common in the Guice world. With Guice, we can define something as the default and then let other modules plug in and override it. Here are some techniques for achieving this:
&lt;h4&gt;Choice 1: @ImplementedBy, @ProvidedBy&lt;/h4&gt;
&lt;pre class="brush: java"&gt;
@ImplementedBy(MailOrderProcessor.class)
public interface OrderProcessor {
}&lt;/pre&gt;
If no module specifies a binding for OrderProcessor, the default (MailOrderProcessor in this case) is used. If there is a binding such as bind(OrderProcessor.class).to(DbOrderProcessor.class), that binding is used instead. This feature is really neat, especially when we need to change something in the unit test environment.
&lt;pre class="brush: java"&gt;
@ImplementedBy(CurrentTimeProvider.DefaultImpl.class)
public interface CurrentTimeProvider {
  DateTime getNow();
  public static class DefaultImpl implements CurrentTimeProvider {
    public DateTime getNow() {
      return new DateTime();
    }
  }
}&lt;/pre&gt;
In production, CurrentTimeProvider automatically uses the default implementation. In tests, we can instead write bind(CurrentTimeProvider.class).toInstance(new FixedTimeProvider(2008,5,12)); fixing the time this way makes the tests easier to write.
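Outside of Guice, the same idea comes down to coding against the interface. Here is a minimal sketch under a few assumptions. It uses java.time.LocalDateTime in place of Joda-Time's DateTime. FixedTimeProvider is hypothetical: the post names it but does not show it. OrderAgeChecker is a hypothetical piece of code under test.
&lt;pre class="brush: java"&gt;
import java.time.LocalDateTime;

interface CurrentTimeProvider {
    LocalDateTime getNow();

    // Production default: the real clock.
    class DefaultImpl implements CurrentTimeProvider {
        public LocalDateTime getNow() {
            return LocalDateTime.now();
        }
    }
}

// Test double: always reports midnight of the given day.
class FixedTimeProvider implements CurrentTimeProvider {
    private final LocalDateTime now;

    FixedTimeProvider(int year, int month, int day) {
        this.now = LocalDateTime.of(year, month, day, 0, 0);
    }

    public LocalDateTime getNow() {
        return now;
    }
}

// Hypothetical code under test that depends only on the interface.
class OrderAgeChecker {
    private final CurrentTimeProvider time;

    OrderAgeChecker(CurrentTimeProvider time) {
        this.time = time;
    }

    // An order is stale once more than seven days have passed since it was placed.
    boolean isStale(LocalDateTime placedAt) {
        return placedAt.plusDays(7).isBefore(time.getNow());
    }
}
&lt;/pre&gt;
For example, new OrderAgeChecker(new FixedTimeProvider(2008, 5, 12)).isStale(LocalDateTime.of(2008, 5, 1, 0, 0)) is true on every run, whatever the real date is.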
&lt;h4&gt;Choice 2: @Inject(optional = true)&lt;/h4&gt;
&lt;pre class="brush: java"&gt;
public class ProcessOrderService {
  @Inject(optional = true)
  OrderProcessor processor = new DummyOrderProcessor();
}&lt;/pre&gt;
When the providing side cannot pick a default implementation, but the consuming side does know its default, we can mark the dependency as optional and give it a default value. When there is no binding for OrderProcessor, the feature is effectively disabled by the DummyOrderProcessor. Plugging in a new module that provides an implementation of OrderProcessor changes this behavior.
&lt;h4&gt;Choice 3: Multibindings&lt;/h4&gt;
The Guice extension mentioned above lets multiple modules bind the same set or map. With it, new modules can plug in their own implementations to modify the system's behavior, which makes it a very useful way to provide extension points.
&lt;h4&gt;Choice 4: Module Override&lt;/h4&gt;
This is a new feature of Guice 2.0. Easy to use, and "powerful".
&lt;pre class="brush: java"&gt;
Module finalModule = Modules
.override(new DefaultModule())
.with(new CustomizationModule());&lt;/pre&gt;
If CustomizationModule binds a key that DefaultModule also binds, the binding from CustomizationModule overrides the one in DefaultModule. This is useful in some cases, but I don't think it is a good feature. Where possible, we should split a big module into smaller modules and compose them as needed, rather than override them from outside. Still, Modules.override opens a way for multiple modules to bind the same list, or even a decorator chain:
&lt;pre class="brush: java"&gt;
static final Key&amp;lt;OrderProcessor&amp;gt; CUSTOMIZABLE_KEY = Key.get(OrderProcessor.class, new Before(MailOrderProcessor.class));
bind(new TypeLiteral&amp;lt;List&amp;lt;OrderProcessor&amp;gt;&amp;gt;() {}).toInstance(new ArrayList&amp;lt;OrderProcessor&amp;gt;(){{
  add(new ProvidedOrderProcessor(getProvider(CUSTOMIZABLE_KEY)));
  add(getLazyInstance(MailOrderProcessor.class));
}});
bind(CUSTOMIZABLE_KEY).toInstance(new DummyOrderProcessor());&lt;/pre&gt;
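Before is a custom binding annotation. Writing new Before(...) assumes a hand-written implementation of the annotation, similar to what Names.named() returns for @Named. Another module can then bind CUSTOMIZABLE_KEY again. When that module is applied through Modules.override, its binding replaces the dummy one: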
&lt;pre class="brush: java"&gt;
bind(CUSTOMIZABLE_KEY).toInstance(getLazyInstance(DbOrderProcessor.class));&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-2146748476415833818?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/2146748476415833818/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=2146748476415833818' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/2146748476415833818'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/2146748476415833818'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2009/06/multi-bind-in-guice-20.html' title='Multi bind in Guice 2.0'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-6457019734397855987</id><published>2008-05-09T16:50:00.010+08:00</published><updated>2009-06-16T14:57:18.678+08:00</updated><title type='text'>Anemic Domain Model</title><content type='html'>Martin Fowler wrote about this a long time ago: http://www.martinfowler.com/bliki/AnemicDomainModel.html. His post is about domain models without rich behavior (anemic ones). Today, I am going to analyze why we have this problem and try to give an elegant solution.

Let's start with an example: a task management system with two entities in the domain, Employee and Task. We can write their relationship as follows:

&lt;pre class="brush: java"&gt;
public class Employee {
    private Set&amp;lt;Task&amp;gt; tasks = new HashSet&amp;lt;Task&amp;gt;();
}

public class Task {
    private String name;
    private Employee owner;
    private Date startTime;
    private Date endTime;
}&lt;/pre&gt;

This is a very typical parent/child relationship. Now I want to add a behavior to the domain model: get all the in-progress tasks owned by a given employee. If we ignore the database, this behavior very naturally belongs to the Employee entity.

&lt;pre class="brush: java"&gt;
public class Employee {
    private Set&amp;lt;Task&amp;gt; tasks = new HashSet&amp;lt;Task&amp;gt;();
    public Set&amp;lt;Task&amp;gt; getProcessingTasks() {
       ...
    }
}&lt;/pre&gt;

But if we do care about the database, this design is not acceptable. Where do the tasks come from? Are we going to load all of an employee's tasks when building the employee object? With five tasks, that is OK; with 5000 tasks, it probably is not. So, before the age of Hibernate, we wrote:

&lt;pre class="brush: java"&gt;
public class TaskDAO {
   public Set&amp;lt;Task&amp;gt; getProcessingTasks(Employee employee) {
      ...//sql
   }
}&lt;/pre&gt;

Hmm, wait a moment... Is the DAO part of the domain model? Yeah... it can be: just rename it to TaskRepository, and it is part of your domain model. Really? I don't believe it. The DAO is not part of the domain model. Instead, it steals logic from the domain, and that is why our domain model is anemic: getProcessingTasks belonged to Employee, but now it lives in a DAO. Can Hibernate solve the problem?

&lt;pre class="brush: java"&gt;
@Entity
public class Employee {
    @OneToMany
    private Set&amp;lt;Task&amp;gt; tasks = new HashSet&amp;lt;Task&amp;gt;();
    public Set&amp;lt;Task&amp;gt; getProcessingTasks() {
       ...
    }
}&lt;/pre&gt;

Yes! Hibernate rocks!
Have we succeeded? No, not yet. Hibernate can make the tasks lazy-loaded, but that gives you only two options: load them, or don't. If the implementation of getProcessingTasks iterates over tasks, you still end up loading all the tasks from the database.

Many people tried many different ways to solve this problem. The goal was to "inject something" into the domain so that the domain could execute queries itself. Attempts included Hibernate interceptors, static code instrumentation, AspectJ... Spring's answer was this:

&lt;pre class="brush: java"&gt;
@Entity
@Configurable
public class Employee {
    private TaskDAO dao;
    public Set&amp;lt;Task&amp;gt; getProcessingTasks() {
        return dao.getProcessingTasks(this);
    }
    public void setTaskDao(TaskDAO dao) {
        this.dao = dao;
    }
}&lt;/pre&gt;

The @Configurable annotation was introduced to inject the DAO into the domain model. Now the domain can do what it is supposed to do. Really? A domain model depending on a DAO made lots of people unhappy. They objected to the cyclic dependency between the DAO layer and the domain layer. They argued that the domain should not be "bound" to a database or any container. Personally, I don't think it is that big an issue... Rails' Active Record binds the domain model to the database, and people still love it. Anyway, I started again, looking for a more elegant solution.

Finally, I asked myself: what if I wrote this?

&lt;pre class="brush: java"&gt;
public class Employee {
    private RichSet&amp;lt;Task&amp;gt; tasks = new DefaultRichSet&amp;lt;Task&amp;gt;();
    public RichSet&amp;lt;Task&amp;gt; getProcessingTasks() {
        return tasks.find("startTime").le(new Date()).find("endTime").isNull();
    }
...
}&lt;/pre&gt;

RichSet is a Set with extra capabilities (querying, summing, ...):

&lt;pre class="brush: java"&gt;
public interface RichSet&amp;lt;T&amp;gt; extends Set&amp;lt;T&amp;gt; {
    Finder&amp;lt;RichSet&amp;lt;T&amp;gt;&amp;gt; find(String expression);
    int sum(String expression);
}&lt;/pre&gt;

DefaultRichSet is a pure in-memory implementation of those operations that simply iterates over the set. So you can create an Employee with new in your unit test and test getProcessingTasks right away, without worrying about a database or dependency injection. Do you feel better?
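To make "pure in-memory implementation" concrete, here is a minimal sketch of what such a set could look like. MiniRichSet, MiniFinder and Condition are hypothetical names, not the post's DefaultRichSet. To keep the sketch short, it stores elements in an array and reads fields by reflection. It only supports the two operations used in getProcessingTasks: le (for Date fields) and isNull.
&lt;pre class="brush: java"&gt;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Date;

// Minimal version of the Task entity from the post.
class Task {
    private final String name;
    private final Date startTime;
    private final Date endTime;

    Task(String name, Date startTime, Date endTime) {
        this.name = name;
        this.startTime = startTime;
        this.endTime = endTime;
    }
}

// A test applied to one field value.
interface Condition {
    boolean test(Object fieldValue);
}

// Hypothetical in-memory stand-in for DefaultRichSet.
class MiniRichSet {
    private final Object[] items;

    MiniRichSet(Object... items) {
        this.items = items.clone();
    }

    int size() {
        return items.length;
    }

    Object get(int index) {
        return items[index];
    }

    MiniFinder find(String fieldName) {
        return new MiniFinder(this, fieldName);
    }

    // Iterates over the elements and keeps those whose field passes the condition.
    MiniRichSet where(String fieldName, Condition condition) {
        Object[] kept = new Object[items.length];
        int count = 0;
        for (Object item : items) {
            if (condition.test(readField(item, fieldName))) {
                kept[count++] = item;
            }
        }
        return new MiniRichSet(Arrays.copyOf(kept, count));
    }

    private static Object readField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot read field " + fieldName, e);
        }
    }
}

// Returned by find(...): chooses the comparison to apply to that field.
class MiniFinder {
    private final MiniRichSet source;
    private final String fieldName;

    MiniFinder(MiniRichSet source, String fieldName) {
        this.source = source;
        this.fieldName = fieldName;
    }

    // "Less than or equal" for dates: present and not after the bound.
    MiniRichSet le(final Date bound) {
        return source.where(fieldName, new Condition() {
            public boolean test(Object value) {
                if (value == null) {
                    return false;
                }
                return !((Date) value).after(bound);
            }
        });
    }

    MiniRichSet isNull() {
        return source.where(fieldName, new Condition() {
            public boolean test(Object value) {
                return value == null;
            }
        });
    }
}
&lt;/pre&gt;
The call new MiniRichSet(...).find("startTime").le(new Date()).find("endTime").isNull() keeps exactly the tasks that have started and not yet ended. It is the same chain getProcessingTasks uses above, evaluated entirely in memory.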

But where is the database? Er... this is complicated, you know. The first thing I need to do is map the entity in Hibernate. Er... Hibernate does not like it: it expects a Set, not a RichSet. I need to write a bit more to make Hibernate happy:

&lt;pre class="brush: java"&gt;
&amp;lt;hibernate-mapping default-access="field" package="net.sf.ferrum.example.domain"&amp;gt;
    &amp;lt;class name="Employee"&amp;gt;
        &amp;lt;tuplizer entity-mode="pojo" class="net.sf.ferrum.RichEntityTuplizer"/&amp;gt;
        &amp;lt;id name="id"&amp;gt;
            &amp;lt;generator class="native"/&amp;gt;
        &amp;lt;/id&amp;gt;
        &amp;lt;set name="tasks" cascade="all" inverse="true" lazy="true"&amp;gt;
            &amp;lt;key/&amp;gt;
            &amp;lt;one-to-many class="Task" /&amp;gt;
        &amp;lt;/set&amp;gt;
    &amp;lt;/class&amp;gt;
&amp;lt;/hibernate-mapping&amp;gt;&lt;/pre&gt;

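What is a tuplizer? It is the component Hibernate uses to create entities and to get and set their properties. This is also where Hibernate puts its own enhanced set into your entity in place of yours. So I wrote my own tuplizer, which in turn wraps that set in my enhanced set.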

&lt;pre class="brush: java"&gt;
public class RichEntityTuplizer extends PojoEntityTuplizer {
    public RichEntityTuplizer(EntityMetamodel entityMetamodel, PersistentClass mappedEntity) {
        super(entityMetamodel, mappedEntity);
    }

    protected Setter buildPropertySetter(final Property mappedProperty, PersistentClass mappedEntity) {
        final Setter setter = super.buildPropertySetter(mappedProperty, mappedEntity);
        if (!(mappedProperty.getValue() instanceof org.hibernate.mapping.Set)) {
            return setter;
        }
        return new Setter() {
            public void set(Object target, Object value, SessionFactoryImplementor factory) throws HibernateException {
                Object wrappedValue = value;
                if (value instanceof Set) {
                    HibernateRepository repository = new HibernateRepository();
                    repository.setSessionFactory(factory);
                    wrappedValue = new HibernateRichSet((Set) value, repository, getCriteria(mappedProperty, target));
                }
                setter.set(target, wrappedValue, factory);
            }

            public String getMethodName() {
                return setter.getMethodName();
            }

            public Method getMethod() {
                return setter.getMethod();
            }
        };
    }
}&lt;/pre&gt;

In short, the code means:

&lt;pre class="brush: java"&gt;
employee.tasks = new HibernateRichSet&amp;lt;Task&amp;gt;(...)
&lt;/pre&gt;

This version of RichSet is much smarter. It will translate your find statements from

&lt;pre class="brush: java"&gt;
tasks.find("startTime").le(new Date()).find("endTime").isNull(); 
&lt;/pre&gt;

---&gt;

&lt;pre class="brush: java"&gt;
DetachedCriteria.forClass(..).add(...).add(...)
&lt;/pre&gt;

Now the domain can query its own collections without worrying about how the query will be executed. The domain stays pure, with no dependency on a DAO. It also still works entirely in memory, so you do not need to start a container or a database to test domain logic.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-6457019734397855987?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/6457019734397855987/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=6457019734397855987' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/6457019734397855987'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/6457019734397855987'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2008/05/anemic-domain-model.html' title='Anemic Domain Model'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-6475242285143982122</id><published>2008-01-16T16:36:00.002+08:00</published><updated>2009-06-16T16:10:55.525+08:00</updated><title type='text'>Pain Points of Using XAML or WPF</title><content type='html'>&lt;h3&gt;Pain Point 1: XAML always creates controls with their default constructor&lt;/h3&gt;

This means you need a default constructor for your control, and XAML will always use it. So you cannot use constructor dependency injection to pass things like services or gateways to your control. You also have no chance to pass data through the constructor, even if that data is a must-have for the specific type of control.

&lt;h3&gt;Pain Point 2: Cannot control whether XAML creates some part of the GUI&lt;/h3&gt;

Sometimes the GUI is not static. It can be dynamic because it differs depending on the data it presents. For example, a meeting in the past should show an "add note" button, while a meeting in the future should not. More often, security controls require the GUI to differ according to the user's role.

&lt;h3&gt;Pain Point 3: XAML uses XML, which contains too much visual noise&lt;/h3&gt;

Compared to things like YAML, XML is definitely not friendly to the eye. The only thing worse than XML I can think of is Lisp's parentheses. XML also makes manual editing harder.

&lt;h3&gt;Pain Point 4: Layout in a Grid&lt;/h3&gt;

Using a grid layout currently requires you to specify the row and column for every child of the grid, which is very error-prone when the grid becomes large. But a grid is a must-have for any non-trivial GUI, and there is no replacement for it yet.

&lt;h3&gt;Pain Point 5: Things not checked at compile time&lt;/h3&gt;

Lots of things in XAML are not checked by the compiler, for example bindings and resource lookups. It is also harder to cross-reference between XAML and code.

&lt;h3&gt;Pain Point 6: More files&lt;/h3&gt;

One file for XAML, one file for C#. Creating a new user control takes more steps, and this is confusing to newcomers.

&lt;h3&gt;Pain Point 7: Separating concerns&lt;/h3&gt;

By default, events are handled in the partial class behind the XAML. This is not a good way of separating concerns, nor good OO design. Windows and user controls usually end up doing too much in rich client applications. This is not XAML's fault in general, but its odd way of hooking up events in XAML does not promote a good model either.

&lt;h3&gt;Pain Point 8: Hard to test&lt;/h3&gt;

It is hard to test in many ways. First, because dependencies are not easy to inject, you cannot mock expensive things like network connections. Second, creating a real window takes more than ten seconds. Third, many things follow a static singleton model, such as resource lookup and the single-instance application object.

&lt;h3&gt;Pain Point 9: Controls lazily created with an uncertain lifecycle&lt;/h3&gt;

The controls for list items in a list view are created lazily, and only the visible ones get created. We cannot get hold of those controls easily, and we cannot even be sure whether they have been created.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-6475242285143982122?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/6475242285143982122/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=6475242285143982122' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/6475242285143982122'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/6475242285143982122'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2008/01/pain-points-of-using-xaml-or-wpf.html' title='Pain Points of Using XAML or WPF'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-3324939558801686830</id><published>2007-05-11T13:12:00.001+08:00</published><updated>2009-06-16T16:05:45.589+08:00</updated><title type='text'>Why Do We Need Mock Framework?</title><content type='html'>Suppose we have this simple behavior to test:
a form
a text field on the form
a button on the form
clicking the button should set the text of the text field to "Hello"

Here is the code implementing the behavior, using the MVP pattern:

&lt;pre class="brush: java"&gt;public interface View {
&amp;nbsp;&amp;nbsp;public void setText(String text);
&amp;nbsp;&amp;nbsp;public void addActionListener(ActionListener actionListener);
}
&lt;/pre&gt;
&lt;pre class="brush: java"&gt;public class Presenter {
&amp;nbsp;&amp;nbsp;public Presenter(final View view) {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;view.addActionListener(new ActionListener() {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;public void actionPerformed() {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;view.setText("Hello");
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;});
&amp;nbsp;&amp;nbsp;}
}
&lt;/pre&gt;

Before writing the test in Java, let's first write it in pseudo-code:

&lt;pre class="brush: java"&gt;
create mock view
create presenter by mock view
fire event on mock view
assert text is set
&lt;/pre&gt;

Then let's implement it using the latest jMock:

&lt;pre class="brush: java"&gt;
@Test
public void test_click_button_should_set_text_hello() {
&amp;nbsp;&amp;nbsp;Mockery mockery = new Mockery();
&amp;nbsp;&amp;nbsp;final View mockView = mockery.mock(View.class);
&amp;nbsp;&amp;nbsp;final ActionListenerMatcher actionListenerMatcher = new ActionListenerMatcher();
&amp;nbsp;&amp;nbsp;mockery.checking(new Expectations() {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;one(mockView).addActionListener(with(actionListenerMatcher));
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;one(mockView).setText("Hello");
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}
&amp;nbsp;&amp;nbsp;});
&amp;nbsp;&amp;nbsp;new Presenter(mockView);
&amp;nbsp;&amp;nbsp;actionListenerMatcher.fireActionPerformed();
&amp;nbsp;&amp;nbsp;mockery.assertIsSatisfied();
}
&lt;/pre&gt;

Here we introduced a custom matcher called ActionListenerMatcher. We need it because we need a way to fire the event: without the matcher, we have no place to store the listener passed in. Here is the implementation of ActionListenerMatcher:

&lt;pre class="brush: java"&gt;
public class ActionListenerMatcher extends BaseMatcher&amp;lt;ActionListener&amp;gt; {
&amp;nbsp;&amp;nbsp;private ActionListener actionListener;
&amp;nbsp;&amp;nbsp;public boolean matches(Object item) {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;actionListener = (ActionListener) item;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return true;
&amp;nbsp;&amp;nbsp;}
&amp;nbsp;&amp;nbsp;public void fireActionPerformed() {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;actionListener.actionPerformed();
&amp;nbsp;&amp;nbsp;}
&amp;nbsp;&amp;nbsp;public void describeTo(Description description) {
&amp;nbsp;&amp;nbsp;}
}
&lt;/pre&gt;

What is the conclusion? The developer's intention in writing the test is lost in long and complex mocking code. What about other frameworks? I have tried EasyMock as well, and it is even worse than jMock. Is there a simpler way? Yes, there is. Check this out:

&lt;pre class="brush: java"&gt;
@Test
public void test_click_button_should_set_text_hello() {
&amp;nbsp;&amp;nbsp;MockView mockView = new MockView();
&amp;nbsp;&amp;nbsp;new Presenter(mockView);
&amp;nbsp;&amp;nbsp;mockView.fireActionPerformed();
&amp;nbsp;&amp;nbsp;Assert.assertEquals("Hello", mockView.getText());
}
&lt;/pre&gt;

Isn't this simple? MockView is just a simple implementation of View:

&lt;pre class="brush: java"&gt;
private class MockView implements View {
&amp;nbsp;&amp;nbsp;private ActionListener actionListener;
&amp;nbsp;&amp;nbsp;private String text;
&amp;nbsp;&amp;nbsp;public void addActionListener(ActionListener actionListener) {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;this.actionListener = actionListener;
&amp;nbsp;&amp;nbsp;}
&amp;nbsp;&amp;nbsp;public void setText(String text) {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;this.text = text;
&amp;nbsp;&amp;nbsp;}
&amp;nbsp;&amp;nbsp;public String getText() {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return text;
&amp;nbsp;&amp;nbsp;}
&amp;nbsp;&amp;nbsp;public void fireActionPerformed() {
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;actionListener.actionPerformed();
&amp;nbsp;&amp;nbsp;}
}
&lt;/pre&gt;
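Put together, the hand-written approach needs nothing beyond plain Java. A self-contained sketch follows. As the post's code implies, it assumes that ActionListener is a project-specific interface whose actionPerformed() takes no arguments; the method in java.awt.event.ActionListener takes an ActionEvent. It also makes MockView a top-level class instead of a private inner class of the test.
&lt;pre class="brush: java"&gt;
// Project-specific listener: no event argument, unlike java.awt.event.ActionListener.
interface ActionListener {
    void actionPerformed();
}

interface View {
    void setText(String text);
    void addActionListener(ActionListener actionListener);
}

// Production code under test: clicking sets the text to "Hello".
class Presenter {
    Presenter(final View view) {
        view.addActionListener(new ActionListener() {
            public void actionPerformed() {
                view.setText("Hello");
            }
        });
    }
}

// Hand-written test double: remembers the listener and the last text set.
class MockView implements View {
    private ActionListener actionListener;
    private String text;

    public void addActionListener(ActionListener actionListener) {
        this.actionListener = actionListener;
    }

    public void setText(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }

    // Simulates the user clicking the button.
    void fireActionPerformed() {
        actionListener.actionPerformed();
    }
}
&lt;/pre&gt;
A test then reads exactly like the one above: construct a MockView and pass it to new Presenter(...). Then call fireActionPerformed() and check that getText() returns "Hello".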

So, before you start using a mock framework, think about it: do we really need one?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-3324939558801686830?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/3324939558801686830/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=3324939558801686830' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/3324939558801686830'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/3324939558801686830'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2007/05/why-do-we-need-mock-framework.html' title='Why Do We Need Mock Framework?'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-115519501614298544</id><published>2006-08-10T15:08:00.002+08:00</published><updated>2009-06-16T16:16:29.721+08:00</updated><title type='text'>Async Unit Tesing in ActionScript3</title><content type='html'>This is a cutting-edge topic; if you are not interested in programming next-generation Flash applications, please ignore me :)
The most mature unit testing tool in the ActionScript3/Flex2 world is FlexUnit2, an open-source project hosted at labs.adobe.com. It mirrors JUnit in both framework design and usage. Nothing changed, except that Thread.sleep is missing in the Flash world. If we cannot wait for a few seconds, how do we test asynchronous behavior?
To solve this problem, FlexUnit introduces a method called "addAsync" in the TestCase class. It takes at least two parameters. You can use it like this:
&lt;pre class="brush: java"&gt;
  loader.addEventListener(Event.COMPLETE, addAsync(onIndexPageLoaded, 1000));
&lt;/pre&gt;
The value returned by addAsync is a function wrapping your event handler. To look up full documentation about this method, see &lt;a href="http://weblogs.macromedia.com/as_libraries/docs/flexunit/flexunit/framework/TestCase.html"&gt;here&lt;/a&gt;. 
After you add an async handler, FlexUnit waits for the event, up to the given timeout, before it finishes the test method. Here are some findings and tips:
&lt;ol&gt;
&lt;li&gt;FlexUnit is using "Timer" to wait for finishing. It will not pause execution actually, but will check for the result later. i.e:
&lt;pre class="brush: java"&gt;
  loader = new URLLoader();
  loader.addEventListener(Event.COMPLETE, addAsync(onIndexPageLoaded, 1000));
  loader.load(new URLRequest("twspike-index.html"));
  doSomething();
&lt;/pre&gt;
doSomething will be executed immediately after loader.load(...).
&lt;/li&gt;
&lt;li&gt;You can not use "addAsync" in setUp. Because setUp and testXXX is two different test cases, so FlexUnit will wait for setUp to finish instead of waiting for your actual testing code to finish. Currently, the error reported is quite mistery.
&lt;/li&gt;
&lt;li&gt;How do you actually wait for something to happen and then start testing? Here is a home-made HOW-TO:
&lt;pre class="brush: java"&gt;
private function onIndexPageLoaded(event:Event):void {
  parser = new IndexPageParser(loader.data);
  checker.call(this);
}
private function check(checker:Function):void {
  this.checker = checker;
  loader = new URLLoader();
  loader.addEventListener(Event.COMPLETE, addAsync(onIndexPageLoaded, 1000));
  loader.load(new URLRequest("twspike-index.html"));
}
public function blog_data_should_not_be_null():void {
  check(function():void {
    assertNotNull(parser.blogData);
  });
}
public function blog_data_should_be_valid_xml():void {
  check(function():void {
    XML(parser.blogData);
  });
}
&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
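For comparison with a platform that has threads, here is the same "save the assertions, run them in the completion handler" pattern as a Java sketch. Everything in it is an illustrative assumption rather than FlexUnit's API. AsyncCheck and loadAsync are hypothetical: loadAsync stands in for URLLoader.load and completes on another thread. A CountDownLatch plays the role of addAsync's timeout by letting the test wait a bounded time for the callback.
&lt;pre class="brush: java"&gt;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical completion callback, in the role of the COMPLETE event handler.
interface LoadHandler {
    void onLoaded(String data);
}

class AsyncCheck {
    private Runnable checker;            // saved test code, like this.checker in AS3
    private volatile String data;        // result stored by the handler, like parser
    private volatile Throwable failure;  // assertion error raised inside the handler

    // Hypothetical stand-in for URLLoader.load: completes on another thread.
    static void loadAsync(final String url, final LoadHandler handler) {
        new Thread(new Runnable() {
            public void run() {
                handler.onLoaded("contents of " + url);
            }
        }).start();
    }

    String data() {
        return data;
    }

    // Saves the checker, starts the async action, and waits up to timeoutMillis.
    void check(String url, Runnable checker, long timeoutMillis) {
        this.checker = checker;
        this.failure = null;
        final CountDownLatch done = new CountDownLatch(1);
        loadAsync(url, new LoadHandler() {
            public void onLoaded(String loaded) {
                try {
                    data = loaded;
                    AsyncCheck.this.checker.run();  // run the saved test code
                } catch (Throwable t) {
                    failure = t;
                } finally {
                    done.countDown();
                }
            }
        });
        boolean finished;
        try {
            finished = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for " + url, e);
        }
        if (!finished) {
            throw new AssertionError("timed out waiting for " + url);
        }
        if (failure != null) {
            throw new AssertionError("check failed", failure);
        }
    }
}
&lt;/pre&gt;
The flow matches the AS3 version. check saves the assertions and starts the load, and the completion handler runs them. A failing assertion inside the handler is rethrown on the test thread. A missing callback becomes a timeout failure instead of a silently passing test.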
To sum up: wrap your test code in a function and pass it to a check method, which saves the test code and starts the async action. The event handler then calls the saved test code. It is annoying, I know...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-115519501614298544?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/115519501614298544/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=115519501614298544' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/115519501614298544'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/115519501614298544'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/08/async-unit-tesing-in-actionscript3.html' title='Async Unit Tesing in ActionScript3'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-115163612798983971</id><published>2006-06-30T09:40:00.000+08:00</published><updated>2006-06-30T10:55:28.076+08:00</updated><title type='text'>Simple Design: "DSL" Reloaded</title><content type='html'>I have been silent about DSL for a while. Now, I am back:)
After thinking about it for several months, I realized that most of the time people don't need a domain-specific language; they just need code that reads more nicely. Then I came up with the idea that this kind of requirement doesn't need a heavy implementation such as a grammar, parser or compiler. It could be &lt;font color="red"&gt;simple&lt;/font&gt;, and it should be &lt;font color="red"&gt;simple&lt;/font&gt;!
So, is Ruby or Smalltalk the right choice? I have to say I don't think so. Why am I not keen on using Ruby as the environment to embed a so-called "DSL" (I still call nicely readable code a DSL, sorry about that)? Because the language was not invented to host DSLs. method_missing, closures and initializer blocks are not intended to be used this way:
&lt;pre&gt;
publishing agreement dated '9/20/2005'
with_author 'Joe W. Author', social('555-493-3920')
for_title 'DSLs for Dummies'

report do
  calculate 'Royalties', as net_retail_sales.during(last_six_months) * 20.percent
end
&lt;/pre&gt;
&lt;font color="red"&gt;We are using Ruby too tricky&lt;/font&gt;!!! We are not using it, we are hacking it. The side effect is understanding the inner mechanism behind nice code becoming harder and harder, which is leading us to a dangerous direction. That reminds me of similar experience of C++. After introducing template into C++, I think except STL and several other excellent framework addressing some critical issue (mostly performance), others are simply too smart to be useful. Tons of frameworks inside Boost are just trying to make the code looking nicer...
My point is: if the language does not support the way we want to write code, don't hack it with powerful tricks, even if the father of the language encourages you to do so. (I don't know Matsumoto's attitude, but I do know Bjarne speaks a lot about extending C++ with frameworks.)
If not hacking a flexible scripting language, what can we choose to implement the so called DSL? The one thing I am sure if we need to write complex grammar for a new DSL (actually a English-like language), we are going the wrong direction. Because human language is too complex to be handled by formal grammar specification. So, I like the philosophy behind embeded DSL(Ruby again...or Smalltalk). The DSL is still embeded in a GPL, but the GPL should support hosting DSL so the implementation doesn't need to be tricky.
The initial idea came into my mind back to this Feb. But at that time, I thought what we need is a new lightweighted GPL, but it still need to be weak typed, mordern featured scripting language just like Python, Ruby. Part of the reason is I had another nice idea about how to implement a weak typed scripting language effciently in JVM, but I didn't have passion to carry it into reality. This lead me to a not-that-simple-design... 
Sadly or luckily... I have to say, after several months, I realized it should be more &lt;font color="red"&gt;simple&lt;/font&gt; than a new scripting language.
Then, what is the simplest desing? How about this:
assertThat(characterSet, contains('a'));
compared with
assert_$1$_contains_$2$(characterSet, 'a');
which the editor would then display as:
assert characterSet contains a
What we need is an Eclipse plugin that lets us read and write Java files through a different view.
Furthermore, we could make Eclipse display an inner class like a closure, and $this$ could also be part of the method name, to support:
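A rough sketch of the display transform such a plugin could apply, written in plain Java. It assumes the naming convention above: underscores separate words, and $1$, $2$, ... stand for the first, second, ... argument. The class and method names (MethodNameView, render) are hypothetical, and a real plugin would work on Eclipse's Java model rather than on plain strings.
&lt;pre class="brush: java"&gt;
import java.util.StringJoiner;

// Hypothetical helper: turns a method name written in the $n$ convention,
// plus the source text of its arguments, into the sentence an editor could show.
public class MethodNameView {

    public static String render(String methodName, String... arguments) {
        StringJoiner sentence = new StringJoiner(" ");
        // Underscores separate the words of the sentence.
        for (String word : methodName.split("_")) {
            if (word.matches("\\$\\d+\\$")) {
                // $n$ is replaced by the n-th argument (1-based).
                int position = Integer.parseInt(word.substring(1, word.length() - 1));
                sentence.add(arguments[position - 1]);
            } else {
                sentence.add(word);
            }
        }
        return sentence.toString();
    }

    public static void main(String[] args) {
        // Prints: assert characterSet contains a
        System.out.println(render("assert_$1$_contains_$2$", "characterSet", "a"));
    }
}
&lt;/pre&gt;
Splitting on underscores before substituting means an argument that itself contains an underscore is left untouched.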
list.add(item);
list.add_$1$_to_$this$(item);
add item to list&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-115163612798983971?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/115163612798983971/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=115163612798983971' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/115163612798983971'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/115163612798983971'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/06/simple-design-dsl-reloaded.html' title='Simple Design: &quot;DSL&quot; Reloaded'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-114845906967349874</id><published>2006-05-24T16:03:00.002+08:00</published><updated>2009-06-16T16:18:38.750+08:00</updated><title type='text'>Integrating Selenium With Build Script</title><content type='html'>[Information Below is about Selenium-Core only]
When I was at TWU, Michael spent three days putting Selenium tests into the build script, and I still don't know whether cc.net actually ended up running them. Since then, I have known this is not an easy problem.
Today was my first time setting up CruiseControl to run Selenium tests. It took nearly a whole day of fighting with Resin and JDK bugs. Fortunately, I won the game in the end :) Here are some tips I want to share:
Steps:
&lt;ol&gt;
&lt;li&gt;Start server&lt;/li&gt;
&lt;li&gt;Run Tests&lt;/li&gt;
&lt;li&gt;Get Result&lt;/li&gt;
&lt;li&gt;Stop server&lt;/li&gt;
&lt;/ol&gt;
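The four steps can be arranged so that the server is stopped even when a step fails. Below is a minimal sketch that assumes each step is wrapped in a small callback. The names SeleniumBuildStep, Step, StatusSource and run are hypothetical, and the status strings ("pending", "passed", "failed") are the ones the result servlet in this post hands back to the build script.
&lt;pre class="brush: java"&gt;
// Hypothetical driver for the four steps; how the server is started and how
// the browser is launched are left to the callbacks.
public class SeleniumBuildStep {

    /** One action of the build, e.g. starting Resin or opening the TestRunner page. */
    public interface Step {
        void run() throws Exception;
    }

    /** Reads the current status line: "pending", "passed" or "failed". */
    public interface StatusSource {
        String status() throws Exception;
    }

    /** Returns true if Selenium reported success. */
    public static boolean run(Step startServer, Step runTests, StatusSource results,
                              Step stopServer, long pollMillis) throws Exception {
        startServer.run();                              // 1. Start server
        try {
            runTests.run();                             // 2. Run tests
            while (true) {                              // 3. Get result
                String status = results.status();
                if (!"pending".equals(status)) {
                    return "passed".equals(status);
                }
                Thread.sleep(pollMillis);
            }
        } finally {
            stopServer.run();                           // 4. Stop server, even after a failure
        }
    }
}
&lt;/pre&gt;
The finally block is the point of this shape: a failing test run, or an exception while polling, still shuts the server down.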
The key problem is how to get the result. After all tests have finished, Selenium calls a URL, "postResult". The official way to capture the result is to write a servlet. But two problems need to be taken into consideration:
1. When to stop the server?
2. How to know tests passed?
To solve these two problems, the servlet needs to "provide" information about the testing progress and result. On one side, the servlet receives the result from Selenium; on the other side, it provides Selenium testing information to the build script. Here is my code:
&lt;pre class="brush: java"&gt;  
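import java.io.IOException;
import java.io.OutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SeleniumResultServelet extends HttpServlet {

    // Written by Selenium's POST and read by the build script's GET on different
    // request threads, so it is volatile to make the update visible.
    private volatile String result = null;

    // Build script side: answer "pending" until Selenium has posted a result.
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        response.setContentType("text/plain");
        OutputStream outputStream = response.getOutputStream();
        if (result == null) {
            outputStream.write("pending".getBytes());
        } else {
            outputStream.write(result.getBytes());
        }
        outputStream.write("\r\n".getBytes());
    }

    // Selenium side: postResult sends the overall outcome in the "result" parameter.
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        result = request.getParameter("result");
    }
}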
&lt;/pre&gt;
ps: Selenium uses "POST" to access the postResult URL, and the build script uses "GET" to access it.

Inside the build script, there is a polling loop:
&lt;pre class="brush: java"&gt;
// Poll the result servlet every 500 ms until Selenium has posted a result.
while (true) {
    Thread.sleep(500);
    URL url = new URL(postResultURL);
    String status;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
        String line = reader.readLine();
        status = (line == null) ? "pending" : line.trim();
    }
    if ("failed".equals(status)) {
        throw new RuntimeException("Selenium Test Failed");
    }
    if (!"pending".equals(status)) {
        break; // any other status means the tests finished without failures
    }
}
&lt;/pre&gt;
That is my premature way of integrating Selenium...

ps: I wrote this part of the build script in Java not only because I needed to integrate Selenium, but mainly because Resin cannot be stopped by Ant under Windows (I tried running it as a Windows service, but that failed). So please don't blame me for that...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-114845906967349874?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/114845906967349874/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=114845906967349874' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114845906967349874'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114845906967349874'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/05/integrating-selenium-with-build-script.html' title='Integrating Selenium With Build Script'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-114422946297827556</id><published>2006-04-05T17:15:00.000+08:00</published><updated>2006-04-05T17:31:03.013+08:00</updated><title type='text'>Office could be a perfect platform for DSL development</title><content type='html'>I have been thinking a lot about DSLs recently, and I noticed that we do quick starts with PowerPoint. I suddenly realized that Office is a perfect platform for involving business people in software development. 
After some investigation and research, I found that Office is not only a user-friendly platform but also quite extensible by third parties, especially the upcoming 2007 version. The implementation details still need more research, but I am sure it will be flexible enough to host a DSL development environment.
My vision is:
Word is the place for editing code. You can write, edit, run and debug DSL code in it, including inventing new DSLs and using existing ones.
Excel is the place for testing. Anyone who has used FitNesse will find that it fits naturally into Excel.
PowerPoint is the place for designing the GUI: not only single pages, but also the flow between pages.
On a larger scale, Office is a platform for authoring documents. Since a DSL is a kind of executable document, there is no reason not to use the de facto document editing platform, Office.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-114422946297827556?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/114422946297827556/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=114422946297827556' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114422946297827556'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114422946297827556'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/04/office-could-be-perfect-platform-for.html' title='Office could be a perfect platform for DSL development'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-114236499545763264</id><published>2006-03-15T03:34:00.000+08:00</published><updated>2006-03-15T11:23:23.230+08:00</updated><title type='text'>Investigation on DSL</title><content type='html'>Choose the one most important/applicable to you from each level:

Level 1
Productivity
Correctness
Flexibility

Level 2
Involving Non-Programmers
Reuse

Level 3
Higher-Level Abstraction
Readability

Level 4
Non-Textual Source Code (Advocated by Intentional Programming, MPS)
Meta-Programming
Visualization(A Step Further from Non-Textual Source Code, Using Table, Diagram...)
Good Looking Code Producing Syntax(Method Missing, Keyword Message, Lambda/Blocks/Closure...)

Each level represents a goal people pursue in DSL research. Higher-level goals are supported by lower-level goals. I want to know which goal in each level matters most in business solution development. Thank you very much for participating in this survey.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-114236499545763264?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/114236499545763264/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=114236499545763264' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114236499545763264'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114236499545763264'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/03/investigation-on-dsl.html' title='Investigation on DSL'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-114165423774499771</id><published>2006-03-06T21:58:00.000+08:00</published><updated>2006-03-06T22:10:38.336+08:00</updated><title type='text'>Thoughts on Implementing Dynamic Typed Object in JVM</title><content type='html'>Java is a statically typed language, and the JVM is a statically typed virtual machine. How can we implement dynamically typed objects on the JVM?
I just got an idea: generate an interface for each method. When you call a method, you first cast the object to the interface that declares that method. We might end up implementing hundreds of interfaces for one object.
Yep, that is it: use interfaces to work around the limits of static typing. The side effects might be:
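Here is a hand-written sketch of what the generated code might look like. It assumes one single-method interface per method name; Quack and Walk are hypothetical names a compiler would generate. A call site in the dynamic language becomes a cast plus an ordinary interface call, so a ClassCastException plays the role of "method not found".
&lt;pre class="brush: java"&gt;
// Sketch: one generated interface per method, and dynamic call sites compiled to casts.
public class PerMethodInterfaces {

    // Hypothetical generated interfaces, one per method signature.
    public interface Quack { String quack(); }
    public interface Walk { String walk(int steps); }

    // A class in the dynamic language implements one interface per method it defines.
    public static class Duck implements Quack, Walk {
        public String quack() { return "Quack!"; }
        public String walk(int steps) { return "waddled " + steps + " steps"; }
    }

    public static class Robot implements Quack {
        public String quack() { return "Beep!"; }
    }

    // What the call site "x.quack()" could compile to: the static type of x is
    // just Object, and the cast is checked by the JVM at run time.
    public static String sendQuack(Object x) {
        return ((Quack) x).quack();
    }

    public static String sendWalk(Object x, int steps) {
        return ((Walk) x).walk(steps);
    }

    public static void main(String[] args) {
        System.out.println(sendQuack(new Duck()));   // Quack!
        System.out.println(sendQuack(new Robot()));  // Beep!
        System.out.println(sendWalk(new Duck(), 3)); // waddled 3 steps
        // sendWalk(new Robot(), 3) would throw ClassCastException: Robot has no walk.
    }
}
&lt;/pre&gt;
Once the cast succeeds, the call itself is a normal invokeinterface, which is where the hoped-for performance and Java interop would come from.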
hard to implement mixins
hard to implement method missing
the class loader will go crazy...
But I think this approach will give better performance and a better interop experience, because we are actually using the Java object model instead of building a new object model out of maps.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-114165423774499771?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/114165423774499771/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=114165423774499771' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114165423774499771'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114165423774499771'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/03/thoughts-on-implementing-dynamic-typed.html' title='Thoughts on Implementing Dynamic Typed Object in JVM'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-114140689011222791</id><published>2006-03-04T01:26:00.000+08:00</published><updated>2006-03-05T02:06:11.706+08:00</updated><title type='text'>Suggesting a new Language for Inventing Domain Specific Language</title><content type='html'>Inspired by Roy's recent speech to TWU&amp;TWI and Vincent's nice introduction to Smalltalk, I have been thinking about the possibility of inventing a language that makes creating DSLs easier. Inventing a DSL (an External DSL in Martin’s definition) is not an easy job. 
We have to write the grammar in EBNF and then write a compiler or interpreter for it. It takes a long time and a major effort to see it work, so the decision whether to make a DSL has to be made very carefully; it may not be worth the effort. And after the DSL is born, changing its syntax or grammar is another big problem.

Thinking about the process of making a DSL, we can see that we are given too many options. Writing a compiler from scratch, we have thousands of choices for the grammar, syntax and semantics of the language. Do we need so much flexibility, at the cost of leaving out so many basic and nice language elements (like OOP or GC)? 

Why do we need a new language to cover a specific domain? The reasons fall into two categories:
1. Improving Productivity (Better Encapsulation, and Reuse)
2. Improving Expressiveness (We can communicate with the clients in a better way)

For the first reason, I don’t think a DSL helps much. How to organize code is a hot research area, but I haven't seen any new technology improve productivity much since the birth of OOP. I believe OOP will remain a reliable technology for building large business applications for a long time, especially when complemented with Agile. For the second reason, I found that without inventing a new language from scratch we can still achieve the same goal, by building the DSL on top of an extremely flexible language (an Internal DSL in Martin’s definition).

Assuming the main reason to invent a DSL is improving expressiveness, what hurts the expressiveness of code? I think it is because:
1. The concepts employed in the code don’t fit the real world very well.
2. The grammar or syntax of the programming language makes the code look weird.

For the first problem, we can solve it using current technologies such as Object Oriented Programming, Domain Driven Design…
For the second problem, we have few choices when we use languages like Java or C#. With Ruby, because the language provides a bunch of nice features, we can do some clever design to make the code look better, but we are still quite limited. With Lisp or Smalltalk, we are given maximum flexibility to do nice things. 

The features of Smalltalk that make its code look so natural are:
1. Minimal built-in keywords; few things are special (even if/else and while are implemented with OOP or recursion)
2. More straightforward grammar (No commas, braces, curly braces)
3. Keyword messages

Take if/else as an example: “ifTrue:” is just a message understood by class True and class False. The difference between the two implementations is that one evaluates the following block and the other just ignores it. So we can easily introduce new control structures and other things that used to be implemented at the language level.
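The same idea can be mimicked in Java, which shows why no special keyword is needed. A minimal sketch: SBoolean, TRUE and FALSE are hypothetical names, Runnable stands in for a Smalltalk block, and the only conditional left is the one in of() that turns a Java boolean into one of the two objects.
&lt;pre class="brush: java"&gt;
// Sketch of Smalltalk-style booleans: control flow as messages to two objects.
public abstract class SBoolean {

    // True evaluates an ifTrue block and ignores an ifFalse block...
    public static final SBoolean TRUE = new SBoolean() {
        public void ifTrue(Runnable block) { block.run(); }
        public void ifFalse(Runnable block) { }
    };

    // ...and False does the opposite.
    public static final SBoolean FALSE = new SBoolean() {
        public void ifTrue(Runnable block) { }
        public void ifFalse(Runnable block) { block.run(); }
    };

    public abstract void ifTrue(Runnable block);

    public abstract void ifFalse(Runnable block);

    // The only conditional left: converting a Java boolean into one of the two objects.
    public static SBoolean of(boolean value) {
        return value ? TRUE : FALSE;
    }
}
&lt;/pre&gt;
Dispatch picks the behaviour. A new control structure would be just another method on SBoolean, with no change to the language.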

Keyword messages are another cool feature. In Java, we can only write:
text.addAttribute(attr, start, stop);
But in Smalltalk, we can give the message addAttribute a better name that carries information about its parameters:
text addAttribute: attr from: start to: stop
The selector of this message consists of three parts, addAttribute:from:to:, so the code reads more like English.

But Smalltalk is not good enough:
1. It still has some grammar symbols, like [:] and [^].
2. The order of sentence elements is limited to “object message”.
3. Names cannot contain spaces.
4. The left side of = cannot be worked into the sentence nicely (we cannot write “Create new person, assign it to Michael”).

So I am suggesting a new language whose only purpose is making new DSLs easier to create. It starts from the good job done by Smalltalk and improves on it by solving the problems above. The final target is to let code read like English, although writing it may still require much more careful naming and design than writing casual English. My initial thoughts are listed below:
1. Use XML to structure the source code (actually the abstract syntax tree), and program against a GUI representation of the source code instead of writing a text file directly. (Then I don’t need to invent any fancy grammar to balance exactness and expressiveness: the XML source file handles exactness by recording the parse tree, while the IDE maintains high expressiveness.)
2. Decouple word order from invocation. (With the above technique, the underlying XML representation records which object a message is sent to. Then the object no longer has to be followed by the message sent to it, and we can say “update window” instead of “window update”.)
3. Out parameters. (x = y calcSomething; becomes y calcSomething assign to x. Pass the variable that should store the return value to the object as a special message parameter. Then the semantics of “=” can be expressed in the sentence.)
4. The message then becomes the skeleton of a sentence: we fill in the object we operate on, the arguments of the operation and the result variable to form part of a sentence. (If “move … to … , …” is a message, we can say “move pointA to 1 , 2”. The object we are operating on is pointA, and the arguments are 1 and 2.)
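Point 3 has a rough analogue in plain Java: pass a holder object for the variable that should receive the result. Below is a minimal sketch under that assumption. Variable, Calculator and addTo are hypothetical names, and the holder only approximates the "assign to x" phrasing; it is not the proposed language.
&lt;pre class="brush: java"&gt;
// Sketch: "y calcSomething assign to x" approximated by passing the target variable.
public class OutParameters {

    /** Hypothetical holder standing in for the variable on the left of "=". */
    public static final class Variable {
        private Object value;
        public Object value() { return value; }
        public void assign(Object newValue) { value = newValue; }
    }

    /** Hypothetical receiver with a message that writes its result into a Variable. */
    public static final class Calculator {
        public void addTo(int a, int b, Variable assignTo) {
            assignTo.assign(a + b);
        }
    }

    public static void main(String[] args) {
        Variable x = new Variable();
        // Reads roughly as "calculator add 1 to 2, assign to x".
        new Calculator().addTo(1, 2, x);
        System.out.println(x.value()); // 3
    }
}
&lt;/pre&gt;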

I think creating DSLs this way is the most economical option. Either writing a new compiler or using MDA to save the effort of writing one costs a lot of time, money and rework. But if we build DSLs on a flexible OOP language, then after some interfacing work a DSL is just a natural extension of the domain model we build today. 

BTW: Is “SmallRocks!” a nice name? :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-114140689011222791?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/114140689011222791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=114140689011222791' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114140689011222791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/114140689011222791'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/03/suggesting-new-language-for-inventing.html' title='Suggesting a new Language for Inventing Domain Specific Language'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-113932406182009461</id><published>2006-02-07T22:48:00.000+08:00</published><updated>2006-02-07T22:54:21.870+08:00</updated><title type='text'>Why object matters in agile development?</title><content type='html'>I asked this question to myself and to my colleagues here (Xi'an). I got a pretty cool answer from Vincent that seems to be correct: "Because with object technology, we can make local changes more easily than with the old procedure-based methods." I totally agree with him. Agile is all about embracing change, and we have to refactor all the time. If we code in a procedural way, we find it very hard to change a function without affecting other functions. 
But if we use object technology, we have design patterns and other experience for decoupling objects, so we are likely to refactor more, and more effectively. I will ask the same question at TWU, and I want to hear different ideas~&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-113932406182009461?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/113932406182009461/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=113932406182009461' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113932406182009461'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113932406182009461'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/02/why-object-matters-in-agile.html' title='Why object matters in agile development?'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-113927687892144841</id><published>2006-02-07T09:30:00.000+08:00</published><updated>2006-02-07T09:47:58.946+08:00</updated><title type='text'>Not very agree with the book "object technology"</title><content type='html'>I am reading the textbooks we will use at TWU. One of them is "Object Technology: A Manager's Guide". It is a great book that tries to explain why objects matter to business. But after reading the first chapter, I don't quite agree with it. 
So I will keep updating my view as I move through the rest. 
The reason I am not convinced by the book is that I don't think we should be told that objects in programming are just like objects in nature. In my own experience, they are quite different. In my opinion, objects in the programming world are projections of real-world objects, bounded by the context of the problem. If we treat the two as the same, we tend to make one object really complex, so that it covers every role it could possibly play in nature, which is not how we program. Instead, we should set the problem context first, then analyse the objects inside that context, and then design objects that reflect the real world. So I think "Object Technology" is not quite right on this point.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-113927687892144841?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/113927687892144841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=113927687892144841' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113927687892144841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113927687892144841'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/02/not-very-agree-with-book-object.html' title='Not very agree with the book &quot;object technology&quot;'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-113915560658539328</id><published>2006-02-05T23:57:00.000+08:00</published><updated>2006-02-06T00:06:46.596+08:00</updated><title type='text'>Textbook of Boot Camp</title><content type='html'>I will attend the boot camp held in Bangalore this February. Today, my first day at ThoughtWorks, I received the textbooks. Here is the list of chosen books. Maybe it will help you learn more about how agile works inside TW.

1. Object Technology: A Manager's Guide
2. XP Explained 2nd Edition
3. The Pragmatic Programmer

It covers all the basic knowledge a junior developer needs. I will read or re-read all three books thoroughly. It must be a pleasant journey.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-113915560658539328?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/113915560658539328/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=113915560658539328' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113915560658539328'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113915560658539328'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2006/02/textbook-of-boot-camp.html' title='Textbook of Boot Camp'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-113508004893087827</id><published>2005-12-20T19:42:00.000+08:00</published><updated>2006-02-06T22:29:01.260+08:00</updated><title type='text'>π-calculus</title><content type='html'>I just found this interesting research area. From the name, we can see it may have something in common with λ-calculus. Indeed, just as λ-calculus is the semantic foundation of functional programming, π-calculus is a semantic foundation of concurrent programming. I am very new to this area, so I know almost nothing about it, but I do think it is very, very important. So here are some references:
&lt;a href="/repository/2005/12/002.pdf"&gt;The Polyadic π-Calculus: a Tutorial&lt;/a&gt;
&lt;a href="/repository/2005/12/003.pdf"&gt;Pict: A Programming Language based on the π-Calculus&lt;/a&gt;
&lt;a href="/repository/2005/12/004.pdf"&gt;The Polymorphic π-Calculus: Theory and Implementation&lt;/a&gt;
&lt;a href="http://www.fairdene.com/picalculus/index.html"&gt;A page about the pi calculus (and Business Process Management)&lt;/a&gt;
&lt;a href="http://www.sigmod.org/dblp//db/indices/a-tree/m/Milner:Robin.html"&gt;List of publications by Robin Milner&lt;/a&gt;
&lt;a href="http://www.cl.cam.ac.uk/users/rm135/"&gt;Robin Milner's Home Page&lt;/a&gt;
&lt;a href="http://homepages.cwi.nl/~arie/picalc.html"&gt;Pi-Calculus Links&lt;/a&gt;
&lt;a href="http://lamp.epfl.ch/mobility/"&gt;Calculi for Mobile Processes&lt;/a&gt;

Feb 6th 2006:
I have read and heard a lot about π-calculus recently, and a friend decided to include a comparison of π-calculus and Petri nets in her master's thesis. But someone said that Petri nets already provide everything workflow applications need: if you don't need mobility, Petri nets are actually enough. I don't know if that is true. I will keep tracking it if I have the time and interest.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-113508004893087827?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/113508004893087827/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=113508004893087827' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113508004893087827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113508004893087827'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2005/12/calculus.html' title='π-calculus'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-113465325862134121</id><published>2005-12-15T20:54:00.000+08:00</published><updated>2005-12-15T21:34:18.220+08:00</updated><title type='text'>Duck Typing</title><content type='html'>Yesterday, I talked about Dynamic Overloading. Today, let's focus on duck typing, which, by the way, is what OO should be in the classical sense.
First of all, the philosophy again: if it walks like a duck and quacks like a duck, it must be a duck. Actually, I have to say the real philosophy behind it is ontology: what an object really is depends on what kinds of operations it supports. By "kind" I mean the semantics of the operation.
Let's compare static strong typing with duck typing:
&lt;pre class="brush: java"&gt;
public Object func(someType a) {
    return a.someField;
}
&lt;/pre&gt;
&lt;pre&gt;
def func(a)
  a.someField
end
&lt;/pre&gt;
The two are pretty the same, except in static case, when you call func with a, the compiler will check the type of a to see if it is of type someType, and to check if the type someType has a field someField. But in static case, only the type if has the field someField is checked, and it is been checked in runtime by the runtime envrionment rather than the compiler. The tiny difference between the two, to check or not to check the type, makes the duck typing not only more handy but more realistic.
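Java can imitate the run-time check with reflection. In this minimal sketch, call() looks up a no-argument method by name only when the call happens, so any object with a public quack() method is accepted whatever its class. It uses a method where the post uses a field, but the idea is the same; Duck, Person and Stone are made-up example classes.
&lt;pre class="brush: java"&gt;
import java.lang.reflect.Method;

// Sketch: a "duck-typed" call in Java, checked at run time instead of compile time.
public class DuckCall {

    /** Invokes a public no-argument method by name; only its existence is checked. */
    public static Object call(Object receiver, String methodName) throws Exception {
        Method method = receiver.getClass().getMethod(methodName);
        return method.invoke(receiver);
    }

    // Unrelated classes: no common supertype declares quack().
    public static class Duck { public String quack() { return "Quack!"; } }
    public static class Person { public String quack() { return "I can quack too"; } }
    public static class Stone { }

    public static void main(String[] args) throws Exception {
        System.out.println(call(new Duck(), "quack"));
        System.out.println(call(new Person(), "quack"));
        // call(new Stone(), "quack") throws NoSuchMethodException at run time.
    }
}
&lt;/pre&gt;
Only the name "quack" ties the two calls together here, which is exactly why the name carries so much weight under duck typing.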
Why am I saying this? The biggest reason is that duck typing focuses on semantics, rather than on static typing, which stupidly assumes everyone can be a god who judges the type of everything. When you call a method named "transfer" on an object, you know what transfer should mean. No static type guarantees which function you are actually calling; you think about the semantics here, rather than guessing the behavior of code that already exists.
So the names of fields and methods are very, very important in duck typing, because a name is not only a label but also says something about what the thing is for. It takes over the role of "class" or "type" in a statically typed language: the name alone carries all your intention, whereas with static typing the type and the name share it (i.e. you have to use the right name on the right type to get what you want). So the burden on the "name" becomes heavier than it used to be. In the past the name was rather trivial: you had to give the type first, and once the type was given you could not mess the name up. The sad thing is that in duck typing the name is too important to carry this burden alone. The problems we struggle with in refactoring, especially "rename", are one result. They all come from the possibility that when we say "I need a field named xxx", we are not actually pointing to the same thing. It can be a big problem; I don't think unit tests can cover all the defects, although I do believe unit testing is much more important than static typing.
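To make the naming problem concrete, here is a small Ruby sketch (BankAccount, FileUploader and pay are hypothetical names, not from the original post). Both objects answer "transfer", so a duck-typed caller accepts either one, even though only one of them means moving money. This is also why a tool cannot safely rename "transfer" on its own: nothing in the code says which "transfer" a given call site was meant for.

```ruby
# Hypothetical classes: both define "transfer", but the word means
# different things in each domain.
class BankAccount
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end

  # Moves money out of the account.
  def transfer(amount)
    @balance -= amount
    "moved #{amount} dollars"
  end
end

class FileUploader
  # Sends data over the network (simulated by a string here).
  def transfer(amount)
    "sent #{amount} bytes"
  end
end

# The caller relies only on the name "transfer"; nothing checks that the
# receiver means "money" by it.
def pay(obj, amount)
  obj.transfer(amount)
end

puts pay(BankAccount.new(500), 100)  # prints "moved 100 dollars"
puts pay(FileUploader.new, 100)      # prints "sent 100 bytes": accepted silently
```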
This is only a beginning; I will go further tomorrow. See you!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-113465325862134121?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/113465325862134121/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=113465325862134121' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113465325862134121'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113465325862134121'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2005/12/duck-typing.html' title='Duck Typing'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-113455325101504056</id><published>2005-12-14T17:23:00.000+08:00</published><updated>2005-12-14T20:41:55.776+08:00</updated><title type='text'>Dynamic Overloading</title><content type='html'>Haven't heard of dynamic overloading? Yes, I made it up :)
It is not only another way to support polymorphism but also a revolution in object-oriented programming. It brings functions and the semantics of objects back to center stage, replacing "class" and "inheritance".
First of all, let's look at static overloading.
&lt;pre&gt;public class Sample {
 public static void main(String[] args) {
  new Sample().demo();
 }
 public void demo() {
  func(new A());
  func(new B());
 }
 public void func(A arg) {
  System.out.println(arg);
 }
 public void func(B arg) {
  System.out.println(arg);
 }
 class A {
  public String toString() {
   return "hello from A";
  }
 }
 class B {
  public String toString() {
   return "hello from B";
  }
 }
}&lt;/pre&gt;
Because the two "func" methods have different signatures, i.e. their "arg" parameters have different types defined by different classes, the compiler can resolve each invocation to a concrete method at compile time; that is why we call it static.
Next, let's look at overriding (perhaps not the most precise word):
&lt;pre&gt;public class Sample {
 public static void main(String[] args) {
  new Sample().demo();
 }
 public void demo() {
  A a = new B();
  a.func();
 }
 class A {
  public void func() {
   System.out.println("hello from A");
  }
 }
 class B extends A {
  public void func() {
   System.out.println("hello from B");
  }
 }
}&lt;/pre&gt;
Now let's look at the duck typing case (written in Ruby):
&lt;pre&gt;class A
 def func()
  puts "hello from A"
 end
end

class B &lt; A
 def func()
  puts "hello from B"
 end
end

obj = B.new()
obj.func()&lt;/pre&gt;
Oops! This is duck typing, so we don't actually need "class B &lt; A". As long as the object has a slot named "func", it works; the hindrance set by the static type system has disappeared.
&lt;pre&gt;class A
 def func()
  puts "hello from A"
 end
end

class B
 def func()
  puts "hello from B"
 end
end

def demo(obj)
 obj.func()
end

demo(B.new())
demo(A.new())&lt;/pre&gt;
If we use JavaScript (a prototype-based language), we don't even need to define a class to set up the slots of its objects. We can write something like "obj.func = someFunc" and then call it on obj.
OK, we have now seen the state of the art in static overloading and overriding. Because it is resolved statically, static overloading is not a reasonable choice for mainstream polymorphism. And every form of overriding rests on one assumption: we can dynamically fetch a function pointer from a slot of the object and then call it. A prototype language doesn't require you to define the slot in a class; you can add it to the object at runtime, but the object itself still has to know which operations it supports. If you want to call a method on an object, you have to set the slot before calling it. This limitation is the motivation for inventing dynamic overloading.
Now let's examine polymorphism in general. Overloading and overriding are both ways to achieve polymorphism, and all polymorphism is about binding a call to different functions. Overloading decides by the static signature; overriding decides by the value of a slot in the object. How about deciding by "the semantics of the object"? I mean that every function describes what the object it operates on should look like; then at runtime, depending on the actual shape of the object, we decide which function to call. As I haven't built a full system implementing the idea, I can only demonstrate it in pseudo-code:
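Ruby is class-based rather than prototype-based, but its singleton methods give a rough equivalent of setting a slot on a single object at runtime. A minimal sketch, assuming define_singleton_method as a stand-in for JavaScript's obj.func = someFunc:

```ruby
# A plain object; its class defines no "func".
obj = Object.new

# Attach a "func" slot to this one object at runtime, roughly like
# obj.func = someFunc in JavaScript.
obj.define_singleton_method(:func) { "hello from a singleton slot" }

puts obj.func                       # prints "hello from a singleton slot"
puts Object.new.respond_to?(:func)  # prints "false": other objects are unaffected
```

Note that the slot still has to be set before obj.func is called, which is exactly the limitation discussed next.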
&lt;pre&gt;//in file func.fluffy
var a;
function func() {
 puts this.a;
}
var b;
function func() {
 puts this.b;
}
//in file createA.fluffy
var a;
function createA() {
 this.a = "hello from A";
 return this;
}
//in file createB.fluffy
var b;
function createB() {
 this.b = "hello from B";
 return this;
}
//in file demo.fluffy
function demo() {
 var obj = createA();
 obj.func();
 obj = createB();
 obj.func();
}&lt;/pre&gt;
I know this example looks rather weird, because it is so different.
Let's check it line by line:
var obj = createA();
It invokes createA(). Because we don't call it on an object, the runtime creates an empty object for us, so "this" in createA is not null but an empty object. We declared that the object createA() is bound to has a field named "a" (which does not merely mean a member variable named "a" but, in a real case, carries semantic meaning, the way "frequentAccount" means something in the business domain), so the empty "this" gets a field "a". Then we return "this". So now we have just created an object with a field named "a".
obj.func();
Next we apply "func" to "obj". This line does not fetch a function pointer from a "func" slot on "obj"; it actually means func(obj). This is where dynamic overloading happens. We have two "func" functions in hand: one expects an object with a field named "a", the other expects an object with a field named "b". Deciding which to call is easy: it is the first one, because right now "obj" is an object with a field "a".
The remaining lines of demo() work the same way. We can see what made us choose the first "func": the semantics of the object, i.e. which fields the object has and what each field means (shown by the field's name). Furthermore, we can introduce a mark to express semantic equivalence.
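Since Fluffy does not exist, the following Ruby sketch only approximates the dispatch rule just described. The assumptions are mine, not the post's: an object is modeled as a Hash from field names to values; defn and invoke are hypothetical helpers; and when several variants match, the first one registered wins, a tie-breaking rule the pseudo-code leaves open. The point is that invoke looks at which fields the object has, not at its class or at a method slot.

```ruby
# Hypothetical registry: function name => list of [required fields, body].
REGISTRY = Hash.new { |hash, key| hash[key] = [] }

# Declare one variant of function `name` for objects that carry all `fields`.
def defn(name, fields, body)
  REGISTRY[name].push([fields, body])
end

# Dynamic overloading: pick, at call time, the first variant whose required
# fields are all present on the object, then apply it to the object.
def invoke(name, obj)
  variant = REGISTRY[name].find { |fields, _body| fields.all? { |f| obj.key?(f) } }
  raise NoMethodError, "no #{name} accepts fields #{obj.keys.inspect}" unless variant
  variant[1].call(obj)
end

# The two "func" variants from func.fluffy.
defn(:func, [:a], ->(this) { this[:a] })
defn(:func, [:b], ->(this) { this[:b] })

# createA / createB: start from an empty object and fill in one field.
def create_a
  { a: "hello from A" }
end

def create_b
  { b: "hello from B" }
end

puts invoke(:func, create_a)  # prints "hello from A"
puts invoke(:func, create_b)  # prints "hello from B"
```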
&lt;pre&gt;//in file createArticle.fluffy
var title;
var author;
var content;
function createArticle() {
 this.author = "wen tao";
 return this;
}
//in file getFirstName.fluffy
@string
@author
var name;
function getFirstName() {
 return this.name.split(" ")[0];
}
//in file demo.fluffy
function demo() {
 createArticle().getFirstName();
}&lt;/pre&gt;
Under the rules above, the object created by "createArticle" cannot be accepted by getFirstName(), because an object with the fields (title, author, content) has no field named "name". But notice this:
@author
var name;
That means "name" can be replaced with "author", but NOT vice versa. Add a namespace mechanism on top, and we have a full-blown language, which I call "Fluffy" :)
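One way to picture the one-way @author mark is the Ruby sketch below. It is an illustration under my own assumptions, not Fluffy's actual semantics: objects are Hashes of fields, each required field lists the fields allowed to stand in for it, and the @string type mark is ignored. GET_FIRST_NAME_NEEDS, bind_fields and get_first_name are hypothetical names.

```ruby
# Hypothetical declaration for getFirstName: it needs a "name" field, and,
# per the @author mark, an "author" field may stand in for it.
# The substitution is one-way: nothing lets "name" stand in for "author".
GET_FIRST_NAME_NEEDS = { name: [:name, :author] }

# Map each required field to the first acceptable field the object has.
# Returns nil if any requirement cannot be met.
def bind_fields(needs, obj)
  needs.each_with_object({}) do |(wanted, accepted), bound|
    source = accepted.find { |f| obj.key?(f) }
    return nil unless source
    bound[wanted] = obj[source]
  end
end

def get_first_name(obj)
  this = bind_fields(GET_FIRST_NAME_NEEDS, obj)
  raise NoMethodError, "getFirstName needs a name (or author) field" unless this
  this[:name].split(" ")[0]
end

# createArticle declares title, author and content but only sets author.
def create_article
  { title: nil, author: "wen tao", content: nil }
end

puts get_first_name(create_article)  # prints "wen"

# The reverse direction is not allowed: a function that needs "author"
# cannot be satisfied by an object that only has "name".
p bind_fields({ author: [:author] }, { name: "wen tao" })  # prints nil
```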
See you; this post is long enough. I hope you have grasped the main point.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-113455325101504056?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/113455325101504056/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=113455325101504056' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113455325101504056'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113455325101504056'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2005/12/dynamic-overloading.html' title='Dynamic Overloading'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19829485.post-113448964084319757</id><published>2005-12-11T12:44:00.000+08:00</published><updated>2005-12-19T23:10:01.200+08:00</updated><title type='text'>Different Attitudes Towards Side-Effect</title><content type='html'>Originally posted at: &lt;a href="http://www.bloglines.com/blog/taowen?id=6"&gt;http://www.bloglines.com/blog/taowen?id=6&lt;/a&gt;

Revision 2005/12/19: reworded the description and added material related to monads.

As is well known, one fundamental distinction between FP (functional programming) and IP (imperative programming) is the way the two treat side-effects. FP is based on the theory of &lt;a href="http://en.wikipedia.org/wiki/Lambda_calculus"&gt;lambda calculus&lt;/a&gt;, whose equational rules, such as η-conversion, only hold when there are no side-effects. So when programming in an FP language, you are encouraged to write code in a stateless way. IP, by contrast, is built on side-effects: without them, executing a command achieves nothing.

But in the real world, an FP language cannot be entirely pure, because at least I/O involves side-effects. So every FP language has to answer the question "how do the stateful and stateless worlds interact?" Older FP languages such as Scheme simply allow side-effects through primitives like "set!". &lt;a href="http://www.haskell.org"&gt;Haskell&lt;/a&gt;, which is much more modern, uses monads from category theory to attack this problem. I have to admit I haven't fully understood the theory, but it gave me the impression that it is no more than another way to write imperative code, so you can write both styles in one language, and the type system guarantees that the beautiful functional core won't be messed up by monadic actions. So I think the general attitude in the FP world is to push side-effects out to the very skin of the program, leaving the core fully functional.
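The "functional core, side-effects at the skin" attitude can be sketched even in a non-functional language. In this Ruby sketch (total, report and run are hypothetical names), the core computes a result without touching the outside world, and all reading and writing is pushed into run, the thin skin of the program. Unlike Haskell's type system, Ruby does not enforce this split; here it is only a convention.

```ruby
require "stringio"

# Functional core: pure functions; same input, same output, no I/O.
def total(prices)
  prices.reduce(0, :+)
end

def report(prices)
  "#{prices.size} items, total #{total(prices)}"
end

# Imperative shell: the only code that reads input or writes output.
def run(input, output)
  prices = input.each_line.map { |line| Integer(line.strip) }
  output.puts(report(prices))
end

run(StringIO.new("3\n4\n5\n"), $stdout)  # prints "3 items, total 12"
```

Because total and report never do I/O, they can be tested by simply comparing return values, while run is the only place that needs a fake input or output stream.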

During a discussion with &lt;a href="http://www.blogjava.net/raimundox"&gt;Vincent Xu&lt;/a&gt;, I realized that OOP (object-oriented programming) is actually an attempt to ease the pain of side-effects in IP. It uses objects to wrap the side-effects and exposes semantically clear interfaces for transforming state. As long as each object handles its state consistently, side-effects won't be that harmful.

Vincent Xu told me that in Smalltalk, objects are much bigger (they take on more behavior) than they are in Java. In Java, we prefer XMLWriter.write(obj) to obj.writeAsXML(writer). According to his understanding, in Smalltalk we should tell instead of ask: with XMLWriter.write(obj), the XMLWriter has to ask obj for its state, but with obj.writeAsXML(writer), obj tells the writer its state instead of being asked. So if we keep all code properly object-oriented, i.e. tell instead of ask, we may achieve OOP's goal of wrapping all side-effects in objects.
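A minimal Ruby sketch of the "tell, don't ask" style. FieldWriter is a hypothetical stand-in for XMLWriter that records "key: value" lines instead of producing real XML, and Person with its fields is likewise made up for illustration.

```ruby
# Hypothetical stand-in for XMLWriter: it records "key: value" lines
# instead of producing real XML, to keep the sketch short.
class FieldWriter
  attr_reader :lines

  def initialize
    @lines = []
  end

  def field(key, value)
    @lines.push("#{key}: #{value}")
  end
end

class Person
  def initialize(name, age)
    @name = name
    @age = age
  end

  # Tell, don't ask: the person pushes its own state into the writer,
  # so Person needs no public getters for @name or @age.
  def write_to(writer)
    writer.field("name", @name)
    writer.field("age", @age)
    writer
  end
end

writer = Person.new("Alice", 30).write_to(FieldWriter.new)
puts writer.lines  # prints "name: Alice" and "age: 30" on separate lines
```

In the ask style, the writer would instead call person.name and person.age, which forces Person to expose those getters; the tell style keeps that state inside the object.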

Generally speaking, I prefer OOP to FP. That may be mainly because I don't know much about FP. But another important reason is that I feel I can't write everything in a stateless way, and once a program is not purely functional, that fact freaks me out. So side-effects bother me a lot, and as a result they have kept me from working with FP languages.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19829485-113448964084319757?l=taowen.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://taowen.blogspot.com/feeds/113448964084319757/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19829485&amp;postID=113448964084319757' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113448964084319757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19829485/posts/default/113448964084319757'/><link rel='alternate' type='text/html' href='http://taowen.blogspot.com/2005/12/different-attitudes-towards-side.html' title='Different Attitudes Towards Side-Effect'/><author><name>taowen</name><uri>http://www.blogger.com/profile/15207556318302866600</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
