2/21/2010

 

Flattening & Rebuilding

Using SQL Server as your NoSQL database to persist object state causes problems with data migration. I mentioned three in the previous post. There are two more, and today we are going to talk about one of them: you cannot deserialize object state back into a class whose fields have changed. Think about a class that used to have a field called name, which has now been renamed to firstName. When deserializing, where should the value of name be assigned? And if we cannot get at the raw value, how can we apply the data migration logic? We already said that the data we store in SQL Server is XML, so are we going to parse the XML and manipulate the XML elements directly? Yeah, I think we have to.

So the data migration logic is not applied to the same object model that your application logic deals with. It has to work at a lower level. We could migrate the XML elements directly, but that is too tightly coupled to the data format. Before we used XML, we actually tried JSON for a month, until we found that XQuery really is a killer feature. Also, an XML element carries many things we do not care about. So what we need is a model that can capture the state of the objects but is still simple. This model is also directly related to how serialization/deserialization works. It works like this:

object ==Flatten==> many dictionaries(with dict/list/string inside) ==Serialize==> XML
XML ==Deserialize==> dictionary(load referenced entity state on demand) ==Rebuild==> objects
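To make the first line of that pipeline concrete, here is a minimal sketch in Python (the real system is .NET, so this is only an illustration). The Location class, the flatten and serialize functions, and the choice to keep CLR_TYPE as an ordinary dictionary key are all assumptions of the sketch, not the actual implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Location:
    # Hypothetical stand-in for the Domain.Calendar.Location class.
    LId: str
    Country: str
    CountryAbbreviation: str


def flatten(obj, type_name):
    """Phase 1: capture the object's state as a plain dictionary of strings."""
    state = {"CLR_TYPE": type_name}  # assumption: the type name travels as a key
    for f in fields(obj):
        state[f.name] = str(getattr(obj, f.name))
    return state


def serialize(state):
    """Phase 2: write the dictionary out as a single <Entity> element."""
    return ET.tostring(ET.Element("Entity", state), encoding="unicode")


state = flatten(Location("43-123", "China", "CN"), "Domain.Calendar.Location")
xml_text = serialize(state)
```

The point of the split is that state is an ordinary dictionary sitting between the object and the XML, so anything that wants to inspect or rewrite the state can work on it without knowing about either side.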

The XML looks like this:

<Entity CLR_TYPE="Domain.Calendar.Location" Country="China" 
CountryAbbreviation="CN" LId="43-123" TaxUnit="Jiangxi" TaxUnitAbbreviation="JX" />

It will be deserialized to a dictionary containing five field entries. The sixth attribute, CLR_TYPE, is used in the rebuilding process to turn the dictionary back into an object. Besides dictionaries, strings are also valid: a string is used to store a field name as well as a simple field value. The persistence layer needs to define how to translate a date-time into a string, and so on. Collections are also valid, although in theory a collection is just a special case of a dictionary.
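Roughly, the deserialize and rebuild steps for the sample XML could look like the following Python sketch. The TYPE_REGISTRY lookup, the Location dataclass, and the decision to return the type name separately from the five field entries are assumptions made for illustration; the real code is .NET and resolves CLR_TYPE in its own way.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Location:
    # Hypothetical stand-in for Domain.Calendar.Location.
    LId: str
    Country: str
    CountryAbbreviation: str
    TaxUnit: str
    TaxUnitAbbreviation: str


# Hypothetical registry mapping CLR_TYPE values to classes.
TYPE_REGISTRY = {"Domain.Calendar.Location": Location}

XML_TEXT = (
    '<Entity CLR_TYPE="Domain.Calendar.Location" Country="China" '
    'CountryAbbreviation="CN" LId="43-123" TaxUnit="Jiangxi" '
    'TaxUnitAbbreviation="JX" />'
)


def deserialize(xml_text):
    """Return the type name and a dictionary of the remaining attributes."""
    attributes = dict(ET.fromstring(xml_text).attrib)
    type_name = attributes.pop("CLR_TYPE")
    return type_name, attributes


def rebuild(type_name, state):
    """Look the class up by its type name and populate it from the dictionary."""
    return TYPE_REGISTRY[type_name](**state)


type_name, state = deserialize(XML_TEXT)
location = rebuild(type_name, state)
```

Note that rebuild fails as soon as the dictionary keys and the class fields disagree, which is exactly the situation a data migration has to repair before rebuilding.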

The XML is just the state storage for a single entity. Entities can relate to each other, but we do not store the state of other entities in the same XML. They are referenced by ID and stored separately, in different rows of the EntityState table.
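A small Python sketch of that idea, with loading on demand. Everything here is invented for illustration: the in-memory ENTITY_STATE dictionary stands in for the table, and the Office entity, the LocationRef attribute, and the EntityRef class are hypothetical names, not part of the real schema.

```python
import xml.etree.ElementTree as ET

# In-memory stand-in for the EntityState table: one row per entity, keyed by ID.
ENTITY_STATE = {
    "43-123": '<Entity CLR_TYPE="Domain.Calendar.Location" Country="China" LId="43-123" />',
    "7-1": '<Entity CLR_TYPE="Domain.Calendar.Office" Name="HQ" LocationRef="43-123" />',
}

rows_read = []  # records which rows were actually fetched


def load_state(entity_id):
    """Fetch one row and deserialize it to a dictionary."""
    rows_read.append(entity_id)
    return dict(ET.fromstring(ENTITY_STATE[entity_id]).attrib)


class EntityRef:
    """Holds only the ID; the referenced state is loaded on first use."""

    def __init__(self, entity_id):
        self.entity_id = entity_id
        self._state = None

    def state(self):
        if self._state is None:
            self._state = load_state(self.entity_id)
        return self._state


office = load_state("7-1")
office["LocationRef"] = EntityRef(office["LocationRef"])
# So far only row "7-1" has been read.
country = office["LocationRef"].state()["Country"]
# Following the reference read row "43-123" as well.
```

Keeping each entity in its own row means a migration can rewrite one entity's dictionary without touching the state of the entities it points to.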

Separating serialization into two phases is what makes data migration possible. A data migration is just a function that takes a dictionary as input and produces another dictionary. Now the only problem is, how do we write such a function? It might be trivial for just one version, but if you are changing the model frequently, as you do in agile software development, it becomes a big issue. We are going to talk about the "reusability" of data migration rules in the next post.
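For the name to firstName rename from the beginning of this post, such a function could be as small as the following Python sketch. The function name and the Domain.Person type are hypothetical, and this is a one-off rule, not the reusable form.

```python
def rename_name_to_first_name(state):
    """Migration: dictionary in, dictionary out. The input is left untouched."""
    migrated = dict(state)
    if "name" in migrated:
        migrated["firstName"] = migrated.pop("name")
    return migrated


# Hypothetical old state, as it would come out of the deserialize step.
old_state = {"CLR_TYPE": "Domain.Person", "name": "Alice"}
new_state = rename_name_to_first_name(old_state)
```

Because the function only sees dictionaries, it never needs the old version of the class to exist, and running it on already migrated state changes nothing.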

