taowen: Data Migration

2/19/2010

Data Migration

For NoSQL data storage, there is one thing that is not talked about very much: migration. The typical answer is that a NoSQL database has no schema, so there is no need to migrate the data. But this is a lie. Although the database does not need a schema to store data, that does not mean the data itself has no structure.

NoSQL claims to be flexible enough to store semi-structured data. But if we use SQL Server as a NoSQL database to persist objects, then the data itself must be structured. If we change the class definition, the stored data can no longer be deserialized unless we migrate it to match the new class definition.
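To make the failure concrete, here is a minimal sketch in Python. The `Customer` class, its fields, and the JSON record are all hypothetical, and JSON stands in for whatever serialization format is actually used. The point is only that a strict class-based load rejects data written by an older class definition.

```python
import json
from dataclasses import dataclass


# Hypothetical class. An earlier version had a single "name" field;
# the current version splits it into two fields.
@dataclass
class Customer:
    first_name: str
    last_name: str


def deserialize(raw: str) -> Customer:
    # Strict mapping: the stored keys must match the class fields exactly.
    return Customer(**json.loads(raw))


# A record that was written by the old class definition.
stored = '{"name": "Ada Lovelace"}'

try:
    deserialize(stored)
    load_error = None
except TypeError as exc:
    # The constructor rejects the unknown "name" key.
    load_error = exc
```

The database stored the old record without complaint. It is the class definition that acts as the schema.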

The first typical response, "no schema, no migration", does not work. The second typical response is backward compatibility. The idea is to make the application code able to handle data written by older versions. This might work if the data structure (especially the references between objects) is simple and the format the application code consumes is not strict (deserialization according to a class definition is extremely strict). Also, the more old versions the code has to stay compatible with, the more complex the application code becomes. So this approach is neither widely applicable nor scalable.
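A rough sketch of what backward compatibility looks like in application code, assuming records are plain dictionaries. The field names and the two "versions" are invented for illustration.

```python
def display_name(record: dict) -> str:
    """Return a printable name for a record of any known shape."""
    # Oldest shape: a single "name" field.
    if "name" in record:
        return record["name"]
    # Newer shape: the name was split into two fields.
    if "first_name" in record and "last_name" in record:
        return f'{record["first_name"]} {record["last_name"]}'
    # Every future shape needs another branch here, and in every
    # other function that reads the record.
    raise ValueError(f"unknown record shape: {sorted(record)}")
```

Every function that touches the record needs the same branching, which is why the cost grows with each old version that has to be supported.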

The third response is to add a hook to the persistence layer. Before the data is loaded for use, check its version. If the version is older, run a dedicated transformation to upgrade the data to the newer version, so that the application code stays independent of the data migration logic. This seems ideal, but it is actually not that easy to implement in practice.
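The hook could look roughly like the sketch below. It assumes each stored document carries a `version` field (a missing field is treated as version 1) and that upgrades are chained one version at a time. The `UPGRADES` table, the function names, and the fields are hypothetical. This is not the implementation we ended up with.

```python
import json

CURRENT_VERSION = 3


def _v1_to_v2(doc: dict) -> dict:
    # Version 2 split "name" into "first_name" and "last_name".
    first, _, last = doc.pop("name").partition(" ")
    doc["first_name"], doc["last_name"] = first, last
    return doc


def _v2_to_v3(doc: dict) -> dict:
    # Version 3 added an optional "email" field.
    doc.setdefault("email", None)
    return doc


# Maps a version number to the step that upgrades it by one version.
UPGRADES = {1: _v1_to_v2, 2: _v2_to_v3}


def load(raw: str) -> dict:
    """Persistence-layer hook: upgrade old data before the application sees it."""
    doc = json.loads(raw)
    version = doc.get("version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version += 1
    doc["version"] = version
    return doc
```

With this in place, `load('{"name": "Ada Lovelace"}')` hands the application a version 3 document, and the application code never sees the older shapes.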

The fourth response is to do a full data migration, just like a traditional RDBMS data migration: load all the data, transform it, and save it back.
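As an illustration only, a full migration over a table of serialized objects might be sketched like this. An in-memory SQLite database stands in for the real store, and the `objects` table, its columns, and the `upgrade` function are assumptions made for the example.

```python
import json
import sqlite3


def upgrade(doc: dict) -> dict:
    # Hypothetical transformation: split "name" into two fields.
    if doc.get("version", 1) < 2:
        first, _, last = doc.pop("name").partition(" ")
        doc.update(first_name=first, last_name=last, version=2)
    return doc


def migrate_all(conn: sqlite3.Connection) -> int:
    """Load every stored object, transform it, and save it back."""
    rows = conn.execute("SELECT id, body FROM objects").fetchall()
    with conn:  # commits on success, rolls back if any row fails
        for row_id, body in rows:
            new_body = json.dumps(upgrade(json.loads(body)))
            conn.execute(
                "UPDATE objects SET body = ? WHERE id = ?", (new_body, row_id)
            )
    return len(rows)


# Demo data: one object written by the old class definition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO objects (body) VALUES (?)", ('{"name": "Ada Lovelace"}',))
conn.commit()

migrated = migrate_all(conn)
```

Because `upgrade` skips documents that are already at version 2, running the migration a second time leaves the data unchanged.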

For object persistence, only approaches 3 and 4 work. We tried them both, and had a lot of surprising experiences. In the next post, we will look into them in depth and see what works, what works well, and what looks like a good solution but actually costs a lot.

# posted by taowen @ 21:23

Comments:

Hi there,

I think your solutions are valid for a very specific range of problems: serialized objects. That brings your problem closer to the Object Databases and not really to NoSQL solutions that tend to persist data. The separation here between objects and data is quite important as persisting data would basically eliminate the problem you are mentioning.
A whole range of NoSQL solutions are persisting data instead of objects (that's not to say that they would not allow serialized objects). FriendFeed which was using a solution close to the one you've written about was persisting JSON and not serialized objects.
It is probably the key-value stores that might make it too easy to serialize objects instead of data, but I'd say that in all cases you'd be better off with data instead of objects.

bests,

:- alex

MyNoSQL: All things NoSQL

# posted by Alex Popescu : 00:08

Really.

# posted by Anonymous : 10:03
