2/16/2010

 

Use SQLServer as your nosql database

You might be wondering when you are able to use those cool nosql database in your project. But why? You manager might ask. You'd better be prepared. I see nosql database provides two benefits:

  1. Scalability By removing schema, the data entry can be very easily replicated. By removing foreign constraint, the data entry can be replicated without replicating all the entries it references. Then we can build shading around user boundaries.
  2. Productivity The object in the memory and the persisted object state in the database are really the same thing. If we need to do significant mapping between these two models, there must be something wrong, the productivity might hurt as well as the performance. If the two are really the same thing, why we have to store the object state as relational data? Using nosql database, the persistence of objects can be as easy as serialization.

If you think the second one is more likely to attract you and your manager, then SQLServer might be used as nosql database for your project. SQLServer? Yes, actually any RDBMS could be used as nosql database. The problem we are trying to solve is how to persist objects to RDBMS optimizing for developer productivity and optional runtime performance and scalability.

Difficulty of RDBMS object persistence
  1. OR-Mapping constraints the objects design and it is a overhead just like the memory management in C++ programming. It is become very annoying when you refactor often.
  2. Query design. The complex query requires highly skilled SQL writer. Even you can get it running correctly, but might have a problem to make it performant. The ORM solution add another level of complexity to require you specify the loading strategy.
  3. N+1 problem. Loading a deeply nested object graph very often leads to N+1 problem. The so called ripple loading is probably the most often seen performance problem while using ORM.

By using the idea of nosql database, we can can overcome those things and let your developers only focus on using object technology to implement the business requirement. The first thing we need to do to claim being nosql is not "not using sql" but "not using the schema".

No schema

We create a table called "EntityState", with two columns: id, xml. With the id being the id of the entity persisted, and the xml being the content of the entity state. So, the "EntityState" state is essentially a key/value pair database, but with more features actually.

But how can I get the object into xml? Thousands of ways, I have to say. The most important thing is to classify your objects into two categories: Entity or Aggreated. Being entity means it has a id, and every reference to this object should reference by id instead of serialize the content into xml. Being aggregated, means its state will be part of the xml. After we have done this, the circular reference problem might encounter is also prevented.

N+1 Problem

The loading of the object is even more likely to run into the N+1 problem if we do not take it into consideration. Say we load object with id 1, and it reference objects with id 2 and 3. And object with id 2 reference objects with id 4, 5, 6... Then SQL issued for loading a single object can be as many as one thousand. This is obviously a problem. The rough idea is using callbacks or continuation like structure. The detailed solution will be described in the later article.

Index Tables

The complex query you used to write is actually doing two things. It query, of course. Also, it build the model to query on the fly. If the model you persist the object happens having the column to query, the query can be as easy as one line. If the model is very far from the thing you want to query about, then you might need to join several tables and doing some SUM calculation in the SQL. If we can create index tables according to the query might have, then the problem become very trivial. The only problem is ensuring the index table get updated when the "EntityState" table getting updated.

By doing this, your SQLServer database is no long the RDBMS you and your DBA familiar with. It might sounds scary, but it might worth trying if you start to think about NHibernate/Hibernate might not be the best solution. I will write more articles on this addressing:

  1. Avoiding N+1 problem
  2. Avoiding N+1 problem (2)
  3. Data Migration
  4. Data Migration (2)
  5. Flatting & Rebuilding
  6. Data Migration (3)


Comments:
Good article.

RDBMS became something that organizations just got used to.

The major issue when talking about nosql is dealing with how IT guys think data should be handled.

We have for instance the scenario of monolithic databases dealing with data consumed by different softwares.

How would you approach that?
 
This comment has been removed by a blog administrator.
 
This comment has been removed by a blog administrator.
 
If the data is just consumed instead of produced by other software, then we just need to do similar things like the "Index Tables". When "Entity State" table getting updated, make a copy into tables with normal RDBMS schema. So that, DBA can understand the data and it is consumable by other software.
When the performance becoming a concern, we can update tables in a async way.
But if possible, we should try to talk the 3rd party team who builds the software to use things other than the database to do the integration.
 
Hi,

Why not checkout http://www.sisodb.com

//Daniel
 
Post a Comment

Subscribe to Post Comments [Atom]





<< Home

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]