2/18/2010
Avoiding N+1 Problem (2)
The previous post discussed using batch loading to avoid fetching references one by one, which causes too many queries. We started by optimizing collection loading, and then found that all references can be optimized the same way. But for indirect references, the optimization seemed very tedious.
How can we make things batchable without making the code look like a mess? For example, object A references objects B and C, B references D and E, and C references F and G. How can we load D, E, F, and G in one batch? This was not an issue with a traditional ORM: the schemas for B and C are different, so the SQL is different, and there is no way to do this kind of batch loading. But now, because all entities are stored in the same schema in the EntityState table, this optimization is logically possible.
The difficulty is not in loading or batching the entities; the loading is the same SQL either way. What differs between loading D, E, F, and G is the post-processing: each object needs to be loaded and then assigned to a different field. So it is essential to know what the post-processing steps are. An ideal way to express this in C# is:
IDictionary<Guid, Action<EntityState>> accumulatedCallbacks
If we store the post-processing steps in a dictionary called accumulatedCallbacks, we can decide when to run them. So, instead of doing
var entityState = entityStateLoader.Load("xxxx");
DoMyPostProcessing(entityState);
we pass the post-processing as an Action:
entityStateLoader.LoadLater("xxxx", DoMyPostProcessing);
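To make this concrete, here is a minimal sketch of what LoadLater could look like. The EntityState stub, the PendingCount property, and the callback-chaining detail are my assumptions, not from the original code; only the accumulatedCallbacks dictionary comes from the post:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the post's entity type (shape assumed).
public class EntityState { public Guid Id; }

public class EntityStateLoader
{
    private readonly IDictionary<Guid, Action<EntityState>> accumulatedCallbacks
        = new Dictionary<Guid, Action<EntityState>>();

    // Exposed for illustration only.
    public int PendingCount => accumulatedCallbacks.Count;

    // Instead of loading now, remember what to do with the entity
    // once a whole batch of ids is loaded together.
    public void LoadLater(Guid id, Action<EntityState> postProcessing)
    {
        if (accumulatedCallbacks.TryGetValue(id, out var existing))
            // Same id requested twice: chain both post-processing
            // steps onto the single loaded state.
            accumulatedCallbacks[id] = state => { existing(state); postProcessing(state); };
        else
            accumulatedCallbacks[id] = postProcessing;
    }
}
```

The key point is that LoadLater does no I/O at all; it only records intent, so any number of callers can queue up ids before a single batched load runs.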
Now this seems to work, except: when are we going to call these accumulated callbacks, and when are we going to load the entities with those ids? To answer this, let's look at the code:
public void Flush()
{
    while (accumulatedCallbacks.Count > 0)
    {
        var callbacks = new Dictionary<Guid, Action<EntityState>>(accumulatedCallbacks);
        accumulatedCallbacks.Clear();
        ApplyCallbackOnEntities(callbacks.Keys.ToArray(), callbacks);
    }
}

private void ApplyCallbackOnEntities(IEnumerable<Guid> ids, Dictionary<Guid, Action<EntityState>> callbacks)
{
    var loadedStates = states.BatchLoad(ids);
    foreach (var state in loadedStates)
    {
        callbacks[state.Id](state);
        callbacks.Remove(state.Id);
    }
    foreach (var callback in callbacks.Values)
    {
        callback(null);
    }
}
The important part is the while loop. The sequence is:
- Copy accumulatedCallbacks to a local variable
- Clear accumulatedCallbacks
- Loading happens: states.BatchLoad(ids)
- Each callback is called
- The tricky part: while the callbacks run, accumulatedCallbacks gathers more callbacks in the meantime, because each callback calls LoadLater for its own references
- If accumulatedCallbacks is not empty, repeat the steps
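The loop can be demonstrated end to end. In this sketch the in-memory store, the entity graph, and the batch counter are invented for illustration, and Flush is simplified (the original's handling of missing entities is dropped); only the accumulate/flush mechanics mirror the code above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: A references B and C; B and C each reference two leaves.
class EntityState { public Guid Id; public Guid[] References = new Guid[0]; }

public class Demo
{
    static readonly Dictionary<Guid, EntityState> store = new Dictionary<Guid, EntityState>();
    static readonly Dictionary<Guid, Action<EntityState>> accumulatedCallbacks
        = new Dictionary<Guid, Action<EntityState>>();
    static int batchCount;

    // Stand-in for states.BatchLoad: one call = one SQL round-trip.
    static IEnumerable<EntityState> BatchLoad(IEnumerable<Guid> ids)
    {
        batchCount++;
        return ids.Where(store.ContainsKey).Select(id => store[id]).ToList();
    }

    static void LoadLater(Guid id, Action<EntityState> callback)
        => accumulatedCallbacks[id] = callback;

    static void Flush()
    {
        while (accumulatedCallbacks.Count > 0)
        {
            var callbacks = new Dictionary<Guid, Action<EntityState>>(accumulatedCallbacks);
            accumulatedCallbacks.Clear();
            foreach (var state in BatchLoad(callbacks.Keys.ToArray()))
                callbacks[state.Id](state);
        }
    }

    public static void Main()
    {
        // Build A -> {B, C}, B -> {D, E}, C -> {F, G}.
        var ids = Enumerable.Range(0, 7).Select(_ => Guid.NewGuid()).ToArray();
        store[ids[0]] = new EntityState { Id = ids[0], References = new[] { ids[1], ids[2] } };
        store[ids[1]] = new EntityState { Id = ids[1], References = new[] { ids[3], ids[4] } };
        store[ids[2]] = new EntityState { Id = ids[2], References = new[] { ids[5], ids[6] } };
        foreach (var leaf in ids.Skip(3))
            store[leaf] = new EntityState { Id = leaf };

        // The callback schedules the entity's own references, so each
        // Flush iteration loads one whole "level" in a single batch.
        Action<EntityState> schedule = null;
        schedule = state => { foreach (var r in state.References) LoadLater(r, schedule); };

        LoadLater(ids[0], schedule);
        Flush();

        // 3 batches: {A}, {B, C}, {D, E, F, G} -- 7 entities, 3 queries.
        if (batchCount != 3) throw new Exception("expected 3 batches, got " + batchCount);
        Console.WriteLine("batches = " + batchCount);
    }
}
```

Seven entities load in three queries, one per level of the reference graph, instead of seven queries with naive one-by-one loading.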
Essentially, we turned a sequential process into asynchronously connected steps, which is also known as continuation. We achieve better runtime performance while the main logic (load and assign back to fields) stays unaware of the performance optimization we have done. Another example of separation of concerns.
http://tomasp.net/blog/csharp-async.aspx
... seems like you are trying to do some of the same things where you create an iterator to abstract out the idea of an async workflow.