Possible CSLA .NET 3.5 enhancement (indexed LINQ queries)

Old forum URL: forums.lhotka.net/forums/t/3894.aspx


RockfordLhotka posted on Tuesday, November 13, 2007

A colleague of mine at Magenic, Aaron Erickson, has written a library for LINQ (i4o) that adds indexed query capability when doing a select from a list of objects. Aaron has volunteered to merge this functionality into CSLA .NET 3.5, which is very cool!

The result should be that LINQ queries run against a BLB or ROLB list would use an index on the child object properties, and thus repeated queries against the same list would be faster.

Obviously creating and maintaining an index isn't free. It takes processing power and memory. But if you do a lot of LINQ queries against a list, having an index can make a major difference in performance.

With this feature, would you:

  1. Expect indexing to be on by default or off by default?
  2. Want any on/off switch to be public or protected?
  3. Want to have lazy creation of the index (so the first LINQ query turns it on and builds the index and thus is slow, but subsequent LINQ queries are very fast)?

Any other feedback/ideas are welcome too. This will be one of the major new features in CSLA 3.5, and I'd rather get input now than after we're in a beta cycle.
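Here is a minimal VB sketch of the trade-off in the opening post. It is not CSLA or i4o code. It assumes a hypothetical `Customer` child with a `City` property and uses the standard LINQ `ToLookup` operator as a stand-in for a property index. The scan walks the whole list on every query. The lookup is built in one pass and then answers each query with a hash probe.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical child object standing in for the items of a BLB/ROLB list.
Public Class Customer
    Public Property Id As Integer
    Public Property City As String

    Public Sub New(ByVal newId As Integer, ByVal newCity As String)
        Id = newId
        City = newCity
    End Sub
End Class

Public Module CityQueries
    ' Unindexed: every call walks the whole list, so k queries cost O(k * n).
    Public Function ScanByCity(ByVal items As IEnumerable(Of Customer), ByVal city As String) As List(Of Customer)
        Return items.Where(Function(c) c.City = city).ToList()
    End Function

    ' Indexed: one O(n) pass groups the items by City in a hash-based lookup.
    ' Each later query is a hash probe plus the cost of returning the matches.
    Public Function BuildCityIndex(ByVal items As IEnumerable(Of Customer)) As ILookup(Of String, Customer)
        Return items.ToLookup(Function(c) c.City)
    End Function
End Module
```

The lookup is a snapshot. Items added to the list afterwards are not in it, which is the index-maintenance question raised later in this thread.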

William replied on Tuesday, November 13, 2007

Here is my feedback.

1. I think indexing should be off by default. In a SQL Server table, no index is defined by default except the clustered index, which rearranges the underlying data pages. However, I think the LINQ index is the equivalent of a SQL Server non-clustered index.

2. I prefer a tri-state switch that also covers item (3): On, Off, OnDemand. The property should be public, as the client/UI should be able to decide when it needs the performance and accepts the trade-off; the BO should not dictate this on behalf of every usage context.

3. See (2).


Thanks.
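William's tri-state switch could look roughly like this. This is only a sketch. `IndexMode`, `ModeIndexedList`, `Mode`, and `Find` are hypothetical names. The list is read-only, so the index never needs updating, and Off is the default, as William suggests. `On` is a VB keyword, so the enum member is escaped as `[On]`.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical names: CSLA does not define IndexMode or ModeIndexedList.
Public Enum IndexMode
    Off       ' no index; every query scans the list (the default here)
    [On]      ' build the index as soon as the mode is set
    OnDemand  ' build the index lazily, on the first query
End Enum

' Read-only list (like a ROLB), so the index never has to track changes.
Public Class ModeIndexedList(Of T, TKey)
    Private ReadOnly _items As List(Of T)
    Private ReadOnly _keyOf As Func(Of T, TKey)
    Private _mode As IndexMode = IndexMode.Off
    Private _index As ILookup(Of TKey, T)

    Public Sub New(ByVal items As IEnumerable(Of T), ByVal keyOf As Func(Of T, TKey))
        _items = New List(Of T)(items)
        _keyOf = keyOf
    End Sub

    ' Public, so the UI can pick the trade-off for its own usage context.
    Public Property Mode() As IndexMode
        Get
            Return _mode
        End Get
        Set(ByVal value As IndexMode)
            _mode = value
            Select Case value
                Case IndexMode.[On]
                    _index = _items.ToLookup(_keyOf)  ' eager: pay the cost now
                Case IndexMode.Off
                    _index = Nothing                  ' release the memory
                Case IndexMode.OnDemand
                    ' leave it alone; Find builds the index on first use
            End Select
        End Set
    End Property

    Public ReadOnly Property IsIndexBuilt() As Boolean
        Get
            Return _index IsNot Nothing
        End Get
    End Property

    Public Function Find(ByVal key As TKey) As List(Of T)
        If _mode = IndexMode.Off Then
            ' No index: plain linear scan.
            Dim comparer As EqualityComparer(Of TKey) = EqualityComparer(Of TKey).Default
            Return _items.Where(Function(x) comparer.Equals(_keyOf(x), key)).ToList()
        End If
        If _index Is Nothing Then _index = _items.ToLookup(_keyOf) ' OnDemand: first query pays
        Return _index(key).ToList()
    End Function
End Class
```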

ajj3085 replied on Wednesday, November 14, 2007

I haven't used any .NET 3 or higher stuff yet, so I'm not sure I can answer directly. I would like to know, though: how often do you see LINQ querying of business objects? I know that it will happen... but how likely/useful is it? That might help determine sane defaults.

JoeFallon1 replied on Wednesday, November 14, 2007

Rocky,

Interesting timing. My colleague Alex and I were just working on indexing of collections. We did some perf testing and determined that, when you are looping over the collection (in a For Each statement), calling methods like Item and Contains (which loop until they find a value) is 1,000 times slower than using the indexing mechanism described in the article. We are in the middle of adding a property named ItemWithIndex to our BusinessListBase class.

Public Overridable ReadOnly Property ItemWithIndex(ByVal key As Object) As C

We planned on loading the index on the first request and then re-using it on subsequent requests. We also decided not to keep the index up to date with any list changes so we avoided the work of modifying all the Add and Remove methods.

Instead, if the index does not contain the item, we branch the call back to the standard Item method which does have the up to date list.

So if a collection has 2,000 items in it and is not changed, then the index is built on the first pass and then re-used 1,999 times. This is extremely useful when looping over the collection in a For Each statement.

If you add 1 item to the list then the index is still used 1,999 times and the standard Item method is used once.

Indexing is always ON for our new method.

Indexing should probably be ON by default, as that gives a huge perf gain. The developer should be able to turn indexing on/off for a given list. I guess the switch should be public, as the dev could then set it in either the BO or the UI. If they want the default to be off, they can set it in their code-gen templates.

I think lazy creation of the index is fine. But will you keep the index up to date as the collection changes? Or should the index be re-built at some point? What is that point? Number of requests that miss the index? Or on demand?

Joe
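Joe's approach can be sketched as follows: build the index on the first request, never update it, and fall back to the ordinary linear search on a miss. Some assumptions are involved. The class derives from `Collection(Of C)` instead of `BusinessListBase`. A hypothetical `keyOf` delegate supplies each child's key. Only the `ItemWithIndex` signature comes from the post.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Collections.ObjectModel

' Sketch only: stands in for a BusinessListBase subclass; keyOf is hypothetical.
Public Class LazyKeyedList(Of C)
    Inherits Collection(Of C)

    Private ReadOnly _keyOf As Func(Of C, Object)
    ' Built on the first request and deliberately NOT updated by Add/Remove.
    Private _index As Dictionary(Of Object, C)

    Public Sub New(ByVal keyOf As Func(Of C, Object))
        _keyOf = keyOf
    End Sub

    Public Overridable ReadOnly Property ItemWithIndex(ByVal key As Object) As C
        Get
            If _index Is Nothing Then
                _index = New Dictionary(Of Object, C)()
                For Each child As C In Me
                    Dim k As Object = _keyOf(child)
                    ' Keep the first match, as a linear search would; skip Nothing keys.
                    If k IsNot Nothing AndAlso Not _index.ContainsKey(k) Then _index.Add(k, child)
                Next
            End If

            Dim found As C = Nothing
            If key IsNot Nothing AndAlso _index.TryGetValue(key, found) Then Return found

            ' Index miss (e.g. an item added after the index was built):
            ' fall back to the slow but always up-to-date linear search.
            For Each child As C In Me
                If Object.Equals(_keyOf(child), key) Then Return child
            Next
            Return Nothing
        End Get
    End Property
End Class
```

Because the index is never updated, an item removed from the list would still be returned by an index hit. This policy is therefore safest for lists that only grow or do not change.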

RockfordLhotka replied on Wednesday, November 14, 2007

Contains() is interesting, because it ties into the other thread I started about Equals(). Contains() loops through the list looking for an object that is Equals() to the current object.

If Equals() reverts to System.Object.Equals() then it no longer would be bound to any property on the object, but instead is (by my understanding) bound to a hidden GUID value created on a per-object basis by System.Object - and that couldn't be indexed because it is private to System.Object...

I'm somewhat surprised that the indexer (Item property) was sped up? That doesn't use a key value at all right? Just a numeric location index.

Our intent is to maintain the index(es) over time, so they are always current. It isn't that hard to do and provides a consistent perf benefit for queries.

The index technique we're using right now is a simple dictionary/hashtable. Nothing fancy. I do have a red-black binary tree implementation that would probably be more memory-efficient. But there's a higher processing cost to maintaining a balanced binary tree than to maintaining a hashtable, so it is a bit hard to say whether CPU or memory is a more precious resource... At the moment, because a hashtable is simpler, we're sticking with that.
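A maintained index of the kind Rocky describes, a plain dictionary kept current on every change, might look like the sketch below. It illustrates the idea under several assumptions and is not the CSLA implementation. It hooks the `InsertItem`, `RemoveItem`, `SetItem`, and `ClearItems` overrides of `System.Collections.ObjectModel.Collection(Of T)`. It uses a dictionary of key to list of items, the shape Joe mentions, so duplicate keys are allowed. It also assumes the indexed property never changes (and is never Nothing) while an item is in the list.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Collections.ObjectModel

' Sketch: the index is updated on every Add/Remove/Replace/Clear.
' Assumes an item's key does not change while the item is in the list.
Public Class MaintainedIndexList(Of T, TKey)
    Inherits Collection(Of T)

    Private ReadOnly _keyOf As Func(Of T, TKey)
    Private ReadOnly _index As New Dictionary(Of TKey, List(Of T))()

    Public Sub New(ByVal keyOf As Func(Of T, TKey))
        _keyOf = keyOf
    End Sub

    ' Hash lookup instead of a scan; returns an empty list when nothing matches.
    Public Function Find(ByVal key As TKey) As IList(Of T)
        Dim bucket As List(Of T) = Nothing
        If _index.TryGetValue(key, bucket) Then Return bucket.AsReadOnly()
        Return New List(Of T)()
    End Function

    Protected Overrides Sub InsertItem(ByVal index As Integer, ByVal item As T)
        MyBase.InsertItem(index, item)
        AddToIndex(item)
    End Sub

    Protected Overrides Sub RemoveItem(ByVal index As Integer)
        RemoveFromIndex(Me.Item(index))
        MyBase.RemoveItem(index)
    End Sub

    Protected Overrides Sub SetItem(ByVal index As Integer, ByVal item As T)
        RemoveFromIndex(Me.Item(index))
        MyBase.SetItem(index, item)
        AddToIndex(item)
    End Sub

    Protected Overrides Sub ClearItems()
        MyBase.ClearItems()
        _index.Clear()
    End Sub

    Private Sub AddToIndex(ByVal item As T)
        Dim key As TKey = _keyOf(item)
        Dim bucket As List(Of T) = Nothing
        If Not _index.TryGetValue(key, bucket) Then
            bucket = New List(Of T)()
            _index.Add(key, bucket)
        End If
        bucket.Add(item)
    End Sub

    Private Sub RemoveFromIndex(ByVal item As T)
        Dim key As TKey = _keyOf(item)
        Dim bucket As List(Of T) = Nothing
        If _index.TryGetValue(key, bucket) Then
            bucket.Remove(item)
            If bucket.Count = 0 Then _index.Remove(key) ' drop empty buckets to save memory
        End If
    End Sub
End Class
```

A natural next step would be tracking property changes on the children, for example through `INotifyPropertyChanged`. The sketch leaves that out.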

JoeFallon1 replied on Wednesday, November 14, 2007

"I'm somewhat surprised that the indexer (Item property) was sped up? That doesn't use a key value at all right? Just a numeric location index."

Rocky - I should have qualified that. It is an overload of the Item property that got sped up. The overload works just like Contains.

E.g. this is not exact code: it would not compile alongside the standard Item(index As Integer) property, because both take a single Integer parameter. But it gives the idea.

Public Overloads ReadOnly Property Item(ByVal key As Integer) As SomeBO
  Get
    ' Linear search: walk the list until a child's key matches.
    For Each obj As SomeBO In Me
      If obj.Key = key Then
        Return obj
      End If
    Next
    Return Nothing
  End Get
End Property

We are using a Dictionary(Of Integer, List(Of C)) in our index.

Alex said that it is of negligible size in memory, but that serialization makes the BO grow larger than it would without the index, perhaps significantly. Are you aware of this potential issue? Will cloning the BO clone the index? Are there any issues returning the index through the DataPortal as part of the object graph? I thought there was an issue around Hashtables and serialization, something like the Hashtable having to be a field in a class marked Serializable.

Joe

RockfordLhotka replied on Wednesday, November 14, 2007

I see, that makes sense. I don't think I'll be adding overloads like that for these methods/indexers. But once the indexing capability is implemented, you'd be able to implement such an indexer/Item property or a Contains() that would run a LINQ query to do the work, and that would use the index if there is one.

There's no plan to serialize the indexes. The index fields will be marked NonSerialized, and the indexes would be recreated on the client and/or app server. That's one reason why I think lazy loading of the indexes is so important.

Consider the scenario of loading a list. The list is created/populated on the app server, then is immediately returned to the client. Creating a set of indexes on the app server would be a complete waste, because they'd never be used, and serializing those indexes back across the wire would be highly counter-productive, especially across slower network links.

So the indexes must be created per-AppDomain/machine/tier. And to avoid that type of overhead, they must be lazy loaded - created on demand.
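The per-tier, lazily created index Rocky describes can be sketched with the `<NonSerialized()>` attribute. The field is skipped when the list is serialized, arrives as Nothing on the other side, and is rebuilt by the first query on that tier. `LazyIndexedList`, `GetKey`, `Find`, and `WordList` are hypothetical names. As a simplification, any change to the list simply discards the index instead of maintaining it.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Collections.ObjectModel

' Sketch only: GetKey is a hypothetical hook a subclass supplies.
<Serializable()> Public MustInherit Class LazyIndexedList(Of T, TKey)
    Inherits Collection(Of T)

    ' Never crosses the wire; Nothing after deserialization until the first query.
    <NonSerialized()> Private _index As Dictionary(Of TKey, List(Of T))

    Protected MustOverride Function GetKey(ByVal item As T) As TKey

    Public ReadOnly Property IsIndexBuilt() As Boolean
        Get
            Return _index IsNot Nothing
        End Get
    End Property

    Public Function Find(ByVal key As TKey) As IList(Of T)
        If _index Is Nothing Then BuildIndex() ' lazy: a tier that never queries never pays
        Dim bucket As List(Of T) = Nothing
        If _index.TryGetValue(key, bucket) Then Return bucket.AsReadOnly()
        Return New List(Of T)()
    End Function

    Private Sub BuildIndex()
        Dim index As New Dictionary(Of TKey, List(Of T))()
        For Each item As T In Me
            Dim key As TKey = GetKey(item)
            Dim bucket As List(Of T) = Nothing
            If Not index.TryGetValue(key, bucket) Then
                bucket = New List(Of T)()
                index.Add(key, bucket)
            End If
            bucket.Add(item)
        Next
        _index = index
    End Sub

    ' Simplification: any change discards the index; the next Find rebuilds it.
    Protected Overrides Sub InsertItem(ByVal index As Integer, ByVal item As T)
        MyBase.InsertItem(index, item)
        _index = Nothing
    End Sub

    Protected Overrides Sub RemoveItem(ByVal index As Integer)
        MyBase.RemoveItem(index)
        _index = Nothing
    End Sub

    Protected Overrides Sub SetItem(ByVal index As Integer, ByVal item As T)
        MyBase.SetItem(index, item)
        _index = Nothing
    End Sub

    Protected Overrides Sub ClearItems()
        MyBase.ClearItems()
        _index = Nothing
    End Sub
End Class

' Hypothetical concrete list: indexes strings by their length.
<Serializable()> Public Class WordList
    Inherits LazyIndexedList(Of String, Integer)

    Protected Overrides Function GetKey(ByVal item As String) As Integer
        Return item.Length
    End Function
End Class
```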

DavidDilworth replied on Friday, November 16, 2007

My suggestions would be:

(1) Off by default.  Don't enable something that you don't expect to use.

(2) I can see that public gives most flexibility, but protected is probably the better scope from a design perspective.  One pattern could be to make it accessible via the Criteria, but keep the actual property scope in the BO protected?  That way you can make the choice via Criteria passed to the factory?  Also, it would need to be virtual in the CSLA base classes so that it could be overridden in a derived class with a fixed value if needed.

(3) I like the suggestion of having it On/Off/OnDemand, so you have control over how you want the BO to work.

Curelom replied on Wednesday, November 14, 2007

1.  I'd expect indexing to be off by default.

2. Public would probably be better, to give the gui developer more options.

3. I think this feature would be a nice-to-have, and there should be an option to turn it on or off.

This might be overkill, but perhaps there could be a method to manually kick off indexing, so the developer could have a list build its indexes in the background.

Curelom replied on Wednesday, November 14, 2007

I'd like to change my suggestion: have indexing turned ON by default. Looking at the i4o implementation from a database point of view, indexing is based on the table. The developer runs a query and, by default, it uses the indexes defined on the table (similar to the way i4o defines indexes at the class level). If the developer doesn't want to use the table's indexes, they have to specify in the query not to use them.
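Curelom's point about indexes declared on the class, the way a table declares them, could be sketched with a custom attribute and reflection. `IndexedAttribute`, `Person`, and `ClassLevelIndexes.Build` are hypothetical and are not i4o's actual API. The sketch only illustrates discovering indexed properties at the class level and building one lookup per property.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Reflection

' Hypothetical marker: a property carrying this attribute gets an index.
<AttributeUsage(AttributeTargets.Property)> Public Class IndexedAttribute
    Inherits Attribute
End Class

' Hypothetical child type: the index is declared once, on the class.
Public Class Person
    <Indexed()> Public Property City As String
    Public Property Name As String
End Class

Public Module ClassLevelIndexes
    ' Builds one lookup per <Indexed()> property, keyed by that property's value.
    Public Function Build(Of T)(ByVal items As IEnumerable(Of T)) As Dictionary(Of String, ILookup(Of Object, T))
        Dim result As New Dictionary(Of String, ILookup(Of Object, T))()
        For Each prop As PropertyInfo In GetType(T).GetProperties()
            If prop.IsDefined(GetType(IndexedAttribute), True) Then
                Dim p As PropertyInfo = prop ' local copy captured by the lambda
                result.Add(p.Name, items.ToLookup(Function(item) p.GetValue(item, Nothing)))
            End If
        Next
        Return result
    End Function
End Module
```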

Copyright (c) Marimer LLC