CSLA Linq Index support for .Any( item => ...) linq method

Old forum URL: forums.lhotka.net/forums/t/6781.aspx


Jaans posted on Monday, April 13, 2009

Hi All

I just ran into a performance issue when using the .Any() linq method versus the .Where() method.

The query runs against a fairly large ReadOnlyList collection (~10,000 rows), and yes, the [Indexable] attribute is applied to the child object property used in the Any() / Where() predicate.

Is there any reason the .Any method should not also be able to benefit from the index?

Here's a snippet of the code for the Any() method:
// Much simpler and cleaner code, but performs worse, presumably because it does not use the index
return this.Any( item => item.Id == id );

Compared to the .Where() method:

// More verbose but performs better with the Csla Linq Index
if (this.Where( item => item.Id == id ) != null)
return true;
else
return false;

Jaans replied on Monday, April 13, 2009

Apologies, it appears my first post was written in haste: it missed a couple of other difficulties and also contains an error.

First, the erratum: the .Where() example above is flawed because Where() always returns a (non-null) query object, even if the list is empty, so the if statement surrounding it always tests true.

The following code would be a suitable fix-up for the .Where() example:
var result = this.Where( item => item.Id == id );
if (result.Count() > 0)
return true;
else
return false;

Note that "result" is a reference to the LINQ query and, if my understanding is correct, the query does not execute until enumeration is requested, for example by calling Count() or FirstOrDefault() on the result. This is where the performance penalty is incurred.
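
The deferred execution described above can be seen with plain LINQ to Objects. The following is only a minimal sketch: it uses a List<int> rather than a CSLA collection, the class and member names (DeferredExecutionDemo, PredicateCalls, BuildQuery) are hypothetical, and it says nothing about whether the CSLA index is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Number of times the predicate has been invoked so far.
    public static int PredicateCalls;

    public static IEnumerable<int> BuildQuery(List<int> ids, int target)
    {
        // Where() only builds the query; the lambda does not run here.
        return ids.Where(id => { PredicateCalls++; return id == target; });
    }

    public static void Main()
    {
        var ids = new List<int> { 1, 2, 3 };
        var query = BuildQuery(ids, 2);
        Console.WriteLine(PredicateCalls);      // 0: nothing has been enumerated yet

        bool found = query.Any();               // enumerates only until the first match
        Console.WriteLine(found);               // True
        Console.WriteLine(PredicateCalls);      // 2: items 1 and 2 were tested

        bool foundViaCount = query.Count() > 0; // re-runs the query over every item
        Console.WriteLine(foundViaCount);       // True
        Console.WriteLine(PredicateCalls);      // 5: three more predicate calls
    }
}
```

In this sketch, calling Any() on the result of Where() stops at the first match, whereas Count() > 0 always walks the whole sequence, so Where(...).Any() may be a reasonable existence check to benchmark against the indexed collection.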

This brings me to a rather different issue: I'm unable to get comparable performance from the Linq Index (red/black trees) versus manually maintaining a Dictionary collection as a custom index.

Don't get me wrong... I think the work done to improve the performance of Linq queries over CSLA based objects is exemplary and helps significantly. I'm just trying to figure out when to use what and how much help the [Indexable] attribute *really* is.

I might be missing something here but, my practical experience has me making the following conclusions:
* [Indexable] attribute makes no difference on a .Any(...) or .Count(...) linq query.
* [Indexable] attribute improves the performance of a .Where linq query significantly.
* Use a custom manual index represented by a Hashtable or Dictionary collection if you need maximum performance.
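
For readers wondering what a "custom manual index" might look like, here is a minimal sketch under stated assumptions. Item and IndexedItemList are hypothetical stand-ins and not CSLA types; a real CSLA list would need to keep the dictionary in sync wherever items are added or removed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical child object with an integer key.
public sealed class Item
{
    public int Id { get; }
    public Item(int id) { Id = id; }
}

// Hypothetical list that maintains a Dictionary alongside its items.
public sealed class IndexedItemList
{
    private readonly List<Item> _items = new List<Item>();
    private readonly Dictionary<int, Item> _byId = new Dictionary<int, Item>();

    public int Count => _items.Count;

    public void Add(Item item)
    {
        _byId.Add(item.Id, item); // throws ArgumentException on a duplicate Id
        _items.Add(item);
    }

    public bool Remove(Item item)
    {
        // Keep the index and the list in sync.
        return _byId.Remove(item.Id) && _items.Remove(item);
    }

    // Existence check through the hash index instead of scanning the list.
    public bool ContainsId(int id) => _byId.ContainsKey(id);
}

public static class ManualIndexDemo
{
    public static void Main()
    {
        var list = new IndexedItemList();
        for (int i = 1; i <= 10000; i++) list.Add(new Item(i));

        Console.WriteLine(list.ContainsId(9999));  // True
        Console.WriteLine(list.ContainsId(10001)); // False
    }
}
```

The trade-off in this sketch is the usual one for a hash index: fast equality lookups, at the cost of extra memory and the burden of updating the dictionary on every change to the list.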

I would value your comments / input on this.

Regards,
Jaans

RockfordLhotka replied on Monday, April 13, 2009

The first implementation of indexing used a hash table as an index. And that is good for equality, but useless for <, >, <= or >= tests. I asked Aaron to switch the implementation to a balanced binary tree to broaden the scope of when indexes were useful to include those tests in addition to equality.

Anyone who's had basic computer science knows that the performance characteristics of a hash table lookup for equality are (typically) better than a balanced binary tree, but that a balanced binary tree is pretty darn good. I felt that some small cost for the equality test was worth it to get the other four tests.
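
The trade-off described above can be illustrated with standard .NET collections. This is only a sketch using HashSet<T> (a hash table) and SortedSet<T> (a balanced binary tree) as stand-ins; it is not the CSLA index implementation, and IndexShapeDemo and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexShapeDemo
{
    // Equality lookup: a hash table answers this without ordering the keys.
    public static bool HashContains(HashSet<int> index, int id) => index.Contains(id);

    // Range lookup (lower <= key <= upper): needs an ordered structure.
    public static int[] TreeRange(SortedSet<int> index, int lower, int upper)
        => index.GetViewBetween(lower, upper).ToArray();

    public static void Main()
    {
        var ids = Enumerable.Range(1, 10).ToList();
        var hashIndex = new HashSet<int>(ids);
        var treeIndex = new SortedSet<int>(ids);

        Console.WriteLine(HashContains(hashIndex, 7));                   // True
        Console.WriteLine(treeIndex.Contains(7));                        // True (tree handles equality too)
        Console.WriteLine(string.Join(",", TreeRange(treeIndex, 3, 6))); // 3,4,5,6
    }
}
```

The hash set has no equivalent of the range query short of scanning every key, which is the reason given above for preferring the tree as the default.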

Aaron wasn't entirely convinced, and so he made sure that the index engine uses a provider model. I assume you've discovered this, and aren't modifying CSLA itself to replace the index provider with your hash table/dictionary model. Aaron went to some effort to ensure that people could replace the index provider if they were only doing equality tests and didn't need the other tests to be fast.

I do believe the focus for index usage is entirely around Where(). Your observation that it would be useful for Any() and Count() seems valid to me.

But you must remember that Aaron has built the entire LINQ to CSLA functionality as a volunteer effort, spending time he would have otherwise had with his family to do this. Personally I really appreciate what he's done, and while I very much hope he continues to help enhance and improve the functionality, it is still a volunteer effort.

Which brings me to my close - I know that Aaron is now on a full-time traveling project, and I suspect that will severely limit the time he can spend on things like this. If you (this is the broad you - including anyone reading this) would like to help enhance and/or fix issues with LINQ to CSLA please let me know.

Jaans replied on Monday, April 13, 2009

Thanks Rocky

I agree, the LINQ to CSLA work done by Mr. Erickson and yourself is fantastic. Moreover, it targets what the great majority of uses are likely to be and aims to strike the performance balance that benefits most scenarios. No argument there... and the balanced binary tree does exactly that.

I guess my post is more about my own experiences and tries to achieve a few things:
1) Confirm my experiences:
Do other members of the community experience similar performance results? Do you perhaps have other workarounds?

Plugging in your own IndexProvider is certainly one such workaround.

2) What's best practice / trade-off decisions:
How can this information be distilled into guidance for choosing what to use when, and what are the trade-offs / implications? The discussions surrounding this (and other posts) help to achieve that.

3) Any() and/or Count():
I have found the need for them on quite a few occasions, and only now am I using them in a performance-sensitive scenario.

Does the community also see value in supporting an indexed implementation of the Any() and/or Count() LINQ methods in CSLA? Do they have workarounds?

If others also feel the need for it, perhaps we could add it to the wish list, allowing the community or myself to contribute.

Thanks again.

Copyright (c) Marimer LLC