Root editable collections performance


Old forum URL: forums.lhotka.net/forums/t/7212.aspx


Goran posted on Wednesday, July 01, 2009

I have made a simple form that should display Article data, and I load 1000 articles as a test. I created one root collection that inherits from BusinessListBase. Loading the data took approximately 18 seconds. Using List&lt;Article&gt;, loading the same data took 0.3 seconds.

In the book it is stated that this approach isn’t recommended when there are large numbers of potential child objects, because the retrieval process can become too slow. What approach is recommended?

Thanks,
Goran

RockfordLhotka replied on Thursday, July 02, 2009

There are so many ways to load child objects, I can't say what the issue is without understanding the specific technique you are using.

Something is certainly wrong for this to take 18 seconds. I routinely load collections of several hundred items and performance is very acceptable.

Goran replied on Thursday, July 02, 2009

Hello, Rockford, thanks for the response. The code is put together from the examples in your book.

I have a Customer class, which is marked as a child in the constructor. It contains about 10 properties; there are no validation/authorization rules, no factory methods, and no data access code (I'm trying to keep it simple enough for a start).

I also have a CustomerList class, whose code looks something like this:

[Serializable()]
public class CustomerList : BusinessListBase<CustomerList, Customer>
{
    #region private constructor

    private CustomerList() { }

    #endregion

    #region factory methods

    public static CustomerList GetCustomerList()
    {
        return DataPortal.Fetch<CustomerList>(new Criteria());
    }

    #endregion

    #region data access

    [Serializable()]
    private class Criteria
    { /* no criteria - retrieve all customers */ }

    private void DataPortal_Fetch(Criteria criteria)
    {
        using (SqlConnection cn = new SqlConnection(Database.MyDbConnection))
        {
            cn.Open();
            using (SqlCommand cm = cn.CreateCommand())
            {
                cm.CommandType = CommandType.StoredProcedure;
                cm.CommandText = "getCustomers";
                using (SafeDataReader dr = new SafeDataReader(cm.ExecuteReader()))
                {
                    while (dr.Read())
                    {
                        Customer customer = new Customer {
                            CustomerId = dr.GetInt32(0),
                            PersonalCode = dr.GetInt32(1),
                            Name = dr.GetString(2),
                            Address = dr.GetString(3),
                            City = dr.GetString(4),
                            PostalCode = dr.GetString(5),
                            TIN = dr.GetString(6),
                            Phone = dr.GetString(7),
                            Fax = dr.GetString(8),
                            ContactName = dr.GetString(9),
                            Note = dr.GetString(10)};
                        this.Add(customer);
                    }
                }
            }
        }
    }

    #endregion
}


To make it clear that there is no problem with the data access code: I use the same data access code with a List&lt;Customer&gt;.

What am I doing wrong here?

skagen00 replied on Thursday, July 02, 2009

One thing you may want to do is set RaiseListChangedEvents to false while your collection is being loaded (outside of your while loop).

Also, I've not used the syntax you have above - aren't you setting properties with that logic? That would go through lots of authorization checks and property-changed calls...

Normally the practice is also to have the child objects populate themselves from the data reader via an internal method. The approach you are using is not the convention...
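
For reference, a minimal sketch of that conventional pattern - assuming a CSLA 2.x/3.0-style Customer child with private backing fields; the GetCustomer and Fetch member names here are illustrative:

using System;
using Csla;
using Csla.Data;

[Serializable()]
public class Customer : BusinessBase<Customer>
{
    private int _customerId;
    private string _name = string.Empty;
    // ...remaining backing fields omitted

    protected override object GetIdValue()
    {
        return _customerId;
    }

    private Customer()
    {
        MarkAsChild();   // this object exists only as a child of CustomerList
    }

    // internal factory - the child populates itself from the data reader
    internal static Customer GetCustomer(SafeDataReader dr)
    {
        Customer customer = new Customer();
        customer.Fetch(dr);
        return customer;
    }

    private void Fetch(SafeDataReader dr)
    {
        // load private fields directly - no public setters, so no
        // authorization checks or PropertyChanged events fire
        _customerId = dr.GetInt32(0);
        _name = dr.GetString(2);
        // ...remaining columns
        MarkOld();   // fetched data: the object is neither new nor dirty
    }
}

The collection's DataPortal_Fetch loop then becomes:

    RaiseListChangedEvents = false;
    while (dr.Read())
        this.Add(Customer.GetCustomer(dr));
    RaiseListChangedEvents = true;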

Goran replied on Thursday, July 02, 2009

Until now I didn't pay attention to the code I wrote last night. Yes, you are correct, the child class should populate itself, not the collection, which is totally logical. :) Well, 3 A.M. is time to sleep, not to learn new technologies, as I was doing last night. :)

After applying the correct code, the speed is pretty similar to a List&lt;T&gt;.

Thanks for the help.

Goran replied on Wednesday, July 08, 2009

It's me again. :) I am doing a performance test loading collection data using CSLA version 2.0 (the one released with the 2005 book) and the newest framework version.

I have corrected the above code to load values from the SafeDataReader into private fields (not setting properties directly). RaiseListChangedEvents is set to false. Loading 100,000 customers takes approximately 3 seconds with csla.dll version 2.0. Using the same code with csla.dll version 3.6, the time needed to load 100,000 customers has risen to 12 seconds.

I am not saying I would ever load that many records in production code; this is just to get accurate test results. Why exactly is the new version so slow? I am waiting for the 2008 edition of the book to arrive (hopefully this weekend), so I cannot say what has changed in the new framework, but from the code examples the only difference I see is that we now need to register properties; everything else is pretty much the same.

Edit: For illustration, the same code using List&lt;T&gt; needs 2 seconds to load the data.

RockfordLhotka replied on Wednesday, July 08, 2009

Was it this thread or another one where someone profiled and discovered that raising PropertyChanged was a bottleneck?


I suspect that PropertyChanged is slower now because of the ChildChanged event. Whether that’s enough to make this difference I don’t know, but that is the only event-related feature I can think of that’s been added.


Rocky


skagen00 replied on Wednesday, July 08, 2009

He does make reference to registering properties - I wonder if loading the values into the field manager isn't what is causing the extra overhead.

I guess a profiling application (ANTS, etc.) might help you uncover the biggest culprits... I don't get the sense that PropertyChanged/ChildChanged has anything to do with it, since he's not raising list events and is presumably using LoadProperty rather than SetProperty...

I think you mentioned a performance hit from the field manager (suggesting, back when managed fields came out, that private backing fields are more efficient), but I don't recall it being 300% :)

RockfordLhotka replied on Wednesday, July 08, 2009

There's absolutely some overhead to using managed backing fields.


It sounded to me like it was "the same code" plus registering the PropertyInfo<T> objects. When using private backing fields the property registration should have very little impact at all, and if that's the case it is probably due to whatever is going on with PropertyChanged.


However, if it is not the same code - in that the properties were switched to managed backing fields, then that's another source of overhead.


I did a lot of perf testing comparing private and managed backing fields as we implemented the dynamic method calling features (among others).


Here are some numbers (though not fully isolated, because data portal variations are involved as well). These are load times for a full object graph: not just an object or collection, but an ERLB with 100 root objects, each root containing a BLB of 100 children, with the graph loaded numerous times to offset external perf spikes or other outside factors.


The numbers I’m providing here are summary values showing the number of objects loaded/returned per ms on average:


CSLA 3.0 features (no child data portal, private fields)

144 obj per ms


CSLA .NET 3.5 partial (child data portal, private fields)

84 obj per ms


Full CSLA .NET 3.5 features (child data portal and managed fields)

54 obj per ms


Again, this factors in more than just managed fields, because it brings in the child data portal as well as other factors like ChildChanged processing. These are fully loaded tests, in that they intentionally exercise as much of normal processing as you’d find in a real app without talking to an actual database.
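
As a rough back-of-the-envelope illustration (derived from the averages above, not separately measured), the test graph is 100 roots plus 100 x 100 children, or 10,100 objects, so the figures imply approximately these in-memory load times per graph:

using System;

int totalObjects = 100 + (100 * 100);    // ERLB roots + BLB children = 10,100 objects
Console.WriteLine(totalObjects / 144.0); // CSLA 3.0, private fields, no child portal: ~70 ms
Console.WriteLine(totalObjects / 84.0);  // CSLA 3.5, child portal + private fields:  ~120 ms
Console.WriteLine(totalObjects / 54.0);  // CSLA 3.5, child portal + managed fields:  ~187 ms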


Obviously different testing scenarios will have different characteristics – we already see that with the straight-up BLB testing of 100k items.


Rocky

skagen00 replied on Wednesday, July 08, 2009

So in reality, while managed fields may be twice as slow to load as private fields, the difference is quite negligible - because database access is going to cost orders of magnitude more than populating objects at 50-100 per millisecond...

Goran replied on Thursday, July 09, 2009

You say that the difference is negligible. What exactly is gained with the new framework version, and what benefits do we get, to justify a 600% performance loss, which is what I have measured?

Is there no way to get similar performance with the new framework version - something to turn off?

Just to clarify what I used while testing with the 100k customers:

1) List&lt;Customer&gt;, where Customer doesn't inherit from BusinessBase and has direct access to private fields - approx 1.5 s

2) Using CSLA version 3.0 - Customer has direct access to private fields - approx 2 s (very good performance)

3) Using CSLA 3.6 - using a static PropertyInfo&lt;T&gt; for each property, fetching data through LoadProperty - approx 13 s

4) Using CSLA 3.6 - using the "old 2005 way" with private fields and no static PropertyInfo&lt;T&gt;; data is fetched into private fields - approx 8 s

skagen00 replied on Thursday, July 09, 2009

I haven't looked at your tests so I can't say whether or not you may be missing optimization of some sort or better handling of the loading process.

When I said negligible, I'm referring to Rocky's numbers that he gave in populating objects without database access.

Database access will generally account for the lion's share of getting CSLA business objects in memory and ready to use. The population method should be, all in all, quite the minority of the processing time.

So Rocky just gave numbers that say, without database access, if I populate 150 objects with 100 children each, I'm going to lose about 2 ms. I have never populated objects from memory like that, and it seems absurdly fast to me, but if that's the hit, I don't see any use case that is adversely affected by it. If he meant objects in total, then multiply by 100 (children per root): for every 150 roots I populate with 100 children each, I'll lose about 200 ms.

I can't think of any use case that I've used where I do that sort of operation to begin with... loading 15,000 objects (root & child combined) at a time.

Rocky, are those numbers you gave quoted with the right unit of measurement? :)

Anyways, it's hard for me to make a statement about your loading process, but the benchmarks Rocky gave - to me - don't have me worrying about performance at all.

RockfordLhotka replied on Thursday, July 09, 2009

My numbers are measuring how many objects are created in a period of time.
Kind of the reverse of the normal approach, but better for my testing given
the variations in technique.

And the tests are run by creating a bunch of mock data purely in memory,
including numeric values and string values of various (known) lengths.

In other words, the algorithms used to create the mock data are consistent.
The resulting data sizes are consistent (and varied). And it is purely in
memory, so there's no variation due to hard drive speed or load or caching
or database optimization (or lack thereof), etc.

The point is to isolate the object creation/loading/return process as much
as possible and to eliminate as many peripheral concerns as possible.
Otherwise you end up with iffy results because you could be encountering
issues with the network, the database, the hard drive, etc.

It is bad enough that you really can't control Windows - you never know for
sure that Defender or your antivirus or something didn't kick in.

So my numbers are also an average of numerous runs of each test to help
minimize the effects of all those other factors. And those effects are very
real - I had a fair amount of variation, though I don't recall throwing out
more than one or two result sets completely (if a result exceeds a standard
deviation it is probably an outlier).

So the numbers represent creating X objects per millisecond.
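
A hypothetical harness along these lines (a sketch of the approach, not the actual test code) would build the graph from in-memory mock data, time each run, and average the objects-per-millisecond figure across runs:

using System;
using System.Diagnostics;

public static class PerfHarness
{
    // loadGraph builds the full object graph from in-memory mock data and
    // returns the number of objects it created.
    public static double MeasureObjectsPerMs(Func<int> loadGraph, int runs)
    {
        double total = 0;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            int objectCount = loadGraph();
            sw.Stop();
            total += objectCount / sw.Elapsed.TotalMilliseconds;
        }
        return total / runs;   // average obj/ms; outlier runs could be discarded here
    }
}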

Rocky

Goran replied on Friday, July 10, 2009

I agree that removing side effects from the hard disk, some Windows apps, etc. will give more accurate results. I did have variations which were probably caused by them, but those variations were never more than 0.5 seconds (in 90% of the cases), so I also took the average of the "common results". I have also removed the effect of opening the connection for the first time, making sure I open and close it before I start measuring. The database is local, so no network overhead is involved.

I come from (mostly) procedural programming (we all use objects and patterns, although not in the pure sense of OO's behavioral approach), so I still look at things this way: do I really need this, and is it worth the side effects? My current applications work flawlessly, and clients are satisfied with their performance. I, on the other hand, am not happy with their scalability and their "applicability" (web forms interface, mobile app interface, etc.). This is why I came to the idea of using CSLA. While I was reading the 2005 edition of the book, I embraced the ideas and learned many things (which are still mostly at a theoretical level), and all the doubts I had were performance-related. To my great surprise, the numbers with CSLA version 3.0 were very acceptable (30-40% of performance lost, but great scalability gained).

And then I decided to try the 3.6 framework while I wait for the 2008 edition of the book to arrive. I had examples to look at to see the difference in approach. What I did at first was simply remove the old version of the CSLA dll from the references and add the new one - nothing in the code was changed. The results were very bad (8 seconds), to my disappointment. Then I looked at the code examples to see if there is some new approach, and I saw that PropertyInfo&lt;T&gt; is now used for declaring/accessing properties, so I changed just that; I also needed to change the code for fetching data to use the LoadProperty method - and this had even worse performance (13 s). skagen00 mentioned there could be some code optimization that would get better results. What would that be? I have described exactly what I changed in code that was well optimized for CSLA version 3.0 (loading 100k customers from the database took 2 s).

I still don't know what we get from the new version of CSLA; I will know when I get the book. I see Silverlight support, but I don't believe that is connected with this performance loss. I hear there is also a ChildChanged event, but that also should not create such overhead. Can someone explain? I know I am repeating myself, but that is because I have not received an answer. My opinion is that when we sacrifice something, we expect to gain something in return. What is it in this case?

RockfordLhotka replied on Thursday, July 09, 2009

Here’s my take on it.


I use managed backing fields by default, along with the child data portal and LoadProperty(). This is the simplest coding model, lets CSLA do the most work and it is compatible with Silverlight.


When or if I have an object (usually a large collection) where the performance becomes a problem (and this is pretty rare) I’ll switch to private backing fields and stop using LoadProperty() to load the object data. I still use RegisterProperty() and the child data portal.


I have yet to hit a case where I had to quit using the child data portal, though that’d gain a tiny bit of performance too.
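
As a sketch of the two styles being contrasted - assuming CSLA 3.6-era signatures, which vary slightly between point releases - the same Customer child might declare its properties either way:

using System;
using Csla;
using Csla.Data;

[Serializable()]
public class Customer : BusinessBase<Customer>
{
    // 1) Default style: managed backing field, loaded via LoadProperty().
    private static PropertyInfo<string> NameProperty =
        RegisterProperty<string>(typeof(Customer), new PropertyInfo<string>("Name"));
    public string Name
    {
        get { return GetProperty<string>(NameProperty); }
        set { SetProperty<string>(NameProperty, value); }
    }

    // 2) Performance-tuned style: keep RegisterProperty, but back the property
    //    with a private field and assign the field directly when fetching.
    private static PropertyInfo<string> CityProperty =
        RegisterProperty<string>(typeof(Customer), new PropertyInfo<string>("City"));
    private string _city = CityProperty.DefaultValue;
    public string City
    {
        get { return GetProperty<string>(CityProperty, _city); }
        set { SetProperty<string>(CityProperty, ref _city, value); }
    }

    // Called by the child data portal when the parent collection fetches children.
    private void Child_Fetch(SafeDataReader dr)
    {
        LoadProperty<string>(NameProperty, dr.GetString(2));  // no events, no auth checks
        _city = dr.GetString(4);                              // bypasses the field manager
    }
}

The managed-field version is the simplest coding model and is the one compatible with Silverlight, as noted above; the private-field version trades that convenience for faster loading on large collections.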


The thing to remember is that performance is all relative to user satisfaction. It doesn’t matter if something is x% faster or slower if the users don’t notice or care.


As an industry we constantly give up performance for productivity or abstraction. Look at the fascination with ORM tools, or the use of data binding instead of manually setting/getting data in the UI. Those are huge performance hits (especially data binding in some cases), but no one thinks twice about using them because they get such huge productivity and maintainability benefits.


Rocky


Copyright (c) Marimer LLC