Performance and memory use with large result sets (collections)

Old forum URL: forums.lhotka.net/forums/t/746.aspx


SteveInCO posted on Thursday, July 27, 2006

Hello,

I've been working with CSLA.Net 2.0 (VB) for several weeks now, and I really like it so far. It is very similar to the approach I took with my last project under VB6, although of course much more sophisticated and complete.

It also seems to suffer from the same issue I had with my VB6 project...what do you do when you have a large result set (i.e. collection) that needs to be filled? Some of my queries may have several thousand records (and therefore objects) returned. When using a "collection" model for this type of result set, the problems are twofold:

  1. It takes a long time to read all that data and build all those objects during which time the caller is in a wait mode.
  2. The objects take a lot of memory once the collection is populated.

Collections are handy and easy to work with, but they don't lend themselves to large chunks of data.

In my previous project, I took the approach of using collections any time I expected a small result set or wanted to cache data, but then I designed another type of retrieval object for large result sets that functioned more like a Recordset or DataReader...you start it up, then continuously call a GetNext method until you reach the end of the result set.

This works much better for large sets of "throwaway" objects where you have to retrieve a lot of them, but don't need them all in memory simultaneously. There is no long pause while they are read from the data source, and you never have more than one of them in memory at a time.

An example might be generating a report. It doesn't make sense to load every detail line into memory and then generate the report...it is much more efficient to load them one at a time and add them to the end of the report.

So all this leads to the question: Is there some predefined way to do something like this in CSLA? Maybe I missed it in the book (I confess I didn't read every chapter), and I didn't see anything in Project Tracker that used this sort of model. Do I need to "roll my own"?

Suggestions appreciated.

Thanks.

xal replied on Thursday, July 27, 2006

If you really need to do that, you could implement a "merge" capability in your collection (which would probably be really simple for what you want to do).
Then you'd be able to retrieve the objects in chunks of tens or hundreds and simply append them to the existing collection...
You could expose something like yourCollection.RetrieveMore(), which would internally fetch another list (passing the id of the last loaded record, or something like that) and then, once that list is loaded, append its records to your current collection.

You'd probably want to do this in a background thread; otherwise there is no real gain, since the caller still ends up waiting for each chunk. Also, take note that if you're using databinding to display the items, updating the collection from another thread is not safe.
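The RetrieveMore idea above can be sketched in a few lines. This is an illustration in Python rather than VB, and nothing in it is a CSLA API: `fetch_page`, `ChunkedCollection`, and the keyset-paging `last_id` parameter are all hypothetical names standing in for a data-portal fetch.

```python
# Sketch of a collection with a RetrieveMore-style method.
# fetch_page() stands in for a data-portal fetch; it pages over an
# in-memory "table" using the id of the last loaded record (keyset paging).

TABLE = [{"id": i, "name": f"item {i}"} for i in range(1, 251)]

def fetch_page(last_id, page_size):
    """Return up to page_size records with id > last_id."""
    return [row for row in TABLE if row["id"] > last_id][:page_size]

class ChunkedCollection:
    def __init__(self, page_size=100):
        self.items = []
        self.page_size = page_size
        self.exhausted = False

    def retrieve_more(self):
        """Fetch the next chunk and merge (append) it into the collection."""
        last_id = self.items[-1]["id"] if self.items else 0
        page = fetch_page(last_id, self.page_size)
        self.items.extend(page)
        if len(page) < self.page_size:   # short page means no more records
            self.exhausted = True
        return len(page)

coll = ChunkedCollection(page_size=100)
while not coll.exhausted:
    coll.retrieve_more()                 # 100, then 100, then 50 records
print(len(coll.items))                   # → 250
```

Note that this only spreads the loading cost over time; as Steve points out in the next post, the full set still ends up in memory unless earlier chunks are discarded.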

Andrés

SteveInCO replied on Friday, July 28, 2006

That could help with the load performance, but it wouldn't really do much for the memory usage problem unless I also dumped the previously loaded items as I loaded the next block. That would kinda defeat the purpose of a collection wouldn't it?

pvanroos replied on Thursday, July 27, 2006

... And, if I may... I would like to add on to this... I am building a web application that makes some async calls (AJAX). Like everyone, I anticipate a new model for ASP.NET web applications emerging with the advent of AJAX-enabled controls. For instance, I have some web pages that use 3 to 4 business objects at once. In CSLA 2.0, that means I have to set up 3 to 4 data source controls to manage, at the very least, the retrieval of records. If the individual controls on the page use async calls to retrieve this data without refreshing and reconstructing the whole page, this more discrete model is fine. However, in the traditional sense, this is not acceptable, given that each SelectObject call using CslaDataSource will open and close a database connection per control, thus incurring a performance hit.

Perhaps I'm wrong.  Maybe this aspect of scalability is negligible.  Maybe somebody has a better approach to the architecture. 

Thanks guys.

rlarno replied on Friday, July 28, 2006

pvanroos:

 However, in the traditional sense, this is not acceptable, given that each SelectObject call using CslaDataSource will open and close a database connection per control, thus incurring a performance hit.

Perhaps I'm wrong.  Maybe this aspect of scalability is negligible.  Maybe somebody has a better approach to the architecture. 

The effect of opening and closing a database connection in .NET is negligible, as connection pooling is used by default. In fact, opening the connection as late as possible and closing it as soon as possible helps the scalability of the application. The connection is not actually closed when you call .Close(), but merely returned to the connection pool, so another page request (or background thread in a WinForms app) can pick up that connection and use it. So you are really only using 1 active connection to the database. If you were to keep your connection open, that other page request (or thread) would need to open another connection to your database (most likely an already-open connection from the connection pool). So at that point you would have 2 active connections in use.

Say that the connection pool holds a maximum of 2 active connections and 3 concurrent requests come in, so we are simulating a high load on the server.

If you keep the connection open for the whole request:

request 1: open connection, fetch object 1, do some processing, fetch object 2, fetch object 3, close connection
request 2: open connection, fetch object 1, do some processing, fetch object 2, fetch object 3, close connection
request 3: ... waiting for a connection ... ... open connection, fetch object 1, etc.

The 3rd user will most likely complain, as s/he will need to wait a long time before the request even gets serviced.

If you do close the connection after each db operation, the result might look like this:

request 1: open, fetch object 1, close, do some processing, open, fetch object 2, close, open, fetch object 3, close 
request 2: open, fetch object 1, close, do some processing ... open, fetch object 2, close, ... open, fetch object 3, close
request 3: open, fetch object 1, close, do some processing ... open fetch object 2, close, open, fetch object 3, close

Although the individual response times might become longer, each user will get their request fulfilled in due time. And so you will be able to serve a lot more users at the same time.
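The pooling behaviour described above can be demonstrated with a toy model. This is a deliberately simplified sketch in Python, not ADO.NET's actual pool implementation; the `Pool` class and its counters are invented for illustration:

```python
# Toy model of a connection pool: close() returns the connection to an
# idle list instead of tearing it down, so open-late/close-early lets
# many interleaved requests share one physical connection.

class Pool:
    def __init__(self, max_size):
        self.max_size = max_size
        self.idle = []
        self.physical_opens = 0   # how many real connections were created

    def open(self):
        if self.idle:
            return self.idle.pop()                # reuse a pooled connection
        if self.physical_opens >= self.max_size:
            raise RuntimeError("pool exhausted: caller must wait")
        self.physical_opens += 1
        return f"conn-{self.physical_opens}"      # open a new physical one

    def close(self, conn):
        self.idle.append(conn)    # returned to the pool, not really closed

pool = Pool(max_size=2)

# Three interleaved requests, each closing after every fetch:
for request in range(3):
    for fetch in range(3):        # fetch object 1, 2, 3
        conn = pool.open()
        # ... run query on conn ...
        pool.close(conn)

print(pool.physical_opens)        # → 1: one physical connection served all
```

Because each fetch releases the connection immediately, the pool never needs a second physical connection here, whereas holding connections open for whole requests would exhaust a pool of 2 as soon as a third request arrived.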

hope this helps.

Rudi

pvanroos replied on Friday, July 28, 2006

Hi Rudi,

Thanks for the response. 

Much appreciated,

Paul

SteveInCO replied on Friday, July 28, 2006

It occurred to me that I may not have been clear. I don't advocate opening a new connection and data reader for each of the objects when the GetNext method is called...here's a general idea of how I did it before:

MyObject = MyObjectList.Begin()     ' Opens the connection, runs the query,
                                    ' and returns the first object.
While MyObject IsNot Nothing        ' MyObject will be Nothing if there were
                                    ' no results.

    ' Perform whatever processing on the object here.

    MyObject = MyObjectList.GetNext()   ' Gets the next object from the
                                        ' results, or Nothing on EOD.
End While

The SQL query would get run during the Begin method and the DataReader that receives the results would be left open. The Begin method returns the first object from the results (i.e. calls GetNext the first time for you). Each subsequent call to GetNext takes the next record out of the DataReader, builds an object and returns it. The only disadvantage of course is that the DataReader is left open the entire time, but as long as your processing doesn't goof around too long, that isn't that big a deal.
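The Begin/GetNext mechanism described above is essentially an iterator over an open reader. As an illustration only (Python rather than VB, and `Detail`/`stream_details` are made-up names, not CSLA types), a generator captures the same idea: the query runs once, the cursor stays open, and at most one object is materialized at a time by this loop:

```python
import sqlite3

# Streaming reader pattern: build one object per row while the cursor
# (the "DataReader") stays open, instead of loading a whole collection.

class Detail:
    def __init__(self, row):
        self.id, self.amount = row

def stream_details(conn):
    cur = conn.execute("SELECT id, amount FROM detail ORDER BY id")
    for row in cur:            # the open cursor plays the DataReader role
        yield Detail(row)      # GetNext: build and hand out one object
    cur.close()                # reader released at end of data

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO detail VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 6)])

total = 0.0
for obj in stream_details(conn):   # While MyObject IsNot Nothing ...
    total += obj.amount            # process, then fetch the next
print(total)                       # → 22.5
```

The same caveat applies as in the VB version: the cursor is held open for the duration of the loop, so slow per-object processing keeps the reader (and its connection) tied up.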

I gather by the responses that CSLA doesn't really have anything with this sort of mechanism built in to it.

Copyright (c) Marimer LLC