Read-only lists containing large numbers of items

Old forum URL: forums.lhotka.net/forums/t/8581.aspx


DaleStan posted on Thursday, February 25, 2010

I have a task that requires iterating through up to a million ReadOnlyBase objects, once. For smaller lists, I'd put them all in a ReadOnlyListBase object and use DataPortal.Fetch. In this case, however, that's not particularly workable. Between the object contents and the various overheads involved, I'm getting uncomfortably close to running out of 32-bit address space. (Unsurprising, as 4 GB divided by a million items gives me a max of 4 KB per item.) My current solution is to write my own function that returns an IEnumerable. Its body, quite simplified, looks like this:
SqlCommand cm;
// ... set up the connection and command ...
using (SqlDataReader dr = cm.ExecuteReader())
{
    while (dr.Read())
        yield return GetExportPiece(dr);
}
This reduces my peak memory usage from over 2 GB to a quite comfortable 150 MB. However, this breaks the "use DataPortal and DataPortal_* to perform all database communication" rule. Does CSLA contain anything that looks like this, or that otherwise supports processing quantities of data too large to fit in memory?

RockfordLhotka replied on Thursday, February 25, 2010

First, do you really need to bring all those rows back to the client? In a 3- or 4-tier deployment that'd be amazingly slow! And even in a 2-tier environment it seems very iffy.

Second, does the user actually see or interact with this? Or is this a non-interactive batch job? I can't imagine a user paging through a million rows - they'd be there for days, weeks, or maybe months!

If this is non-interactive, it is probably best encapsulated in a Command object's DataPortal_Execute() method, using plain ADO.NET - skip the objects entirely (or use your IEnumerable approach).
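Here's a rough sketch of that shape - the ExportJob and RunExport names, the connection string, and the query are all placeholders, while CommandBase, DataPortal.Execute and DataPortal_Execute are the CSLA pieces involved:

using System;
using System.Data.SqlClient;

[Serializable]
public class ExportJob : Csla.CommandBase
{
    public static void RunExport()
    {
        // Runs DataPortal_Execute on the server side of the data portal.
        Csla.DataPortal.Execute<ExportJob>(new ExportJob());
    }

    protected override void DataPortal_Execute()
    {
        // All data access stays on the server, so no rows ever cross
        // a tier boundary.
        using (var cn = new SqlConnection("...")) // placeholder
        using (var cm = cn.CreateCommand())
        {
            cn.Open();
            cm.CommandText = "..."; // placeholder export query
            using (SqlDataReader dr = cm.ExecuteReader())
            {
                while (dr.Read())
                {
                    // process each row here, never materializing a list
                }
            }
        }
    }
}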

If this data really does need to come to the client, ouch! In that case I'd suggest you use paging with a ReadOnlyBase command object, and a variation of your IEnumerable implementation.

What you have now restricts you to a 2-tier deployment. There's no way that code can ever work in a 3- or 4-tier deployment scenario.

Here's the basic idea:

  1. Create your IEnumerable list, much like you have now.
  2. Create a page loader ReadOnlyBase object that loads one page of data (maybe 1000 rows?).
  3. Where you do yield return, return data from the page you have loaded. When you run out of data in that page, execute your page loader object to get the next page (see the sketch after this list).
  4. If you want to be fancy, use the async data portal to pre-fetch the next page while you process the current page (don't forget to use a thread sync object to make sure the next page is back before you try to use it!). If you get your page sizes right, this could offer very good throughput - though the right page size will probably vary with the n-tier deployment model, the speed of the client, and the speed of the database. So if you want to be really fancy, make the page size variable and increase or decrease it depending on whether you ever hit your locking condition.
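
Here's a rough sketch of steps 1-3 - ExportPage, ExportPiece, GetAllPieces and the int[] paging criteria are placeholder names, and the real data access is elided:

using System;
using System.Collections.Generic;

[Serializable]
public class ExportPage : Csla.ReadOnlyBase<ExportPage>
{
    public List<ExportPiece> Items { get; private set; }

    public static ExportPage GetPage(int startRow, int pageSize)
    {
        // The only thing that crosses the data portal is one page of rows.
        return Csla.DataPortal.Fetch<ExportPage>(
            new Csla.SingleCriteria<ExportPage, int[]>(
                new[] { startRow, pageSize }));
    }

    private void DataPortal_Fetch(
        Csla.SingleCriteria<ExportPage, int[]> criteria)
    {
        // Server-side: load only criteria.Value[1] rows starting at
        // row criteria.Value[0] (e.g. ROW_NUMBER() paging in SQL Server).
        Items = new List<ExportPiece>();
        // ... ADO.NET code to fill Items ...
    }
}

public static class ExportReader
{
    // Client-side iterator: looks like one big IEnumerable to the
    // caller, but only ever holds one page in memory.
    public static IEnumerable<ExportPiece> GetAllPieces(int pageSize)
    {
        int startRow = 0;
        while (true)
        {
            ExportPage page = ExportPage.GetPage(startRow, pageSize);
            if (page.Items.Count == 0)
                yield break;                // out of data
            foreach (ExportPiece item in page.Items)
                yield return item;          // step 3: serve the loaded page
            startRow += page.Items.Count;   // then fetch the next page
        }
    }
}

For step 4 you'd swap the synchronous Fetch for the async data portal (e.g. DataPortal.BeginFetch) and park the pre-fetched page behind a sync object such as a ManualResetEvent.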

So your actual list object will never go through the data portal - but this page loader (ReadOnlyBase) object will use the data portal to bring back a page of data at a time - giving you the same basic result you have now, but in a way that works with the data portal and 3- or 4-tier deployments.

DaleStan replied on Thursday, February 25, 2010

Yes, I do need to bring the rows back; the user may want to see the individual rows. Granted, this is much more likely on smaller result sets -- e.g. the items created on a certain day or in a certain week instead of all-time -- but I don't want to deal with multiple code paths.

I think I understand what's supposed to happen here now. Thanks.

Copyright (c) Marimer LLC