Handling large lists

Handling large lists

Old forum URL: forums.lhotka.net/forums/t/31.aspx


jrnail23 posted on Tuesday, May 09, 2006

I'm in charge of a soon-to-be-redesigned product (old-school ASP web app) which manages voter data (databases which could contain millions of records), and I'm considering using CSLA.NET2.0 for the product's next generation.

I love just about everything about the CSLA framework, but I'm not sure how it would handle two of my product's major functional areas: a.) working with large selections of voter records (must be able to handle selections of a couple hundred thousand), and b.) flexible ad-hoc queries (to retrieve voter records matching search criteria defined in the UI).

I'm assuming these would represent scenarios where one might just stray away from the CSLA doctrine of OOP, but I'd like to know what you guys might suggest for handling these types of scenarios.  Any feedback is greatly appreciated.

RockfordLhotka replied on Tuesday, May 09, 2006

a) This is problematic with any technology. Even with a DataSet you'll have a hard time getting perf with 200k rows - especially in a stateless web environment.

The ideal answer here is avoid having to load that many rows. Examine your use cases (user scenarios, stories, whatever) and try to figure out why the user thinks they can view and work with 200k rows of data. It is very hard to imagine how a human can process that much data in any meaningful way Smile [:)]. More likely there are real use cases involving subsets of this data.

On the off chance that there really is a use case where the human views and processes 200k rows (which I've seen just once - but it was valid imo) then you've got a problem. You can try to use the DataSet, but I'm skeptical that it will work either. More likely you'll need to do a windowing scheme, where your object (collection, dataset, whatever) is a view into a windowed subset of the data. There are some very good approaches to doing this that use the database to do the windowing, so only the current window of data is even returned to the web server.

Using a windowing scheme like this can give the user the illusion (especially in the web) of interacting with 200k rows, when in reality they are only ever interacting with a window of perhaps 20 rows.

b) if the filtering is done by row, then this isn't an overwhelming issue. There are many discussions on how to do complex criteria objects on www.searchcsla.com and I think some of the answers are quite workable for most people.

If the filtering is done by column (in other words the user gets to select columns to view and can change that list at any time) you may want to consider using a loosely-typed DataTable. This is the scenario for which the loosely-typed DataTable was designed, and for read-only data it is often a perfectly adequate choice.

I typically create a ReadOnlyBase-derived object to return the DataTable as a property. That way I get to use the data portal and so forth, but still get access to a totally dynamic result set.

jrnail23 replied on Wednesday, May 10, 2006

Rocky, Thanks for the reply and all the great work on CSLA...

As I contemplate the use cases, etc. for my voter database app, I realize that this might not be very similar to many apps which use CSLA.  While there are several parts of the app that are OLTP in nature, a large portion of our functionality involves working with a voter list itself, rather than working with one voter at a time (more along the lines of creating and managing mailing lists or telemarketing/survey target universes, and performing bulk and/or rapid updates of information collected from survey-type interactions).  So while in some situations we may not need all 200k records loaded at once,  we definitely need to manipulate and/or report on such a big list in other situations.  Unfortunately, the way many people have done this sort of thing before is through the direct use of ADO-type techniques, and extensive use of temp tables... I'm dying to get away from that model, as I MUCH prefer OOP and the various advantages it yields. 

The other big thing that I mentioned is ad-hoc searches... again, here is perhaps a kind of non-standard use case.  While most apps provide search tools to present a specific customer record to update, or to present a collection of orders placed by a customer, we're more concerned with finding a list of people meeting demographic or geographic criteria to work with (consider a user wanting to create a bulk-mailing list, targeted to males between the ages of 35-54, living in Alameda county who are registered Democrats).  The usual approach for these types of applications is to construct a complex SQL WHERE clause to be executed directly in the business code (or worse, in the UI code).  Again, as someone who believes -- like you -- that business & UI developers shouldn't be dealing with SQL, I think you can see why I'm dying to escape the old way of doing things.

Thanks again for the feedback, and keep up the great work!

RockfordLhotka replied on Wednesday, May 10, 2006

Certainly the most common stererotypes used in CSLA (editable root/child, readonly root/child, etc.) apply to OLTP far more than to batch processing.

However, OO design patterns for dealing with batch processing to exist (iterator, flyweight, etc.) and they are the appropriate techniques for implementing that sort of thing.

Whether CSLA offers value to you in those cases will depend on many things. Certainly it will not get in your way - the whole point of CSLA is to offer a set of capabilities you can use selectively to make it easier to implement your objects. If you don't need some or all of the features, then you don't need to use them.

You might consider using a Command object to allow the user to launch the batch process, such that the processing actually occurs on an application server. Within the implementation of the command object you may or may not use CSLA-style business objects. You may, for instance, consider retrieving the data through a DataReader and using a single object (not derived from CSLA) to process each row of data. If your algorithms or processing needs are complex, this object may collaborate with other objects to accomplish the business goal.

Copyright (c) Marimer LLC