creating 1000s of objects in web app - looking for best practices

Old forum URL: forums.lhotka.net/forums/t/3746.aspx


rbellow posted on Friday, October 19, 2007

I have a need in my current web application to have users upload .csv files of contacts. I want to create each contact through my business objects to enforce all the business rules. Obviously, if thousands of contacts need to be uploaded, I am going to run into timeout issues. Plus, I don't want the user just sitting there waiting. So I am looking at using AJAX and the persistent communications pattern. Does anyone have a better recommendation?

tmg4340 replied on Saturday, October 20, 2007

Well - I'm not really an expert on all this, but I wouldn't recommend trying to leave this in a web page at all.  You'll kill your web-server performance, because the solution just doesn't scale.  That is, of course, assuming you actually have thousands of contacts to import - have you confirmed that you actually have to support that volume of data?  I ask from experience - I've designed solutions in the past to handle volumes of data that were never even considered by my customers... [:$]

Setting aside the concept of creating thousands of business objects - a plan I wouldn't necessarily recommend, regardless of the solution implemented - this is not the kind of process that does well in a web app.  What I would recommend is that you use your web app to upload the file and kick off some external process to actually process it.  If it doesn't have to be real-time, your web app could simply upload the file, and some scheduled nightly job could process it.  If it does have to be real-time, then there are a few different options - an SSIS job, a separate Windows service, etc.
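The handoff described above can be sketched in a few lines. This is a minimal, language-neutral illustration (Python here, not the poster's actual .NET stack): the web tier only stages the uploaded file into a queue directory and returns, and a separate background process lists and consumes the staged files later. All names (`QUEUE_DIR`, `accept_upload`, `pending_imports`) are hypothetical.

```python
import os
import shutil
import tempfile
import uuid

# Hypothetical drop-folder handoff: the web request only copies the
# uploaded file into a queue directory; a separate process (nightly
# job, Windows service, SSIS package, etc.) picks it up later.
QUEUE_DIR = os.path.join(tempfile.gettempdir(), "contact_imports")

def accept_upload(uploaded_path):
    """Called from the web tier: stage the file and return immediately."""
    os.makedirs(QUEUE_DIR, exist_ok=True)
    staged = os.path.join(QUEUE_DIR, f"{uuid.uuid4().hex}.csv")
    shutil.copy(uploaded_path, staged)
    return staged  # the request/response cycle ends here

def pending_imports():
    """Called from the background worker: list files awaiting import."""
    if not os.path.isdir(QUEUE_DIR):
        return []
    return [os.path.join(QUEUE_DIR, f) for f in sorted(os.listdir(QUEUE_DIR))]
```

The point is the separation: the user's request finishes as soon as the copy does, and the heavy processing happens outside the request/response cycle.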

Ultimately, if it were me, I would not use the business objects as part of this import at all.  I understand that's potentially a duplication of code effort, but the business objects CSLA creates - especially if they are editable, which I am assuming they are - are just too heavy for a process like this.  That also makes it conceptually easier to separate the import process from the rest of the application.  If it's possible to centralize the business rules and access them both from your BO's and your import process, then do it - you'll thank yourself later.  But I think trying to re-use your business objects here, given the scope of data, will just end up causing you headaches.

JoeFallon1 replied on Sunday, October 21, 2007

I agreed with tmg4340 up to the point where he said you may not want to use your BOs for the import. I would want to use them. Especially for their Validation Rules.

I would read each contact row and then build the BO from the data, validate it, and save it.

I would write out exceptions to a log file and then continue processing.

This might take a little longer - but if it is an offline job then so what?

Joe
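Joe's loop could look something like the sketch below. The `Contact` class here is a stand-in with hypothetical `validate()`/`save()` methods - it is not the actual CSLA API - but it shows the shape: build one object per row, validate it, save it, and on failure log the error and keep going.

```python
import csv
import io

# Stand-in for a business object; validate()/save() are illustrative
# names, not a real framework's API.
class Contact:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def validate(self):
        if not self.name:
            raise ValueError("name is required")
        if "@" not in self.email:
            raise ValueError(f"bad email: {self.email!r}")

    def save(self, store):
        store.append((self.name, self.email))

def import_contacts(csv_text, store, error_log):
    """Build, validate, and save one row at a time; log failures and continue."""
    # start=2 because line 1 of the file is the CSV header
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        contact = Contact(row.get("name", ""), row.get("email", ""))
        try:
            contact.validate()
            contact.save(store)
        except ValueError as err:
            error_log.append(f"line {line_no}: {err}")
```

In an offline job the per-object cost matters much less, and the error log gives you a record of exactly which rows were rejected and why.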


tmg4340 replied on Monday, October 22, 2007

I agree that speed is less of a concern in an offline job - but I'm not only concerning myself with speed.  Creating thousands of BO's simply to use the validation rules seems like overkill.  Plus, consider the database traffic involved, since each BO would essentially be its own transaction.  I realize the BO's are lighter than a DataTable, but there is still a cost to creating them, and that offline job will suck memory like a Hoover (or maybe a Dyson) if it has to create that many BO's.  Rocky has said before that CSLA's objects are designed more towards an OLTP type of situation.

That's also why I suggested that they should investigate whether the validation logic can be centralized into a separate component.  If so, then it could be used by both the BO's and the import process, and then you really don't need the BO's.  If they can't - well, then you have to make a decision.  There's nothing wrong with using the BO's - but I'd look for another way to do it first.

ajj3085 replied on Monday, October 22, 2007

Perhaps what you need are business objects that take the data to import and apply validation to it.  So don't reuse your existing BOs - create new ones.

The trick is to centralize your business logic; you can do this by offloading the rules to another internal object.  Maybe your normal BOs and your import BOs could both inherit from this class, if they are not too different.

Just a quick thought; you don't want your import requirements to affect your normal BOs' use case, nor do you want your normal use case to hinder what you're trying to do through the import.

david.wendelken replied on Thursday, October 25, 2007

Is this a one-time upload of data to convert/catch up?  And then the number of objects will be much smaller?

Or are you talking about thousands of them a day?

It's a big difference!

Does one csv file have lots of contacts, or is it one contact per csv?

That makes a big difference, because you have to decide how to handle the 99 good rows in the same file as 1 bad one.  Can you go ahead and accept the 99 good ones, or will you have to reject all 100 contacts because of one mistake?  The wrong technical solution here will kill the app!

Second, whenever you have to accept data from someone else's system, you have to plan for the fact that a certain percentage of the data will be garbage.  If you are receiving hand-entered csv files, I would be amazed if you don't have a high error rate.

You have to plan for where/how you want incorrect data to be corrected.  Does it have to be corrected in the csv file, in a temporary holding object or table, or in the permanent object or table?

How will you handle/weed out amended resubmissions and/or duplicates?
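The partial-acceptance and duplicate questions above can be sketched as a single partitioning pass. Assumptions here (not from the thread): invalid rows are rejected individually rather than failing the whole file, and a lowercased email serves as the natural key for spotting duplicates and resubmissions.

```python
# Accept the good rows, reject the bad ones, and drop duplicates by a
# natural key (email here - an illustrative choice, not a requirement).
def partition_rows(rows, is_valid, key=lambda r: r["email"].strip().lower()):
    accepted, rejected, seen = [], [], set()
    for row in rows:
        if not is_valid(row):
            rejected.append(row)
        elif key(row) in seen:
            rejected.append(row)  # duplicate or resubmission
        else:
            seen.add(key(row))
            accepted.append(row)
    return accepted, rejected
```

If the business answer is "one bad row rejects the whole file" instead, the loop collapses to a single all-or-nothing check, which is exactly why the question has to be answered before the code is written.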

Anyway, food for thought! :)


DavidDilworth replied on Friday, October 26, 2007

Importing "unclean" data from any other source always carries the possibility of "bad" data.  So using some kind of editable BO to validate (i.e. "clean") and persist your imported data seems like a good idea to me.  Whether that's the same editable BO as the one you use to put data into your system via the application is your call.  It may be a slightly different one, as suggested above.

What you do with the "bad" data is a good question that you must consider.

Whatever you do though, you probably don't want this happening inside the request/response model of a web page.  It should be done as an offline/background process.

Yes, you may allow the data to be uploaded via a web page, but don't put the processing of the data into the same request as well.


JoeFallon1 replied on Friday, October 26, 2007

David,

It appears you said exactly what I said. But you said it a little clearer. <g>

Joe


Copyright (c) Marimer LLC