Refresh in multi-user environment

Old forum URL: forums.lhotka.net/forums/t/228.aspx


SonOfPirate posted on Tuesday, May 30, 2006

No sooner do I get done posting an explanation of how we've implemented multi-level caching in our hybrid-CSLA framework than I find out that we have an issue with our implementation.

The issue is a result of our desire to allow objects to refresh themselves with changes in the data store.  The vast majority of our applications are multi-user in nature and include several "monitoring" features along with the standard record-locking stuff like timestamps, etc. to address concurrency issues.  What these monitoring lists provide is an automatic refresh of their data from the underlying data store so that changes made by other users are reflected in the current view.

An example of one such application using the previous version of our framework (also based on CSLA) was a work-flow application that tracked a client's jobs as they progressed through their manufacturing facility.  As tasks were initiated, completed, added, removed, etc. on the shop floor using hand-held, wireless and hard-wired devices, the central monitoring application (a WinForms application) provided a concise list of the active jobs' statuses in virtual real-time fashion.  Our list refreshed itself automatically every 'x' seconds, with data binding updating the display with any changes to the collection.

To accomplish this, we created a Refresh method that executed a designated stored procedure on the data source to pull in ALL of the records (including those marked as deleted, so they could be removed), then iterated through the returned data reader rows, updating the collection's items: adding any new objects, removing any deleted records and modifying any that had changed.
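In rough outline, it was a method on the collection class along these lines (a sketch only - Job, FindById, CreateJob, UpdateFrom and the column names are illustrative stand-ins, not our real code):

    // Requires System, System.Data, System.Data.SqlClient.
    // Full-reload merge: one pass over the reader handles adds,
    // in-place updates and removal of soft-deleted records.
    private void Refresh()
    {
        using (SqlConnection cn = new SqlConnection(_connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.GetAllJobs", cn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cn.Open();
            using (SqlDataReader dr = cmd.ExecuteReader())
            {
                while (dr.Read())
                {
                    Guid id = dr.GetGuid(dr.GetOrdinal("JobId"));
                    Job item = FindById(id);   // scan the collection for a match
                    if (!dr.IsDBNull(dr.GetOrdinal("Deleted")))
                    {
                        if (item != null)
                            Remove(item);      // marked deleted -> drop from the list
                    }
                    else if (item == null)
                    {
                        Add(CreateJob(dr));    // new record -> add a new item
                    }
                    else
                    {
                        item.UpdateFrom(dr);   // changed record -> update in place
                    }
                }
            }
        }
    }

Data binding then picked up the adds, removes and changes through the list's change events.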

In retrospect, this seemed kinda cludgy and we were looking for a more efficient way to accomplish the same goals.  We also did not make use of the DataPortal parts of CSLA previously and mandated that we incorporate this into our new version to enable remoting, etc.  However, CSLA does not allow for this type of activity naturally because the DataPortal is working on entire objects/collections and not just the data/contents of those objects/collections.

So, as best as I can figure at this point, the only way to accomplish the above scenario is to dispose of and re-instantiate a new collection each time we want to refresh our list.  Of course, this would happen naturally with a web-based solution, but it seems pretty severe for WinForms apps.  Is there a happy medium here somewhere?  We would like to maintain our references to the object, reduce memory consumption by instantiating the object only once, but still have the ability to update the object with changes that other users have applied to the back-end data store.

One idea that was presented was to rework the stored procedure to return only records whose LastModified timestamp is after the most recent refresh.  This would reduce the returned records dramatically and improve the efficiency of the process, but it still does not seem to fit within the CSLA construct.

Any thoughts/suggestions?

hurcane replied on Wednesday, May 31, 2006

I don't think what you want to do is incompatible with CSLA.NET. Based on your description and your desire to support remoting, you could make the retrieval of the "refreshed" data a separate read-only list object. The root object (MonitorList) would call the factory "Get" method of the read-only object (MonitorData). The read-only object would retrieve the refreshed data, and the root object would then update itself from this data.

The data reader code you have now would have to be adapted to use the list, but that shouldn't be too difficult.

It might be appropriate to use MonitorData when initially loading the MonitorList as well. You can call the factory method of MonitorData from inside the DataPortal_Fetch method of MonitorList. If you do this, put the RunLocal attribute on MonitorList's DataPortal_Fetch method, because there is no need for the list itself to be serialized to the remote server.
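A skeletal version of that arrangement in CSLA 2.0 might look like this (MonitorList and MonitorData come from the post above; MonitorItem, MonitorDataRow and the criteria class are my assumptions):

    using System;
    using Csla;

    [Serializable]
    public class MonitorList : BusinessListBase<MonitorList, MonitorItem>
    {
        public static MonitorList GetMonitorList()
        {
            return DataPortal.Fetch<MonitorList>(new Criteria());
        }

        [Serializable]
        private class Criteria { }

        // RunLocal keeps this fetch on the client; only the read-only
        // MonitorData object travels through the data portal.
        [RunLocal]
        private void DataPortal_Fetch(Criteria criteria)
        {
            MonitorData data = MonitorData.GetMonitorData();
            foreach (MonitorDataRow row in data)
                this.Add(MonitorItem.GetMonitorItem(row)); // hypothetical child factory
        }
    }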

stefan replied on Thursday, June 01, 2006

Thinking a bit further on
'One idea that was presented was to rework the stored procedure in use to return only records whose LastModified timestamp was after the most recent refresh'

You'll need to find a way to get the deleted ones...

Here's an idea:

Steps:

  1. Create a new table 'tblJobMonitor' with columns 'newID', 'updatedID', 'deletedID' and 'DateTimeInserted'
  2. For tblJobs, create triggers for update, insert and delete. These triggers feed tblJobMonitor with the affected IDs from the inserted and deleted pseudo-tables.
  3. Create a DTS (now SSIS) package that runs every 2*x seconds and deletes all entries in tblJobMonitor that are older than 2*x seconds.
  4. Let your application retrieve only the inserted, updated or deleted data by doing a SELECT ... FROM tblJobs INNER JOIN tblJobMonitor ... (you'll need a join per ID column in tblJobMonitor); see the sketch below.
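To make step 4 concrete, here is a fragment of what the client-side read could look like (SQL Server syntax; tblJobs, tblJobMonitor and its columns come from the steps above, while JobID, connectionString and lastRefresh are assumptions).  As a variation on joining tblJobMonitor three times, one LEFT JOIN on the two "live" ID columns does the same job, since deleted rows no longer exist in tblJobs:

    // Requires System, System.Data, System.Data.SqlClient.
    string sql =
        @"SELECT m.newID, m.updatedID, m.deletedID, j.*
            FROM tblJobMonitor m
            LEFT JOIN tblJobs j ON j.JobID = m.newID
                                OR j.JobID = m.updatedID
           WHERE m.DateTimeInserted > @lastRefresh";

    using (SqlConnection cn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.AddWithValue("@lastRefresh", lastRefresh);
        cn.Open();
        using (SqlDataReader dr = cmd.ExecuteReader())
        {
            while (dr.Read())
            {
                if (!dr.IsDBNull(dr.GetOrdinal("deletedID")))
                    { /* remove the item with this ID from the collection */ }
                else if (!dr.IsDBNull(dr.GetOrdinal("newID")))
                    { /* add a new item from the joined job columns */ }
                else
                    { /* update the existing item in place */ }
            }
        }
    }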

I think that's it...

Report your results...

Stefan

SonOfPirate replied on Thursday, June 01, 2006

Since posting this, we've taken a closer look at how we were approaching this problem and dug deeper into the CSLA 2.0 data portal functionality.  One area of interest was incorporating the ability to cache our objects as well.  We wanted a way to accommodate all of this in one shot.  It appears that this topic overlaps another thread in the forum: Using ActiveCache to minimize transfered data?

While I understand the last post, it seems like a lot of work and would greatly reduce the portability of our applications.  We have to develop not only for SQL Server, but for Oracle, Access, MySQL...you name it.  So being data-agnostic is critical for our framework.

What I probably should have explained more clearly in the original post is that we don't physically delete records; we mark them with a deleted timestamp and the user performing the delete.  At the same time, the LastModified timestamp is updated, because deleting the record is considered a modification (just as adding one sets the LastModified and Created timestamps).  As a result, a query using just the LastModified field can return all of the items added to the table, the records that have had field values changed, and those marked as deleted since the last refresh.
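For what it's worth, that delta query can stay provider-neutral if it goes through System.Data.Common.  A minimal sketch, with the table and column names assumed:

    using System;
    using System.Data;
    using System.Data.Common;

    // Sketch only - 'Jobs', 'LastModified' and the '@' parameter prefix are
    // assumptions; some providers use ':' or '?' for parameters instead.
    public static IDataReader FetchChanges(
        DbProviderFactory factory, string connectionString, DateTime lastRefresh)
    {
        DbConnection cn = factory.CreateConnection();
        cn.ConnectionString = connectionString;
        DbCommand cmd = cn.CreateCommand();
        cmd.CommandText = "SELECT * FROM Jobs WHERE LastModified > @lastRefresh";
        DbParameter p = cmd.CreateParameter();
        p.ParameterName = "@lastRefresh";
        p.Value = lastRefresh;
        cmd.Parameters.Add(p);
        cn.Open();
        // Rows with a non-null Deleted timestamp are removals; the rest are
        // adds or updates, told apart by whether the ID is already in the list.
        return cmd.ExecuteReader(CommandBehavior.CloseConnection);
    }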

This seems a whole lot cleaner than adding tables and triggers, etc., and I want to shy away from having to create extra classes to accomplish this task.  I am still concerned about one aspect of the way the DataPortal works that seems to be what is holding us back on this topic as well: the fact that the entire object is returned from the DataPortal methods.

In our previous version, the timestamps were all set at the database level and returned to the application via the ADO.NET call, and we were able to easily update the object's properties internally.  Again, we were not using the DataPortal features.  Now, since a new instance of the object is returned from DataPortal_Update, for instance, this creates a bit of a quandary when it comes to updating the properties of the object upon return (the whole issue of resolving references, etc. in client applications aside).  Rocky indicates that the intent of the CSLA implementation is for the returned object to replace the original - thus the reference issue.  However, this seems like a lot of work.
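In code, the replacement amounts to reassigning the reference at every save (_job here is just a stand-in for whatever reference the client holds):

    // The instance that comes back from the data portal replaces the original;
    // every client-side reference must be repointed at the new object.
    _job = _job.Save();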

We can get around the timestamp issue by setting the values in our business objects and writing them to the database rather than relying on the getdate() function.  This may also help make the framework more data-agnostic, so I'm not too upset about this change.  Coupled with using GUID values for unique IDs that are generated in our .NET code, this makes logical sense, and we don't have to worry about returning a value from the DataPortal_Insert and DataPortal_Update methods at all.
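In sketch form (the field names are placeholders), the insert would then look like:

    // Keys and timestamps set in code instead of getdate(), so nothing
    // has to come back from the database after the INSERT.
    protected override void DataPortal_Insert()
    {
        _id = Guid.NewGuid();        // generated in .NET, not by the database
        _created = DateTime.Now;     // replaces getdate() in the sproc
        _lastModified = _created;
        // ... execute the INSERT, passing _id, _created and _lastModified
        // along with the other field values ...
    }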

I respect and trust Rocky's experience and knowledge of all these subjects but am still trying to get my head around the whole DataPortal subject.  This is why we didn't implement it in our previous version, btw.  There has been and is much discussion about this, and I have been charged with the task of making sense of it so that I can help the rest of the team grasp it.  So far I've been faced with nothing but "what about [this]" and "what about [that]" questions.

At this point, I am still fumbling with the concepts a bit but am leaning towards using the DataPortal_Command feature to execute our "refresh" sproc which will provide us with the changed resultset that can be applied to the object/collection.  But, I am just getting into how this method works.

Thanks for the input so far.

RockfordLhotka replied on Thursday, June 01, 2006

SonOfPirate wrote:
At this point, I am still fumbling with the concepts a bit but am leaning towards using the DataPortal_Command feature to execute our "refresh" sproc which will provide us with the changed resultset that can be applied to the object/collection.  But, I am just getting into how this method works.

This is what I'd do - either use a Command object or, more likely, a read-only object that is private to my actual business object, and use it to do the refresh. Once you get these results you need to merge them into your pre-existing set of data, but that is often far more efficient than re-retrieving a lot of unchanged data.
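A skeletal version of the command flavor might look like this (every name here is a placeholder, not a prescription):

    using System;
    using Csla;

    [Serializable]
    public class RefreshJobsCommand : CommandBase
    {
        private DateTime _lastRefresh;
        private object _changes;  // e.g. a serializable snapshot of the changed rows

        public object Changes
        {
            get { return _changes; }
        }

        public static RefreshJobsCommand Execute(DateTime lastRefresh)
        {
            return DataPortal.Execute<RefreshJobsCommand>(
                new RefreshJobsCommand(lastRefresh));
        }

        private RefreshJobsCommand(DateTime lastRefresh)
        {
            _lastRefresh = lastRefresh;
        }

        // Runs on the server side of the data portal: call the "refresh"
        // sproc with the cutoff and package the changed rows for the trip back.
        protected override void DataPortal_Execute()
        {
            // open a connection, run the sproc, load _changes ...
        }
    }

The collection would call Execute with its last refresh time and merge Changes into itself.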

It all depends on how much data is involved. If you are talking about just a couple dozen rows of data, I doubt it is worth the effort. But if you are caching some large lookup list that has scores of rows, then the effort of doing the merge would make sense.

SonOfPirate replied on Friday, June 02, 2006

Thanks for the affirmation that my thinking was at least headed in the right direction.  In reviewing the Command object functionality a bit more, I think this may do the job (at least in the short term) and allow us to follow the same logic that we applied previously.

I also agree that the workload definitely dictates how much to put into such an implementation and, if it wasn't for prior development efforts that utilized these concepts, we probably wouldn't be so hot to incorporate them into our framework.  FYI - the application I referred to earlier had over 18,000 records active at any one time with over 50 users manipulating the data at a rate ranging from 1 record every couple of minutes to 20-30 per minute.  Given that, reloading the entire collection each time we wanted to refresh was definitely a performance killer and absorbing only the changes each cycle was the only way to make it work.

I think this approach, coupled with the discussion that has taken place regarding caching in the ActiveCache thread, will help us find a more efficient way than we had in our previous version and allow us to incorporate it seamlessly into our new CSLA-hybrid framework.

Thanks for all the useful insight.

Copyright (c) Marimer LLC