Use ActiveCache to minimize transferred data?

Old forum URL: forums.lhotka.net/forums/t/20.aspx


Patrick posted on Monday, May 08, 2006

Hi,

how would it be possible to handle the following caching scenario in CSLA.NET?
  • Let's say I would like to load a very large customer collection, including photos, from the Data Portal via a Web Service.
  • Now I change some of the customers on the client and save the changes. The client only sends the changed customers back to the Data Portal.
  • Now I would like to refresh the customer collection on the client, but I would like the Data Portal to send back only the changed customers (updated, deleted, inserted) rather than all the customers again (to save bandwidth).
Is it possible, for example, to use ActiveCache from ActiveObjects in this scenario, or are there other tools some of you use?

Thanks a lot
Patrick

glenntoy replied on Monday, May 08, 2006

Hi Patrick, I would like to help and give you an idea, but we have different cases. In CSLA 1.0 I was able to implement caching of lookup tables. What I did was binary-serialize the object and save it to the client's workstation. Every time the application runs, it compares the cache file against the lookup tables in the database to check which lookup tables were modified. If there are no changes, it deserializes the saved object and loads it into the CSLA object; otherwise it deletes the saved serialized file and retrieves a fresh copy from the server. It was implemented with .NET Remoting.
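A minimal sketch of that kind of lookup-table cache is below (the CachedLookups type, file name and fetch delegate are illustrative assumptions; the original was built on CSLA 1.0 with .NET Remoting and binary serialization):

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class CachedLookups
{
    public DateTime Version;   // "last modified" value read from the database
    public object Tables;      // the lookup collections themselves
}

public static class LookupCache
{
    private static readonly string CachePath = Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
        "lookups.bin");

    public static CachedLookups Load(DateTime serverVersion, Func<CachedLookups> fetchFromServer)
    {
        if (File.Exists(CachePath))
        {
            using (var stream = File.OpenRead(CachePath))
            {
                var local = (CachedLookups)new BinaryFormatter().Deserialize(stream);
                if (local.Version >= serverVersion)
                    return local;                 // no changes: use the saved copy
            }
            File.Delete(CachePath);               // stale: discard and refetch
        }

        CachedLookups fresh = fetchFromServer();  // normal fetch from the server
        using (var stream = File.Create(CachePath))
            new BinaryFormatter().Serialize(stream, fresh);
        return fresh;
    }
}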

But in your case, where you only want the changes (updated/inserted/deleted), I'm not quite sure that would be an efficient approach. I suggest using TCP/UDP and incorporating it into your application.

When a user logs in and uses the application, it saves the PC name or IP address (to a table named tblCustomeSessions) of each client currently using the customer module. Create a trigger, or incorporate it into your stored procedures, so that every time there is an insert/update/delete the database sends a UDP message to those IP addresses along the lines of "record ID 12345 - DELETED/INSERTED/UPDATED".

The customer module then receives this message and parses it for further action. If it was an insert/update, it retrieves the record from the database and updates just that item in the collection; if it was a delete, it simply removes the item from the collection.
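A rough sketch of the client-side listener for such a scheme might look like this (the port number and the "12345|UPDATED" message format are assumptions; the sending side would live in the trigger or stored procedure):

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

public class ChangeListener
{
    private readonly UdpClient _udp = new UdpClient(11000);   // assumed port

    // Raised with the record ID and the action ("INSERTED", "UPDATED", "DELETED").
    public event Action<int, string> RecordChanged;

    public void Start()
    {
        new Thread(() =>
        {
            var remote = new IPEndPoint(IPAddress.Any, 0);
            while (true)
            {
                byte[] data = _udp.Receive(ref remote);        // blocks until a message arrives
                string[] parts = Encoding.UTF8.GetString(data).Split('|');
                if (parts.Length == 2 && int.TryParse(parts[0], out int id))
                    RecordChanged?.Invoke(id, parts[1]);       // e.g. 12345, "DELETED"
            }
        }) { IsBackground = true }.Start();
    }
}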

Cheers,

Glenn

P.S. I'll be delayed in my replies because we are in different time zones. ;-)


Lakusha replied on Monday, May 08, 2006


What we do is a bit different:

All tables have columns recording the last update time and who made the change (useful for many things, like auditing and replicating with other systems).

The client application can ask for all records modified since its last cache update. One table maintains the max update timestamp for each cached table; you can keep it up to date easily if you are using stored procedures for your CRUD operations, or you could use triggers or even a materialized view. To minimize contention on that table, try not to update it as part of a larger transaction; it should be a post-commit step. It is an atomic operation with no integrity dependency (meaning that updating it without reason would not cause any problem other than forcing client apps to check for updated data).
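As a sketch of what the client-side refresh could look like (table and column names such as Customers.LastUpdated and CacheVersions are assumptions, not part of CSLA):

using System;
using System.Data;
using System.Data.SqlClient;

public static class CustomerRefresh
{
    // Optional fast path: read the per-table max timestamp first and skip the
    // refresh entirely when nothing has changed since the last cache update.
    public static DateTime GetTableVersion(string connectionString, string tableName)
    {
        using (var cn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT MaxUpdated FROM CacheVersions WHERE TableName = @t", cn))
        {
            cmd.Parameters.AddWithValue("@t", tableName);
            cn.Open();
            return (DateTime)cmd.ExecuteScalar();
        }
    }

    // Fetch only the rows changed since the client's last refresh; the caller
    // merges them into its cached collection and records the new timestamp.
    public static DataTable FetchChangesSince(string connectionString, DateTime lastRefresh)
    {
        using (var cn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT * FROM Customers WHERE LastUpdated > @since", cn))
        {
            cmd.Parameters.AddWithValue("@since", lastRefresh);
            var table = new DataTable();
            new SqlDataAdapter(cmd).Fill(table);
            return table;
        }
    }
}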

With .NET 2.0 (in a 2-tier deployment) you can also use ADO.NET query notifications to know that an object has changed.
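For reference, a minimal sketch of that notification mechanism (SqlDependency, .NET 2.0 and later; it requires Service Broker to be enabled on the database, and the query and connection string here are placeholders):

using System;
using System.Data.SqlClient;

public static class CustomerWatch
{
    public static void Watch(string connectionString)
    {
        SqlDependency.Start(connectionString);   // one-time setup per connection string

        using (var cn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Id, Name FROM dbo.Customers", cn))   // notification-eligible query: explicit columns, two-part table name
        {
            var dependency = new SqlDependency(cmd);
            dependency.OnChange += (s, e) =>
            {
                // e.Info indicates Insert/Update/Delete; re-fetch or invalidate the cache here.
                Console.WriteLine("Customers changed: " + e.Info);
            };

            cn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                // The command must actually execute for the notification to be registered.
                while (reader.Read()) { /* populate the cached collection */ }
            }
        }
    }
}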

guyroch replied on Monday, May 08, 2006

Take a look at this post from Andrés (a.k.a. xal). It might be worth your time.

xal replied on Monday, May 08, 2006

It wasn't me!! I SWEAR!! :P
Now seriously, what post!?

I'm not sure I've ever posted anything on that subject, but anyway, here's what I think:

Are you using a read only collection for this list? I hope you are...
So, saving different items in that case would mean that you're using editable root objects (not collections), and that you edit / save them one at a time.

In that scenario, if it really is so critical that all your other clients receive notification of this update, then you should consider having a service that your app can talk to. There you could implement something like the observer Petar created, but in a client-server fashion, where you can trigger events in a "server channel". There is some complexity in creating all this, but it's not terrible; using it should be as simple as one line of code. The big issue comes with implementation and client requirements... but that's a whole other subject...


Andrés


SonOfPirate replied on Tuesday, May 30, 2006

I may be a little late to the topic, but here's my two cents nonetheless...

We have incorporated multi-level caching into a hybrid CSLA framework.  With our strategy, caching is handled within the DataPortal mechanism as an alternate data source for objects versus Activator.CreateInstance.  Depending on how the specific application is set up, we have a switch to a) enable/disable caching altogether (when disabled it behaves just as the DataPortal is described in the books) and b) indicate whether caching is done locally, remotely or both (on the client and/or on the app server).

If caching is enabled and intended to run locally (either local only or in conjunction with the app server), the client-side data portal will attempt to retrieve the object from the local cache before deferring the request to the data portal proxy.  So, if the object exists in the local cache, you get the expected performance benefit and none of the data portal code executes.  If it cannot find the object, it defers to the proxy.
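A rough sketch of that "check the local cache before calling the data portal" step (the dictionary cache and the fetch delegate are simplified stand-ins, not the actual framework code):

using System;
using System.Collections.Generic;

public static class CachingPortal
{
    private static readonly Dictionary<string, object> _localCache =
        new Dictionary<string, object>();

    public static T Fetch<T>(string cacheKey, Func<T> portalFetch)
    {
        // 1. Try the local cache first: on a hit, none of the data portal code runs.
        if (_localCache.TryGetValue(cacheKey, out object cached))
            return (T)cached;

        // 2. Miss: defer to the data portal proxy (and, on the server, to the
        //    app-server cache or ultimately DataPortal_Fetch).
        T result = portalFetch();

        // 3. Store the fresh copy so later requests are served locally.
        _localCache[cacheKey] = result;
        return result;
    }
}

// Hypothetical usage:
// var customers = CachingPortal.Fetch("Customers", () => CustomerList.GetList());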

If caching is to run remotely (or both and there is no local copy), the server-side data portal will likewise attempt to retrieve the object from the cache on the app server.  Only if it is not found do we create a new instance of the object using Activator.CreateInstance and pull the data from the data store (DataPortal_Fetch).

On the flip side, when the object is created using Activator.CreateInstance and data pulled from the data source, before returning, the server-side data portal will check to see if a copy is supposed to be cached on the app server and, if so, will do just that.  The same will be done by the client-side data portal so that under all possible configurations the object will be cached in its proper place.

The benefit of dual caching is realized with multi-user/multi-client applications.  Under these scenarios the client application will be running on multiple physical boxes that are all accessing the same app server so the copy cached there will be retrieved by any of those clients that do not already possess a copy of their own.

Any time a change is made to the object and persisted to the data store (via the Save() method, for instance), the data portals know to update the cached copy accordingly.  For new objects, the object is added to the cache where applicable, updated objects have their cached copies replaced with the new version and deleted objects are removed from cache.  This eliminates any need for any bubbling up from the data store to make sure that we've updated the cached copies.

The expiration policies attached to each object are used to ensure that the data is as current as possible and only data that can safely be cached are handled this way.  Typically we set shorter expiration periods for client-side caches and longer for server-side as the server-side copy will have been updated automatically whenever any other user applies a change to the object.  When the client-side object has expired, it will be refreshed from the server-side cache.  Under optimal conditions, the only time the data store is used is when changes are applied.  For infrequently changing objects, this amounts to a significant performance gain.

We have a CacheAttribute that defines settings on a per object basis and our objects and collections have Refresh() methods which force the object to retrieve its data from the data store to update both the cached and in-memory copies of the object.
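A hypothetical sketch of such per-type cache settings (CSLA itself does not ship a CacheAttribute; the names and defaults here are illustrative only):

using System;

[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class CacheAttribute : Attribute
{
    public bool Enabled { get; set; } = true;          // whether the type is cached at all
    public bool CacheOnClient { get; set; } = true;    // keep a copy on the client
    public bool CacheOnServer { get; set; } = true;    // keep a copy on the app server

    // Expiration in seconds; typically shorter on the client, longer on the server.
    public int ClientExpirationSeconds { get; set; } = 60;
    public int ServerExpirationSeconds { get; set; } = 600;
}

// Hypothetical usage on a business object:
// [Cache(ClientExpirationSeconds = 120, ServerExpirationSeconds = 3600)]
// public class CustomerList : /* read-only list base class */ { }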

As far as tools, etc. - all of this is based on extending the CSLA framework and the caching features provided by the MS Enterprise Application Blocks.  As you can see, however, there is no provision to partially cache or partially update a cached object under this strategy.  But, because it operates so transparently to the user (& developer) and the overall performance gains outweigh the hit when updating a few large data sets, we couldn't be any happier with the approach.

Hope this helps in some way.


Patrick replied on Wednesday, May 31, 2006

Hi,

thanks for all the helpful responses. I very much like SonOfPirate's idea of building a client and AppServer cache. It seems quite possible to combine this with partial updates: the client DataPortal could transparently send just the dirty business entities back to the AppServer on updates.
And when a cached collection issues a refresh command, it would be possible to get just the changed rows from the database (using Lakusha's idea of a last-changed column in each table). Each collection would have a field with the timestamp of the last time it was updated.

Thanks again
Patrick :)

marklindell replied on Tuesday, July 25, 2006

That is an aggressive approach to caching if you are attempting to update the middle-tier cache based on client requests.

This all works fine until you attempt to load-balance your DataPortal.  It goes against the "S" (scalable) portion of CSLA, and I would not recommend it.

Developing caching systems for updatable data is not a trivial task.  Start with client-side read-only list caching before examining other scenarios.


Mark Lindell

Patrick replied on Thursday, May 08, 2008

SonOfPirate:
I may be a little late to the topic, but here's my two cents nonetheless...We have incorporated multi-level caching into a hybrid-CSLA framework. 

I'm even later :) ... If you are still around the forum... It's still an interesting topic for me, and I would love to get in contact and chat about some of the details of your solution.

Thanks a lot,
Patrick

tymberwyld replied on Sunday, July 30, 2006

Hi, I may be a little late to this subject, but this actually has absolutely nothing to do with caching.  You can get the most current information from the database after saving a DataRow back to the database in ONE network round-trip.  It all depends on which side the data is saved from (client vs. server).  Here's how it's done:

1.)  You need to set up a Stored Proc for saving your data.  I typically do not create different Procs for "Insert" vs. "Update"; I usually just create one "spSaveCustomer".  Keep in mind that this Proc will be used for saving one Row only.  At the very end of each Procedure you write, you'll place a select statement that returns the current data for the Row.  In most of our designs the data is retrieved from a View on the Database, so the Proc updates the table and then returns the current info from the view for the row being updated.

ex.  Select * From vuCustomers Where ID = @ID

2.)  You'll need to set up a DataAdapter for saving your data into the Database.  In the DataAdapter, your Insert and Update Commands will be exactly the same, so typically I initialize the InsertCommand and then set Adapter.UpdateCommand = Adapter.InsertCommand.  These Commands will be of type "StoredProcedure" and their CommandText will be your Stored Proc (i.e. spSaveCustomer).  Next, you'll go through the process of setting up the Parameters for the command.  We have this all built into our DAL, so it's actually one line of code for all this...

3.)  Here's the important part.  The DataAdapter's Commands have a Property called "UpdatedRowSource".  You'll want to set this to "FirstReturnedRecord".  There are other options and by default it's set to "OutputParameters", but really, just doing Output Params is not enough in all scenarios.

// Map the first record returned by the stored procedure back onto the row being saved.
this.dbAdapter.InsertCommand.UpdatedRowSource = UpdateRowSource.FirstReturnedRecord;
this.dbAdapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.FirstReturnedRecord;


Now you're all set.  Because the "UpdatedRowSource" on your Commands is set to "FirstReturnedRecord" and the last statement in the Procedure is "Select * From vuCustomers Where ID = @ID", your DataRow will be auto-magically populated with the current info from the DB without an additional round-trip or even another call to the DataAdapter's "Fill" method.
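A condensed, self-contained sketch of the steps above (spSaveCustomer, vuCustomers and the parameter list are the examples from this post; the parameter wiring is simplified):

using System.Data;
using System.Data.SqlClient;

public static class CustomerAdapterFactory
{
    public static SqlDataAdapter Create(SqlConnection connection)
    {
        var save = new SqlCommand("spSaveCustomer", connection)
        {
            CommandType = CommandType.StoredProcedure,
            // The proc ends with "Select * From vuCustomers Where ID = @ID", so map
            // that first returned record back onto the DataRow being saved.
            UpdatedRowSource = UpdateRowSource.FirstReturnedRecord
        };
        save.Parameters.Add("@ID", SqlDbType.Int, 0, "ID");
        save.Parameters.Add("@Name", SqlDbType.NVarChar, 100, "Name");

        var adapter = new SqlDataAdapter();
        adapter.InsertCommand = save;   // one proc handles both insert...
        adapter.UpdateCommand = save;   // ...and update, as described above
        return adapter;
    }
}

// After adapter.Update(customersTable), each saved row is refreshed from the
// proc's final SELECT with no extra round-trip.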

I hope this has helped...

SonOfPirate replied on Sunday, July 30, 2006

You have aptly described how the basic framework data classes, DataAdapter/DataSet, work, and you are absolutely correct that you can easily track which rows have been changed and take advantage of some of the automatic wiring within these objects.  However, there are several boats that you've missed with this.

First, the point of the thread was an inquiry about using cache to reduce the number of round trips to the database.  In particular, a Refresh function to update a large data set after changes are made to some of the items.

Second, CSLA's use of mobile business objects is contrary to using DataSets and DataAdapters, etc., as those classes work on collections and data "sets".  Mobile business objects encapsulate all of the data access logic so that the individual objects manage themselves.

Third, the question is not how we get the data to/from the database or recognize what has changed, but what we do with the changes and how we refresh our local/client objects with the returned data.

While we have focused on and bantered about how caching could be implemented to eliminate round-trips and network hops to retrieve data, we have lost sight of the real issue here.  I will refer you to a number of other threads that hash out the subject of refresh given the data portal mechanism as provided by Rocky in CSLA.  The issue that has led to those other threads, as well as the basis for the original post, is the need to be able to update the objects on the client to accurately reflect changes made to the database.

It seems to me that reviewing Patrick's original points may help boil this down:

It sounds to me like the question really has to do with refreshing the customer collection on the client.  The reason for this is that using the CSLA framework, only those ITEMS that have been changed in some way are sent back to the database via the data portal (#2).  This occurs, presumably when the customers collection's ApplyChanges (or whatever name) method is called.  The behavior provided by CSLA is very much like what a DataSet does in that it iterates through its items (rows) and executes whatever operation is necessary on each object (if it's new: Insert; removed: Delete; modified: Update).  This passes control of the actual operation to the individual objects and leads us to where the problem arises.

When we execute a data portal method, we end up with a second copy of our object in the data portal's return value.  As a result, the customers collection holds references to our original, pre-data-operation objects and somehow needs to be updated with the copies that were returned from the data portal.

As I mentioned, there are a number of other discussions on this topic that you may refer to for more discussion on this point.  But, if you narrow the scope of this question down to what it is really about, we can focus on this same issue as the cause of our concerns.

There are many solutions proposed, including Rocky's implementation, where the collection is updated to refer to the new object rather than the old - which is handled by the object itself once the new object is received from the data portal.  However, our solution was to implement a protected virtual MergeWith method in our business objects that accepts an object of the same type as the owner.  Using this method, we can "copy" whatever properties we need from the returned object into our original object and preserve all references to that object.  We call this method from our data portal methods and pass the returned object as an argument.  This approach has worked well for us.
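A simplified sketch of that MergeWith approach (the property list, the timestamp field and the placeholder data portal call are illustrative; in practice the method lives in a business base class):

using System;

[Serializable]
public class Customer
{
    public int Id { get; private set; }
    public string Name { get; set; }
    public byte[] Timestamp { get; private set; }   // concurrency token returned by the sproc

    // Copy only the values that changed on the server from the data portal's
    // returned copy into this original, still-referenced instance.
    protected virtual void MergeWith(Customer returned)
    {
        Id = returned.Id;               // e.g. identity value assigned on insert
        Timestamp = returned.Timestamp; // updated concurrency value
    }

    public void Save()
    {
        Customer result = DataPortalUpdate(this);   // returns a second copy of the object
        MergeWith(result);                          // keep existing references valid
    }

    private static Customer DataPortalUpdate(Customer c)
    {
        // Placeholder standing in for DataPortal.Update(c) in a real CSLA app.
        return c;
    }
}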

A couple of additional thoughts.  This only really matters if there is some new or different information contained in the returned object.  This might be the value of an identity column, timestamp or something set as a result of logic in the stored procedure being used for the operation.  If this is not the case, then the original collection is up-to-date and doesn't need to be refreshed.

The subject of caching, which has shrouded the real issue here, is still valid and can still be used to reduce the amount of network traffic required for the application.  We implement caching on our fetch methods to optionally retrieve the requested item(s) from the cache rather than requiring a round-trip to the database.  We automatically update the cached copy of the object(s) whenever we apply an Insert, Update or Delete operation.  As a result, our cached copy will be concurrent with the database - which is the goal.

Finally, it is my personal opinion that having a SELECT * statement at the end of your Insert/Update procedure is adding undue network traffic rather than reducing it.  If you have a large table, or a large view with many columns, this will return a lot of data that you already have.  My suggestion is to limit this statement to only the columns that have been affected by the procedure.  In our case, we have a datetime field for concurrency checks that is automatically updated by the sproc every time a record is updated.  This is the only value that is returned (by default) from our Insert & Update procedures.  Our base class implementation of MergeWith(...) copies the returned value to the local object so it remains concurrent with the database.

You can certainly do it the way that tymberwyld has described, but as is said above, you will have to deal with DataAdapters, etc. for this approach.

I certainly hope that clears things up for everyone and helps in some way.


tymberwyld replied on Monday, July 31, 2006

All your points are valid.  I probably didn't explain well enough.  My suggestion was not so much about caching (which would be addressed in a different way - the DataCachingBlock from MS, perhaps, where each "cache" can have a timeout to determine when it needs to be refreshed?).  I was just trying to suggest another way to reduce round-trips.  I now understand that the original question is about returning "ALL customers that have changed since he last retrieved data".  In that scenario he'd have to store the current date every time he retrieved data and then get all customers that have a modified date >= the last retrieved date.

More or less, I was trying to explain that retrieving a fresh copy of the data from the database is easy to do with the built-in ADO.NET functionality.  What I didn't go in depth about is how we manage concurrency checking within the Stored Proc.  Yes, we use a ModifiedDate, and we pass in current AND original values of the data for concurrency checking so that conflicts can be merged as much as possible before a "true" data conflict occurs.  Having the additional "Select * From..." at the end of the Procedure will not cause much of a performance hit when you're retrieving one record, provided you've added a "Set NOCOUNT ON" call at the beginning of the procedure (which you should be using in every Proc anyway, because if not, you're causing unnecessary network traffic).

Also, the reason for doing this may be that the View the data is being retrieved from has a computed/calculated column that would just be too difficult to implement in a custom business object's property.  Say, for example, a property changes on your object that then requires another property to be updated: I change the first name of a Contact and now the "ContactFullName" property needs to be refreshed.  This is a simple example, but there are others where a computed column in a view may come from a totally different table or even another database.  I don't want that kind of logic in a BO.

Also, the other reason for choosing "FirstReturnedRecord" over only "OutputParameters" is that I've personally had situations where the output params did NOT get updated, and after reviewing the T-SQL for hours I could find nothing wrong with it.

This all comes down to 2 implementations:
    1.)  Have the Stored Proc return a refreshed copy of the row
    2.)  Issue the Insert / Update and then retrieve the updated rows in another Select command, which means you not only have to wait for the Insert / Update to finish, but also for the Select and the re-population of business objects.

This all just keeps reaffirming why I don't like BO's.  Having a strongly-typed DataTable with custom business logic built in is much better (no need to repopulate other objects; the strongly-typed DataRow IS the object).

SonOfPirate replied on Monday, July 31, 2006

I agree with your approach with the understanding that we all have subtle nuances to the way we implement the same concepts. You accurately state that this approach uses the built-in ADO.NET features and there is certainly nothing wrong with that. However, you are posting to a forum for and about CSLA which is ALL about BO’s and how to follow a different approach than using the built-in ADO.NET stuff.

I don’t know if you’ve read the book and certainly won’t assert that you haven’t, but Rocky does do a tremendous job of explaining all of the “why’s” of this approach. And while I have not implemented everything the way he has (those subtle nuances), I can’t argue with his reasoning, which is solid and backed up with thorough explanation most of the time.

All of that being said, again, there is nothing wrong with using the tools built-in and supplied by Microsoft. And, if they are serving your needs, that is great. But, as I said, remember where you are posting and that most users in this forum are not using straight-up ADO.NET for their applications and are making extensive use of BO’s to serve their needs.

Thanks for sharing nonetheless.

Copyright (c) Marimer LLC