Offline synchronizing without Replication

Gravy posted on Tuesday, October 31, 2006

Hi there,

Has anyone any experience of writing (or working with) some sort of offline framework that isn't Sql Server Replication.

Here's what I'm thinking:

Take an application like Outlook. Here we have something that will run offline and still provide most of it's functionality. When the network connection is back up it will seamlessly re-sync to a server. I am planning an app that will be connected most of the time but must when disconnected. The application is an ordering system. So it has will need plenty of reference data such as Customers, Product lists, Suppliers and some other lookup data. When offline I want to be able o create orders, add new customers and new suppliers.

This could be a typical CSLA style application, were we have business objects like Order, Customer, Supplier and they have their corresponding data portal methods to write to a database. What I'm thinking is that this will be a local database (Sql Express or Sql Everywhere) but I need a way of tracking / recording what changes happen to the data or business object so that when my application comes back online I can reply my actions.

The reason I can't use replication is because the 'server' I'm synchronizing to could be Sql Express and that can't be a publisher.

Has anyone done anything like this before.

I have a load of questions going round my head so someone to bounce these off would be great :)

Graham

Bayu replied on Tuesday, October 31, 2006

Hey Graham,

Let me start by saying I don't have explicit experience in actually getting this to work, so I hope someone with experience will join this discussion later. However, we do have a design ready for exactly this scenario and we intend to implement it some time soon. So it may be worthwhile to share the basic ideas of our design and then happily go bouncing together. Wink [;)]

Our design involves:
- setting up a local database, most probably MySql or SqlServer Express 2005
- configure the application so that it exclusively works with this database
- implement a separate app (!) that takes care of all synchronization, this will most probably become a Windows Service or something that starts with windows and appears in the system tray

Then to deal gracefully with synchronization:
- maintain version numbers on master db and local db (simply an integer column)
- use Guids all over the place
- whenever data is replicated from master -> local, the local db also stores the current master-version
- whenever local data is modified the local version is incremented

Then we have identified the following scenarios:

1 - copy an update from local -> master
- when locally stored record is new (i.e. the Guid does not exist on master): insert record
- when locally stored master version equals current master version: overwrite master record
- when master version is incremented since checkout: collision! ask user feedback:
    - use master version, discard local version
    - use local version, overwrite master version anyway
    - merge both version (we allow last-minute editing of local version, so basically it's a variation of 'use local version')

2 - update local db by checking master db for updates
- get all updates from server, but wherever the local version is newer (higher) than the master version, leave local version untouched

3 - special cases (exception)
- this one is nasty: when records on the master server are deleted .... eg a customer-record with all related records. And suppose you were modifying a contact person's details that related to this customer .... in that case your update can simply not proceed, you can not just add the record since the related customer does not exist, so that would become an 'orphaned' contact
- basically this illustrates the concept of aggregations/associations, these relations need to be checked/ensured somehow
- this is one to bounce around, we have no solution for this yet

So, here I join you: "Help! Expert requested!" Wink [;)]

Regards,
Bayu

guyroch replied on Tuesday, October 31, 2006

There is no magic solution here! You'll have to code your own synch algorithm. In my database _all_ of my pk are Guid (single column, always and no exceptions) and all tables have the following 3 fields.

CreatedDate

ModifiedDate (same as CreatedDate on initial insert)

RowVersion

The RowVersion is an integer column, which I set to 1 when a row is inserted. When the row gets modified I increment RowVersion by 1. When the value gets to the maximum value I reset it to 1. CreatedDate and ModifiedDate are self explanatory.

I use the RowVersion for database concurrency. When I do an update I always check for this value also.

UPDATE MyTable

SET

Column1 = 10,

Column2 =’Test’

WHERE

PkGuid = <<my_pk_guid>> AND

RowVersion = <<rowversion_when_row_was_fetched>>

This way, if somebody else modified the row I will get 0 row updated. This means somebody either deleted the row OR it was modified by someone else between the time I fetched it and the time I tried to save it.

This might seem a lot, but I use code generation and all that code is generated. When I add or modify a database table, I run my code generator and within seconds I have all I need to communicate with the database once again.

Then _you_ could use the ModifiedDate to synch up you database. Or course you will have to make sure the order in which you process your tables is such that you will not break foreign key constraints.

I will need to do some data synch in a few months myself as I too have a requirement to bring the application offline from time to time. But I’m not there just yet. However, the implementation of the 3 columns above means that I have all I need to make it happen. So I’m not terribly worried at this time. Let me know how it goes.

Gravy replied on Wednesday, November 01, 2006

Hi Bayu,

thanks for your comments.

When I first started thinking about this application I was quite interested in learning about the offline application block that was rpoduced by Partterns & Partices in Microsoft. Here they talked about two basic levels of synchronization. 1. Data centric, i.e. replication and 2. Service centric. As I can't use replication here I started to focus on service centric. The basic principle is (roughly) that operations when offline are queued up in an offline queue, waiting for a reconnect. When the network is back up the queue is replayed on the master copy.

Then my thoughts went on to CSLA. Immediately I ran in to a conflict. Service centric using CSLA?

My interprtation of service centric went through many changes but I eventually came round to thinking the following:

1. The application will talk directly to a local Sql Express / Everywhere database.

2. In the data portal methods of the business objects I will not only update the local database but I will record (in a queue or a table) the action performed with the data. So for example, if I inserted a new customer then the local database would be updated and I would record a command to insert the same customer in the master database if it was configured to have a master.

3. There would be some other part of the application framework that would take the commands from the queue and forward them on to a web service when a network connection allowed.

The isn't a substitute for replication so I'm doing the synchronization at business functionality level.

This is still very early stages and I am only playing with idea's so I'm sure there will be loads of holes in this approach but it's good to talk eh?

Graham

Bayu replied on Wednesday, November 01, 2006

Hey,

I like where you're going. By making it part of your business logic (i.e. your dp_methods like you suggest) you can neatly build up a queue indeed. My suggestion was kind of brute-force, your proposal has much more appeal.

Then what about the queue ...?

- you suggest creating a table in your database for this purpose. I'm not too keen on that, too accessible and vulnerable. And how to model your commands in a relational structure? Simply store the raw sql in a varchar field maybe .... well, it could work I guess.
- there is another thing with the 'table'-apprach that may need improvement: you local table will need to be communicated to the server from time to time. So you would have to implement something like a windows service that checks your network connectivity and submits stuff when the machine gets back online. Then on the server your need some stuff (web service perhaps) that processes requests from your windows service when they come in. That's a lot of programming ....

So let me coin another option, just to bounce it ...
- wouldn't it be elegant to make use of Messaging? You would setup a message queue locally and another one on your server and you configure these two to cooperate (i.e. local queue forwards messages to server queue). In your dp_methods you enqueue whatever data is needed in your message queue, and your message queue takes care of getting it to your server whenever the machine is online. The great plus here is that you get synchronization for free ...
- then on your app server you would still need a process to empty the 'server queue' from time to time. You can do this in different ways: write a service that processes your queue overnight or you could make something that listens to your queue and processes messages as soon as they come in.
- the great disadvantage of messaging is that it is inherently asynchronous. So there is not a means to query the user for immediate feedback. What can be done though is to setup a twin pair of message queues, but then one that moves messages from server to client. This would enable the server to communicate the success or failure of certain actions.

There is much more possible with messaging than I know of. I have a book on this and I was pretty much thrilled by the capabilities. I believe you can also configure your Messaging to follow some accept/reject model, where messages are only removed from the client when the server acknowledges successfull processing. But this I would have to look up.

Btw: the book I refer to is Matthew MacDonald's 'Microsoft .Net Distributed Applications: integrating XML Web Services and .Net remoting'. Despite what the title suggests it also features a chapter about Messaging.

Bayu

Gravy replied on Wednesday, November 01, 2006

Bayu,

I share your concerns with the idea of storing the commands in the database, but (ignoring MSMQ for now) if we didn't use the database then we must persist them somewhere like the file system or isolated storeage - perhaps this could be configurable?

The MSMQ idea does sound too good to be true ;-). I wonder why the offline application block never mentioned this, is it because of the asynchronisity and the complexity of making it appear synchronous?

Either way, I would seriously consider storing the commands in the client queue by simply serializing to business object, so if we were to use a database table then have the data column either XML or binary.

One other requirement I want to put in: I think being able to distribute the client via ClickOnce would be a seriously good idea. Can MSMQ queues be created on the client when it is installed via ClickOnce?

One slightly different approach is to use Remote Data Access in Sql 2005. I've just read something on this and thought it could be a contender. You know when they say a little understanding of something can be dangerous, well I could be suffering from it here!

Graham

Bayu replied on Wednesday, November 01, 2006

Hey,

Just re-read that chapter on Messaging, it's very inviting. ;-)

Among a database, your filesystem and isolated storage the database is surely preferred for storing stuff. I've never worked with isolated storage, it allows you to work in non-fulltrust situations and I also know that there is a Environment variable for this that you can access from your code, but that's about the stretch of my knowledge. Tongue Tied [:S]

I don't know what Messaging is not referenced in the offline AB. I do know now (after reading it up again) that it is a highly capable toolkit. The things I mentioned like acknowledging successfull receipt, dealing gracefully with offline clients are indeed fully supported. But there is much more to it, you should really read some dedicated book or online article about it to realize the full scale of possibilities.

Yeah ... ClickOnce is nice .... but has its limitations. Check this presentation for a nice overview of deployment options (beware that is is slightly dated, so not everything may still apply): http://www.novicksoftware.com/Presentations/distributing-smart-client/distributing-smart-client-applications-novick.ppt

I read in my book that it is possible to create and delete message queues from your code, so maybe that could be made to work with clickonce. But what about strong-naming your CSLA assembly on the client machine? I'm not sure if clickonce supports this, but you might be lucky. The linked presentation suggests that of all options a manual first installation paired with automatic updating (he proposes the app-updater AB for this) would be favored.

I have no clue what Remote Data Access is. It sounds MS-ish, such a trivial name .... makes me wonder what's the new thing in there ... isn't most of the data access remote by nature anyway you know, these guys really invent cool names ... ;-) No kidding, what is it?

Bayu

Bayu replied on Wednesday, November 01, 2006

Gravy:
Either way, I would seriously consider storing the commands in the client queue by simply serializing to business object, so if we were to use a database table then have the data column either XML or binary.

This triggered another thought: when you rely on serialization you could run into versioning trouble. Especially when you allow for automatic updating, i.e. you are not in full control of who runs which version ....

Using any other approach to communicate changes (sql-based, xml webservice, MSMQ message) might better isolate you from potential trouble with strongly versioned assemblies.

Also: to communicate something like 'DELETE Customer with id 39' it may be a overkill to communicate the full BO.

Bayu

Bayu replied on Wednesday, November 01, 2006

Gravy:
Good point about the overkill of communicating a Delete.

As this deign follows the OAB block slightly, I think I might try and find out why MSMQ was never used. Perhaps it was just minimizing dependancies.

I'll let you know if I find anything.

I think you missed something, my first hit on Google shows that the Offline AB does have MSMQ support ....?!

http://msdn2.microsoft.com/en-us/library/ms998450.aspx

Apparently the roads we thought were heading in different directions do head in the same direction after all.

Bayu