How to improve data access performance for a large object graph?

Old forum URL: forums.lhotka.net/forums/t/10539.aspx


Michael posted on Wednesday, July 20, 2011

Hi everyone

We are trying to improve our application's performance when accessing a database over a VPN (LINQ to SQL). It is not uncommon for a save to involve several hundred child BOs, each calling SubmitChanges() within a single transaction. Over a VPN this can take more than 60 seconds, compared to one or two seconds over a LAN.

These are large object graphs, with thousands of child objects. A real-world BO I'm testing from our live database has a serialised size of 43MB, so an application server will not help in this case.

At this stage I've had to increase the transaction timeout to three minutes because the saves were regularly timing out. Is there anything I can do to improve the performance?
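For reference, the timeout change looks roughly like this (just a sketch; I'm assuming the save is wrapped in a System.Transactions.TransactionScope, and the names are illustrative):

    // Raise the ambient transaction timeout to three minutes so the save
    // doesn't abort over the slow VPN (the default timeout is one minute).
    using (var scope = new TransactionScope(
        TransactionScopeOption.Required, TimeSpan.FromMinutes(3)))
    {
        model = model.Save(); // root BO save; all child updates run inside
        scope.Complete();     // commit only if the whole save succeeded
    }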

Kind regards
Michael

JonnyBee replied on Wednesday, July 20, 2011

Why would you have each child BO call SubmitChanges?

Every time SubmitChanges is called, your app will generate SQL and talk to the DB.

You can "accumulate" a lot of changes and call SubmitChanges just one time.
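Something like this (just a sketch; AppDataContext, Ducts and the MapToEntity helper are illustrative names for your LINQ to SQL context and entities):

    // Queue all pending work on one DataContext, then flush it once.
    using (var ctx = new AppDataContext(connectionString))
    {
        foreach (var duct in newDucts)
            ctx.Ducts.InsertOnSubmit(MapToEntity(duct)); // queued, no DB call yet

        ctx.SubmitChanges(); // SQL is generated and executed here, one time
    }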

Michael replied on Wednesday, July 20, 2011

Hi Jonny

Before posting the question, I had the same thought, so I set up a test and changed the data access to call SubmitChanges only once, instead of 400 times. It made no difference at all: both saves took 26 seconds over the slow VPN. So I concluded that each row insert/update was still being sent to the database as a separate round trip inside SubmitChanges.

Kind regards
Michael

JonnyBee replied on Wednesday, July 20, 2011

Did you run a memory profiler, and a database/SQL Server profiler as well, to see which functions, lines of code, and SQL statements were taking the time?

The only other option I can think of is to create a DiffGram of some kind (just the changed data) and send it over to a server process (web service or remote data portal) that applies the updates to the database.
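For illustration only, the payload could be as simple as this (a hypothetical shape; the server process would apply it to the database on the low-latency side):

    // Hypothetical "diffgram": only keys and changed fields cross the wire.
    // Requires System.Collections.Generic.
    [Serializable]
    public class RowDelta
    {
        public int Id;                            // primary key
        public Dictionary<string, object> ChangedFields
            = new Dictionary<string, object>();   // column name -> new value
    }

    [Serializable]
    public class ModelDiffGram
    {
        public List<RowDelta> Inserts = new List<RowDelta>();
        public List<RowDelta> Updates = new List<RowDelta>();
        public List<int> Deletes = new List<int>(); // keys of removed rows
    }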

Or perhaps someone here has other suggestions?


Michael replied on Wednesday, July 20, 2011

I didn't run any profilers. I have thought about sending only the data which has been changed instead of all the fields in each row. However, the aforementioned test was only doing inserts.

RockfordLhotka replied on Thursday, July 21, 2011

So you are running the DAL code such that the interaction with the database goes over the VPN?

I would recommend that you consider putting an app server in the same physical data center as the database server, and using the data portal to cross the VPN instead.

Generally, database interactions are very chatty (as you note in one of your posts). This means high latency network scenarios like a VPN make this type of interaction very slow.

The data portal interactions are usually not chatty - instead, they are chunky. One larger chunk of data is sent through, instead of lots of smaller chunks. High latency network scenarios have almost no impact on chunky communication.
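As a sketch, with the WCF data portal the client-side change can be as small as a config switch (the server URL here is illustrative):

    <!-- Client app.config: route data portal calls to a remote app server -->
    <appSettings>
      <add key="CslaDataPortalProxy"
           value="Csla.DataPortalClient.WcfProxy, Csla" />
      <add key="CslaDataPortalUrl"
           value="http://appserver-in-datacenter/WcfPortal.svc" />
    </appSettings>

The client then makes one Save() call that crosses the VPN once; all the chatty database traffic runs between the app server and the database inside the data center.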

Michael replied on Thursday, July 21, 2011

I had thought about an app server, but doesn't the entire business object need to be serialised and sent to the app server? Our object graphs are very large, with serialised sizes in the tens of MB.

Our application is an AutoCAD add-on, and our BO saves are tied to the DWG document's save event (what's in the DWG and database should match). So our "model" object can have up to 1000 "duct" records, each with child collections as well, so it could easily be 5000+ objects. Most of the time there would only be a few dozen objects which have been touched between saves (if the operator is saving frequently), but sometimes it can be several hundred.

I think my next experiment will be to update only the fields in each row which have been touched. This obviously won't help with new records, but hopefully it will reduce the overall save times.
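A rough sketch of what I have in mind, bypassing LINQ to SQL for updates (GetDirtyFieldValues is a hypothetical helper on the BO that returns column name -> new value for the touched fields only):

    // Build a parameterised UPDATE covering only the dirty columns of one row.
    // Requires System.Linq and System.Data.SqlClient; column names come from
    // our own metadata, never from user input.
    var dirty = duct.GetDirtyFieldValues();
    if (dirty.Count > 0)
    {
        var setClause = string.Join(", ",
            dirty.Keys.Select(k => k + " = @" + k).ToArray());
        using (var cmd = new SqlCommand(
            "UPDATE Duct SET " + setClause + " WHERE DuctId = @DuctId", connection))
        {
            foreach (var kv in dirty)
                cmd.Parameters.AddWithValue("@" + kv.Key, kv.Value);
            cmd.Parameters.AddWithValue("@DuctId", duct.Id);
            cmd.ExecuteNonQuery(); // one statement, only the touched columns
        }
    }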

RockfordLhotka replied on Thursday, July 21, 2011

I see what you are saying. The core issue remains consistent though - chatty interaction over a VPN (or other high latency network) is problematic. You need some technique that allows a single call to cross the high latency network, so some processing can occur in the data center so the chatty database interaction occurs on a low latency network.

Copyright (c) Marimer LLC