Techniques for working with large number of objects?

Old forum URL: forums.lhotka.net/forums/t/5517.aspx


rsbaker0 posted on Thursday, October 02, 2008

So, I'm implementing an object transformation that involves changing basically every object in a particular table. It can't be done with a single direct script, and stored procedures are out. The entire update must happen in a transaction.

The naive implementation is just to load a BLB, apply the transformation to each object in the list, and then save the list. However, in the extreme case I could have 100,000 or so objects.

When I populate a BLB (or ERLB, for that matter), I'm working with an IDataReader-driven object factory, so in theory I could duplicate that logic and just "scroll" through the objects once, updating each one as I go, rather than loading the entire list into memory first. That seems like a better approach.

How have other folks handled similar situations?  I know you could also do this in a stored procedure, but one of our back-end databases doesn't support them.
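
For concreteness, here is a minimal sketch of the naive load-everything approach described above. UserList, GetList(), and ApplyTransformation() are illustrative names standing in for whatever the real CSLA types would be, not names from the actual project:

    // Naive approach: materialize the whole editable list, transform each child,
    // then persist everything with one Save() call.
    // UserList, UserItem, GetList(), and ApplyTransformation() are illustrative only.
    UserList items = UserList.GetList();        // loads all ~100,000 objects into memory

    foreach (UserItem item in items)
    {
        item.ApplyTransformation();             // per-object change, e.g. re-encrypt a value
    }

    items = items.Save();                       // CSLA returns the saved list instance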

 

 

sergeyb replied on Thursday, October 02, 2008

My first instinct is to create a query and fire it at the back end.  I think that any other solution will be many times slower.  Usually, whatever you can do in stored procedures you can do in a dynamic script.  I would be wary of fetching 100,000 rows just to send them right back.

 

Sergey Barskiy

Principal Consultant

office: 678.405.0687 | mobile: 404.388.1899

Microsoft Worldwide Partner of the Year | Custom Development Solutions, Technical Innovation

 

rsbaker0 replied on Thursday, October 02, 2008

sergeyb:

My first instinct is to create a query and fire it at the back end.  I think that any other solution will be many times slower.  Usually, whatever you can do in stored procedures you can do in a dynamic script.  I would be wary of fetching 100,000 rows just to send them right back.

I don't disagree -- it's certainly not something I would do often. However, in this case I'm changing the password encryption method for the entire user table, and the encryption algorithm can't easily be implemented in the database.

The legacy application we're porting does this via cursor, scrolling through the user records and updating each one.

 

sergeyb replied on Thursday, October 02, 2008

I see…  I would be inclined, then, to chunk the work somehow so that you do not have to fetch all 100,000 rows at once.  Depending on the logging scheme, how wide the table is, and transaction support, this could take quite some time and a lot of resources otherwise.  Maybe something like “Select Top 1000 * From users where IsPasswordConverted=0” inside a loop.  Just an idea…
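
A rough sketch of that chunking idea, assuming SQL Server, a hypothetical IsPasswordConverted flag column added to the users table, and a Recrypt() helper for the new password encryption. Column, table, and helper names are illustrative, and connectionString is assumed to be defined elsewhere:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    // Process the table in batches of 1000 so no single pass holds 100,000 rows.
    using (SqlConnection cn = new SqlConnection(connectionString))
    {
        cn.Open();

        while (true)
        {
            Dictionary<int, string> batch = new Dictionary<int, string>();

            using (SqlCommand select = new SqlCommand(
                "SELECT TOP 1000 UserId, Password FROM users WHERE IsPasswordConverted = 0", cn))
            using (SqlDataReader dr = select.ExecuteReader())
            {
                while (dr.Read())
                    batch.Add(dr.GetInt32(0), dr.GetString(1));
            }

            if (batch.Count == 0)
                break;                                        // nothing left to convert

            using (SqlTransaction tx = cn.BeginTransaction())  // one transaction per chunk
            {
                foreach (KeyValuePair<int, string> row in batch)
                {
                    using (SqlCommand update = new SqlCommand(
                        "UPDATE users SET Password = @p, IsPasswordConverted = 1 WHERE UserId = @id",
                        cn, tx))
                    {
                        update.Parameters.AddWithValue("@p", Recrypt(row.Value));
                        update.Parameters.AddWithValue("@id", row.Key);
                        update.ExecuteNonQuery();
                    }
                }
                tx.Commit();
            }
        }
    }

The trade-off is that the work is no longer a single transaction, and it needs the extra flag column to mark finished rows, which is the schema change discussed later in the thread.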

 

Sergey Barskiy

Principal Consultant

office: 678.405.0687 | mobile: 404.388.1899

Microsoft Worldwide Partner of the Year | Custom Development Solutions, Technical Innovation

 




rsbaker0 replied on Thursday, October 02, 2008

Ouch! We all know what happens to the best laid plans...

It turns out that (at least with SQL Server 2000), if you have started a transaction and opened a DataReader on it, you can't do any updates -- or practically anything else in the transaction -- as long as the DataReader is open.

At least, that's what I appear to be seeing. It would seem to rule out any sort of cursor-style approach.

In theory, I could scroll through the objects in a separate database transaction, but that seems perilous.

Your chunking idea is a good one, but I would need to add something to the schema. I'll look at doing that down the road. Thanks!

jemmer replied on Thursday, October 02, 2008

rsbaker0:

It turns out that (at least with SQL Server 2000), if you have started a transaction and open a DataReader on it, then you can't do any updates -- or really practically anything else in the transaction actually -- as long as the DataReader is open.

Not quite true, actually.  You cannot do anything on the connection while the reader is open (and that has nothing to do with whether there is a transaction started or not), but there is nothing preventing you from opening another connection to do something else...
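
A minimal sketch of that two-connection pattern, assuming SQL Server: stream rows with a DataReader on one connection while issuing updates on a second connection that owns the transaction. The table and column names (users, UserId, Password) and the Recrypt() helper are illustrative, and connectionString is assumed to be defined elsewhere:

    using System.Data.SqlClient;

    // Read on one connection, write on another, so the whole pass can still
    // commit (or roll back) as a single unit on the write connection.
    using (SqlConnection readCn = new SqlConnection(connectionString))
    using (SqlConnection writeCn = new SqlConnection(connectionString))
    {
        readCn.Open();
        writeCn.Open();

        using (SqlTransaction tx = writeCn.BeginTransaction())
        {
            using (SqlCommand select = new SqlCommand(
                "SELECT UserId, Password FROM users", readCn))
            using (SqlDataReader dr = select.ExecuteReader())
            {
                while (dr.Read())
                {
                    using (SqlCommand update = new SqlCommand(
                        "UPDATE users SET Password = @p WHERE UserId = @id", writeCn, tx))
                    {
                        update.Parameters.AddWithValue("@p", Recrypt(dr.GetString(1)));
                        update.Parameters.AddWithValue("@id", dr.GetInt32(0));
                        update.ExecuteNonQuery();
                    }
                }
            }

            tx.Commit();    // everything rolls back together if anything fails
        }
    }

With 100,000 updates in one transaction, lock escalation and log growth are still concerns, so the chunked variant sketched earlier may be the safer default where a single transaction isn't strictly required.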

 

Fintanv replied on Thursday, October 02, 2008

In the past I had to do a rather large data dump from a complex CSLA hierarchy to the database.  What I ended up doing was using the .NET bulk copy functionality for SQL Server to write the data to a staging table, with an app-generated GUID as the key.  I then invoked a server-side proc (passing in the GUID) to handle transferring the data from the staging table to the main database table(s) and to clean up the staging table.
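
A rough sketch of that staging-table pattern, assuming the bulk copy functionality refers to SqlBulkCopy (.NET 2.0+). The users_staging table, the usp_MergeUsersStaging procedure, transformedUsers, and connectionString are hypothetical names for illustration, not from the original post:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    // Bulk-copy the already-transformed rows into a staging table, then let a
    // server-side proc merge them into the real table and clean up staging.
    Guid batchId = Guid.NewGuid();

    DataTable staging = new DataTable();
    staging.Columns.Add("BatchId", typeof(Guid));
    staging.Columns.Add("UserId", typeof(int));
    staging.Columns.Add("Password", typeof(string));

    foreach (UserItem item in transformedUsers)      // rows transformed in the app
        staging.Rows.Add(batchId, item.Id, item.Password);

    using (SqlConnection cn = new SqlConnection(connectionString))
    {
        cn.Open();

        using (SqlBulkCopy bulk = new SqlBulkCopy(cn))
        {
            bulk.DestinationTableName = "users_staging";
            bulk.ColumnMappings.Add("BatchId", "BatchId");
            bulk.ColumnMappings.Add("UserId", "UserId");
            bulk.ColumnMappings.Add("Password", "Password");
            bulk.WriteToServer(staging);             // fast, set-based insert into staging
        }

        using (SqlCommand merge = new SqlCommand("usp_MergeUsersStaging", cn))
        {
            merge.CommandType = CommandType.StoredProcedure;
            merge.Parameters.AddWithValue("@BatchId", batchId);
            merge.ExecuteNonQuery();                 // server-side merge plus staging cleanup
        }
    }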

detritus replied on Thursday, October 02, 2008

An extended stored proc (xp_...) should help you there. That was the only way to run code other than SQL before SQL 2005; or, if this is a one-time-only operation, do it with a CLR procedure in SQL 2005 and move the data back to SQL 2000.

Sinan

Copyright (c) Marimer LLC