OT? Document transfer

Old forum URL: forums.lhotka.net/forums/t/1087.aspx


jemmer posted on Thursday, August 31, 2006

This may be a bit off topic, but since the app in question makes heavy use of CSLA, I hope I can get some advice on a design/technology question here.

The app is a medical claims case management system.  It uses CSLA and it works great.  It is a Winforms app primarily; there is a web component, but that is not relevant to this discussion.

A new requirement has recently surfaced which would require the management of a large repository of document files. This repository contains several hundred thousand files of various types and sizes.  The files arrive by a variety of means, such as scanned documents, fax images, and file uploads.  Two of the file types are TIF images and PDFs, both of which can reach 200 MB or more, though most files are considerably smaller.

Desktop users need to view, print, edit (e.g., Word docs), or otherwise access one or more files from this repository.  They will also add files to it.  For security and audit reasons, access to these files must be tightly controlled, and all operations are logged.  For certain files, editing of the contents is allowed: a user must "check out" the file for editing and then check it back in when done, and the modified file is added to the repository as a new version. While a file is checked out, no other user may check out or otherwise access it, though they may view prior versions.

Thus, there is a need to be able to efficiently transfer files both to and from a user's desktop.  These transfers would be mediated, presumably, by some kind of file server/service which would authenticate the user, validate the operation being performed, create any log data, and transfer the file to/from the appropriate directory on the server.  These requirements seem to suggest that business objects running on the client will cooperate with objects on a server somewhere to record the information as appropriate as well as effect the transfer of the files themselves.

We discarded the idea of making the files available via direct access to a network share, since that would violate the security requirements - we can't have users messing around, outside of the app, in the repository directory tree.  Though I think that would be by far the simplest (and fastest?) approach, we cannot allow direct access to the files; all access must be monitored and controlled. Indeed the users aren't really aware that there are files at all - they deal with cases and the case's supporting documents.  They don't know or care what the filenames are.

We have toyed with the idea of using a Web Service for this.  The idea is that web service methods could be called with appropriate arguments for authentication as well as for the operation being performed.  The file involved would be passed as a byte array argument to the service call.  The byte array would "be" the file, and it would then be written as a temporary file on the user's local machine or, in the case of an upload by a user to the server, written to the appropriate server directory.

We have developed some proof of concept code and it seems quite straightforward.
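
In rough outline, that proof of concept amounts to a web method that reads the requested file into a byte array and returns it. The sketch below assumes an ASMX-style service; the class, method, and helper names are placeholders rather than the actual code:

```csharp
using System;
using System.IO;
using System.Web.Services;

// Illustrative only: the service, method, and helper names are stand-ins for
// whatever the real proof of concept uses.
public class DocumentTransferService : WebService
{
    [WebMethod]
    public byte[] DownloadDocument(string ticket, int documentId)
    {
        // Authenticate the caller, validate the operation, and write the
        // audit log entry here (omitted from the sketch).
        string path = ResolveRepositoryPath(documentId);

        // This is the part that hurts with 200 MB files: the whole file is
        // read into a single byte array, which is then base64-encoded into
        // the SOAP response on top of being held in server memory.
        using (FileStream fs = File.OpenRead(path))
        using (BinaryReader reader = new BinaryReader(fs))
        {
            return reader.ReadBytes((int)fs.Length);
        }
    }

    private string ResolveRepositoryPath(int documentId)
    {
        // Look up the physical file for this document in the database
        // (stubbed out for the sketch).
        throw new NotImplementedException();
    }
}
```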

But the problem with this approach is, I think, the large files. While there aren't many of them, there are enough of them that we have to deal with them.  Using Web Services means the byte array is serialized into an XML stream, increasing the size by a third or more - significant overhead. It also means the web site running the service would need that 200 MB byte array resident in memory while it is serialized and transferred, and if more than a few users were doing that at once I suspect the web server would be overwhelmed.  Indeed, in some of our tests we have had "Insufficient Resource" errors on the server when using a BinaryReader to load a large file into a byte array in preparation for returning that array to a caller.

Does anyone have any thoughts on how to do this?  Perhaps some sort of custom remoting to transfer the file?  If the remoting were hosted in IIS (like the data portal), wouldn't the same resource problems exist with the large files?  I saw an article somewhere (in the MS KB?) that showed how to write a service which would host the remote object, but isn't there still a problem with transferring 200 MB in one big chunk? How would breaking a file into smaller chunks work using single-call remoting, and how would the file be reassembled on the user's system?
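
(For illustration, the chunked scheme being asked about here could look roughly like the sketch below, regardless of whether the transport is a web service or remoting: the server returns at most N bytes starting at a given offset, and the client appends successive chunks to a temporary file until an empty chunk signals end of file. The names are hypothetical and the transport call is stubbed as a direct method call.)

```csharp
using System;
using System.IO;

// Hypothetical shapes for a chunked transfer; nothing here is taken from the
// real application.
public class ChunkedTransferSketch
{
    // Server side: return at most maxBytes of the file starting at offset.
    // An empty array signals end of file, so only one chunk at a time is
    // ever held in server memory.
    public static byte[] DownloadChunk(string path, long offset, int maxBytes)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            if (offset >= fs.Length)
                return new byte[0];

            fs.Seek(offset, SeekOrigin.Begin);
            byte[] buffer = new byte[maxBytes];
            int read = fs.Read(buffer, 0, maxBytes);
            if (read == maxBytes)
                return buffer;

            byte[] trimmed = new byte[read];
            Array.Copy(buffer, trimmed, read);
            return trimmed;
        }
    }

    // Client side: request successive chunks and append each one to a
    // temporary file until the server reports end of file.
    public static string DownloadToTempFile(string serverPath, int chunkSize)
    {
        string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        using (FileStream output = File.Create(tempPath))
        {
            long offset = 0;
            while (true)
            {
                // In the real app this call would go through the web service
                // or remoting proxy rather than straight to the file.
                byte[] chunk = DownloadChunk(serverPath, offset, chunkSize);
                if (chunk.Length == 0)
                    break;
                output.Write(chunk, 0, chunk.Length);
                offset += chunk.Length;
            }
        }
        return tempPath;
    }
}
```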

Or maybe somebody has an idea for some other approach entirely?

Thanks for any help or insight anyone can offer.

 - jeff

CaymanIslandsCarpediem replied on Thursday, August 31, 2006

We have much the same requirement (though we don't have any files approaching 200 MB).  Our solution was to use SharePoint Portal Server.  Out of the box it sounds like it will meet all your requirements, and if its logging isn't exactly what you need, it's customizable - or, even better, there are more sophisticated third-party logging options as well.  Again, I haven't tested the SharePoint web services with huge files, but presumably MS has this figured out.

 

JHurrell replied on Thursday, August 31, 2006

Jeff,

I have an idea but I don't know if it's a good one. It does allow you to maintain security, log file access requests and maintain files on a network share for faster access.

Versions of the files are maintained on a server and the locations of those files are stored in the database. End users do not have access to those files. Only the appropriate systems and services do.

When a user requests a file, the system first verifies that the user can access the file. If the user has access, the system generates a temporary name for the file and copies it to a share accessible to the user. The system also logs this, and includes an expiration date/time after which the file is no longer considered accessible.

The user is provided with a download link to that temporary file.

You would have a process that executes on a schedule and deletes those files that have exceeded their expiration date.
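
Concretely, that flow might look something like the sketch below; the share path, the access check, and the audit/expiration store are all stand-ins:

```csharp
using System;
using System.IO;

// Rough sketch of the copy-to-share idea; every name here is a placeholder.
public class TempShareSketch
{
    private const string SharePath = @"\\fileserver\DocumentHandoff";

    // Copy the requested file to the share under a temporary name and record
    // when that copy stops being valid. Returns the path the user can reach.
    public static string PublishForDownload(string repositoryPath, string userName)
    {
        // 1. Verify the user may access this file (omitted here).
        // 2. Pick an unguessable temporary name so the repository layout and
        //    real file names are never exposed.
        string tempName = Guid.NewGuid().ToString("N") + Path.GetExtension(repositoryPath);
        string tempPath = Path.Combine(SharePath, tempName);

        File.Copy(repositoryPath, tempPath);

        // 3. Log the access and the expiration time; in a real system this
        //    would go to the audit tables rather than the console.
        DateTime expires = DateTime.Now.AddHours(4);
        Console.WriteLine("{0} granted {1} until {2}", userName, tempName, expires);

        return tempPath;
    }

    // Scheduled cleanup: delete handoff copies older than the expiration
    // window. A real version would read expirations from the audit store.
    public static void DeleteExpiredCopies(TimeSpan maxAge)
    {
        foreach (string file in Directory.GetFiles(SharePath))
        {
            if (DateTime.Now - File.GetCreationTime(file) > maxAge)
                File.Delete(file);
        }
    }
}
```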

That was just off the top of my head, but it could work. It would be fairly simple to set up and would definitely be friendlier to your systems.

- John

Edit: The one caveat is that once the user has the link, they could give it to another user who should not have access to the file. I don't see that as too bad, though, since the user could just hand the downloaded file to another user anyway.



jemmer replied on Thursday, August 31, 2006

Hi John,

I like your idea, but I'm thinking that instead of a link, I would use the web service that "checks out" the file (and currently returns the huge byte array) and have it continue to do what it does now - except that instead of returning the byte array, it would copy the file to the accessible share and return the information the client code needs to reach that file on the share. The client code would then copy it to a temporary file on the local hard disk for whatever application use is appropriate.  That way, no external link information is available to be exchanged between unauthorized users.  Users only interact with the web service, by way of the app, to obtain access to the file, in either direction.
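
The client-side half of that is small; something along these lines, assuming the checkout call has already returned the UNC path of the copy it placed on the share (the type and method names are illustrative only):

```csharp
using System;
using System.IO;

// Client-side sketch of the modified flow. The checkout web service call
// itself is out of scope here; we assume it returned the share path.
public class CheckOutClientSketch
{
    public static string CopyToLocalTemp(string sharePath)
    {
        // Copy from the accessible share to a local temporary file so the
        // user never works directly against the share, then open it with
        // whatever application is appropriate (Word, a TIF viewer, etc.).
        string localPath = Path.Combine(Path.GetTempPath(), Path.GetFileName(sharePath));
        File.Copy(sharePath, localPath, true);
        return localPath;
    }
}
```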

You are correct about the users sharing files.  That's OK: although we cannot prevent one user from showing or giving a file to another, we log everything, so we can identify who changed things, and that's what's really important.  We also maintain an access log, so if we had to track down some unauthorized use, presumably we could do that.

We had already contemplated a service/scheduled task to monitor abandoned checkouts - it could also easily handle the file cleanup in the accessible share.

Thanks for your ideas!

 - Jeff

JHurrell replied on Thursday, August 31, 2006

That sounds like a great idea.

It's much better and more secure to use a LinkButton or ImageButton or something that does a Response.BinaryWrite() behind the scenes to deliver the file to the user.

- John




jemmer replied on Thursday, August 31, 2006

Yes, but remember, this whole thing is a winforms app.  No Response objects nor web site to be found anywhere...

 - Jeff

Copyright (c) Marimer LLC