To use or not to use DTOs

tiago posted on Wednesday, October 12, 2011

Hi Rocky,

In the Using CSLA4 - 3 Data Access ebook you show two implementations of the Encapsulated Invoke model:
- DataReader DAL interface
- DTO based DAL interface

As far as I can see, the former uses a data reader, but only for the fetching part, and uses parameter passing otherwise. The latter uses DTOs for all operations.

I understand that passing too many parameters goes against good practice, as it makes the code difficult to read. Code Complete says:
Limit the number of a routine's parameters to about seven. Seven is a magic number for people's comprehension (p. 108).

Using a DTO is the way to avoid exceeding this magical number. It's easy to have objects with 7 or more properties: 5 "payload" properties plus Id and timestamp. Speaking of Id and timestamp, on object creation we need to pass them back from the DAL to the Business Layer. Unless we use a DTO, our Insert method on IEmployeeDal must return an EmployeeReturn object that holds two properties: EmployeeId and EmployeeRowVersion (timestamp).

For Insert and Update methods, using a DTO is mostly a matter of parameter counting. The Delete method doesn't need a DTO, as you show in the EncapsulatedInvokeDto sample.
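To make the parameter-counting argument concrete, here is a minimal sketch of what a DTO-based interface might look like. Only IEmployeeDal, EmployeeId and EmployeeRowVersion are names from this post; EmployeeDto, its payload properties and InMemoryEmployeeDal are hypothetical and are not taken from the ebook samples.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical DTO: five "payload" properties plus Id and timestamp.
public class EmployeeDto
{
    public int EmployeeId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Title { get; set; }
    public DateTime HireDate { get; set; }
    public decimal Salary { get; set; }
    public byte[] EmployeeRowVersion { get; set; }
}

// DTO-based DAL interface: one parameter per operation, however many columns.
public interface IEmployeeDal
{
    void Insert(EmployeeDto data);  // DAL writes EmployeeId and EmployeeRowVersion back to the DTO
    void Update(EmployeeDto data);  // DAL refreshes EmployeeRowVersion on the DTO
    void Delete(int employeeId);    // a single key needs no DTO
}

// In-memory stand-in for a real database (assumption, for illustration only).
public class InMemoryEmployeeDal : IEmployeeDal
{
    private readonly Dictionary<int, EmployeeDto> _rows = new Dictionary<int, EmployeeDto>();
    private int _nextId = 1;
    private long _version;

    public int Count => _rows.Count;

    public void Insert(EmployeeDto data)
    {
        data.EmployeeId = _nextId++;
        data.EmployeeRowVersion = BitConverter.GetBytes(++_version);
        _rows[data.EmployeeId] = data;
    }

    public void Update(EmployeeDto data)
    {
        if (!_rows.ContainsKey(data.EmployeeId))
            throw new KeyNotFoundException("No employee with id " + data.EmployeeId);
        data.EmployeeRowVersion = BitConverter.GetBytes(++_version);
        _rows[data.EmployeeId] = data;
    }

    public void Delete(int employeeId) => _rows.Remove(employeeId);
}
```

With this shape the generated Id and timestamp travel back on the same object that carried the values in, so no separate return type is needed.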

I understand your ebooks have an illustrative purpose so this isn't a criticism.

Now comes the question (or at least the debatable PoV)...

What about the Fetch method? Suppose my SQL query fetches one root parent, a child collection and also grand-children collections. In that case I think the best choice is not to use a DTO but to stick to a DataReader, as the DTO just adds complexity.
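A minimal sketch of that DataReader-style fetch, assuming the query returns the parent in the first result set and the children in the second. ReaderFetchSketch, the Person/Phone tables and their columns are hypothetical; an in-memory DataSet stands in for a real database command so the sketch is self-contained.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ReaderFetchSketch
{
    // Builds two result sets in memory; a real DAL would get them from one command.
    public static IDataReader CreateSampleReader()
    {
        var parents = new DataTable("Person");
        parents.Columns.Add("Id", typeof(int));
        parents.Columns.Add("Name", typeof(string));
        parents.Rows.Add(1, "Ann");

        var children = new DataTable("Phone");
        children.Columns.Add("PersonId", typeof(int));
        children.Columns.Add("Number", typeof(string));
        children.Rows.Add(1, "555-0100");
        children.Rows.Add(1, "555-0101");

        var set = new DataSet();
        set.Tables.Add(parents);
        set.Tables.Add(children);
        return set.CreateDataReader(); // one result set per table, in order
    }

    // Reads the parent row, then moves to the next result set for the children.
    public static KeyValuePair<string, List<string>> Load(IDataReader dr)
    {
        string name = null;
        var phones = new List<string>();

        if (dr.Read())
            name = dr.GetString(dr.GetOrdinal("Name"));

        if (dr.NextResult())
            while (dr.Read())
                phones.Add(dr.GetString(dr.GetOrdinal("Number")));

        return new KeyValuePair<string, List<string>>(name, phones);
    }
}
```

Grand-children would follow the same pattern with one more NextResult() call, matched to their parents by key.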

My point is: use a DTO only if you need to pass more than 7 parameters from the business layer to the DAL.

If you don't have that many parameters to pass to the DAL and need to return more than one value from the DAL, use a result object. Strictly speeking, there is no need to use a DTO in that case.
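As a rough sketch of that result-object alternative: the Insert method takes a handful of plain parameters and returns only the two generated values. IEmployeeDal, EmployeeReturn, EmployeeId and EmployeeRowVersion are the names used earlier in this post; the parameter list and InMemoryEmployeeDal are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Result object: only the values the DAL generates.
public class EmployeeReturn
{
    public int EmployeeId { get; set; }
    public byte[] EmployeeRowVersion { get; set; }
}

// Parameter-based interface: fine while the parameter count stays small.
public interface IEmployeeDal
{
    EmployeeReturn Insert(string firstName, string lastName, DateTime hireDate);
}

// In-memory stand-in for a real database (assumption, for illustration only).
public class InMemoryEmployeeDal : IEmployeeDal
{
    private readonly List<string> _names = new List<string>();
    private long _version;

    public EmployeeReturn Insert(string firstName, string lastName, DateTime hireDate)
    {
        _names.Add(firstName + " " + lastName);
        return new EmployeeReturn
        {
            EmployeeId = _names.Count,                            // generated key
            EmployeeRowVersion = BitConverter.GetBytes(++_version) // generated timestamp
        };
    }
}
```

The business object would then copy EmployeeId and EmployeeRowVersion from the returned object into its own fields.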

RockfordLhotka replied on Thursday, October 13, 2011

There is no doubt that the DTO approach adds overhead. But it also adds abstraction and clarity.

This means it is a trade-off decision that you have to make. That is why the ebook shows both techniques, so you can evaluate the options and choose what works best for you.

I really like the DataReader approach, because it offers good performance, and I don't think it adds that much complexity.

But if I think I'll need to support non-relational data stores (XML files, Excel documents, web services, etc.) then I'll almost always use DTOs, because their increased abstraction makes it easier to write and understand the code.

tiago replied on Thursday, October 13, 2011

RockfordLhotka

Point taken.

StefanCop replied on Thursday, October 13, 2011

A month ago I was asking myself the same question. I added a switch to the object factories and ran several performance sessions. The scenario might not be the most typical one, but it was the best I had at hand: fetch about 20'000 Persons with their Addresses (usually 1 per Person); afterwards the test lazy loads the other contact mechanisms (phone, email, etc.), which are 11'000 rows in the DB.

Fetching the data for the 20k persons accounts for 24k samples, which includes 14k samples to fetch the addresses for each person.

Fetching the contact mechanisms per person accounts for 25k samples.

Creating the DTOs and filling them with possibly converted data (Convert.ToXX) takes at most 3000 samples (~6%).

On the other hand, a better loading strategy could easily halve the work; unnecessary use of Set/GetProperty instead of Load/ReadProperty (or BypassPropertyChecks) costs 2% or more.

In my scenario (a lot of Fetch invocations), the top methods have been:

- GetCustomAttributes() 16%

- GetType() 14%, before I provided an ObjectFactoryLoader as explained in the Data Access ebook.

- ExecuteReader() 11%

My lessons learned:

- I'll stick with DTOs, because they simplify the code, especially when loading a root with children and grand-children.

- I was a little surprised how much the authorization checks of Get/SetProperty cost compared to Read/LoadProperty (possibly filling a cache is overweighted in my scenario).

- And of course, design the loading strategy / object graphs properly.