Sample project returns references to private data - violating encapsulation

Sample project returns references to private data - violating encapsulation

Old forum URL: forums.lhotka.net/forums/t/2368.aspx


J Phillips posted on Friday, February 16, 2007

I see several examples of code like "project.Resources.Assign(resourceId);"  in the C# BO book sample ProjectTracker.

The "Resources" property getter is returning a reference to the property's private data so the UI code can call the Assign method on it. Iit is not making a copy and returning a copy of the private data.

Doing this breaks encapsulation (by returning a reference to the private instance member).

Any suggestions on how to do this better - avoid violating encapsulation and yet provide the desired behavior?

Thanks.

RockfordLhotka replied on Friday, February 16, 2007

I'm not sure I understand your question.

project.Resources returns a reference to a business object. "project" itself is a reference to a business object as well - the two objects are peers in a sense, because they are both public parts of the business layer.

Were project.Resources to create a copy of the collection, it would (by definition) have also created a copy of all the child (and possibly grandchild) objects - in other words the whole graph. No sane person would write code like that, because the cost of cloning entire object graphs is quite high.

But on top of that, the Assign method would then be called on this cloned object graph. The resulting changes would be made to a copy, not to the real objects. It isn't clear how those changes could be persisted at that point? You'd have two (or actually n) copies of this graph that would have to be reconciled when the user clicks the Save button. The complexity of such a concept is quite high.

Now there is another entire world view on object design. In that world view UI code doesn't interact with CSLA style business objects at all - ever. In that world view there are a set of very high-level objects that represent use cases. You can almost think of these "objects" as a service interface, because they tend to return and consume entity objects (simple data containers). All the actions taken on those entity objects are entirely encapsulated behind this high-level interface.

Personally I find that design approach to have limited value, at least when creating Windows or WPF interfaces. They totally eliminate any hope of interactivity thanks to that thick layer of abstraction.

If you are looking for that style of OO design, then CSLA .NET simply isn't for you. CSLA .NET goes the other way: helping you build a set of objects that reflect the problem domain, in such a way that you should interact directly with those objects. The result is a high level of interactivity, and yet the actual business logic and processing remains (to a very large degree) encapsulated within the set of business objects the comprise the business layer.

J Phillips replied on Friday, February 16, 2007

For years, OO programming "standards" have dictated that references (or pointers) to private mutable data members should not be returned by methods (in this case the Project.Resources property getter is doing just that - returning a reference to a private instance of the  ResourceList class).  I agree it is very cumbersome to avoid this.  All I can think of is that you have to (1) implement a facade on the parent class that replicates the behavior of all the children (at least for the methods that return references to mutable types); or (2) make a copy (using your cloner) on the getter, allowing the caller to modify a copy of the internal data, and then provide a setter (sometimes called a mutator) that reassigns the private reference to the one the caller modified ... but the caller still has a reference to it (yuk).  This is why I am looking for a better way :-)

Maybe this is just a trade off we make for convenience and performance. 

Dozens of references are avaiable that state that this is a violation of encapsulation:

Just do a Google Search on "do not return" private data.
Also in dozens of Java books and "Effective C#" by Bill Wagner.

Thanks.

 

RockfordLhotka replied on Friday, February 16, 2007

CSLA .NET objects are designed to be interacted with. The objects are not private data. All public objects are just that: public. The properties are merely a navigation path to get to those objects.

It is important to distinguish between data that's actually private, and data that exists to provide a navigable path.

UML has a notation for this. The aggregation symbol has an associated (and optional) navigation symbol. A great many aggregate objects also provide navigation to their "children". And that's exactly what's happening here.

A Project contains (aggregates) the ProjectResources collection, which in turn contains ProjectResource objects. But none of these objects are private data. They are all public, and are designed to be public. The properties merely provide a navigation structure through the object graph to make it possible for a consumer to reach each object.

It is equally possible, and reasonably common, for an object to collaborate with other objects that are not navigable. Sometimes those collaborators are public, sometimes private. But if there's no natural navigable relationship then the navigation link shouldn't exist. And of course if the collaborator really is private data then it shouldn't be exposed at all.

But again, that's not the case here. The Project object consists of a set of public objects that exist within the context of a navigable object graph.

J Phillips replied on Friday, February 16, 2007

In your sample solution, there is a Project class.  That class has a private data member called "resources" that "points to" a mutable reference type (not an immutable type or a value type).

<code>
private ProjectResources _resources =
      ProjectResources.NewProjectResources();
<end code>

There is a property ("Resources") who's getter clearly returns a reference to that private member variable.  

<code>
public ProjectResources Resources
    {
      get { return _resources; }
    }
<end code>

This breaks encapsulation. 

Given this, I do not understand your comments above above being public - it is clearly private.  The Property is public, yes, but it is returning a reference to a private member variable.

The way it is coded, you might as well make "_resources" public - which is maybe what you wanted to do so it could be a "navigable object graph".  But the way it is currently implemented gives rise to the question of whether you mean it to be private (as the declaration is coded) or public (as the behavior is coded .. by giving out a reference to the private memeber variable).

Thanks for the responses.

Brian Criswell replied on Friday, February 16, 2007

No, it does not break encapsulation, or, if you prefer, breaking encapsulation is exactly what should be done here.  The purpose of resources is to present a modifiable list of resources to a user interface.  The instance of the ResourceList object is held as a private field for the following reasons (maybe more)
If the list of resources returned a copy, then the only way to modify those resources would be to have a setter that replaced the list.  This then causes a race condition if two execution paths want to modify the list.  A and B get the list A makes a change and updates the list and then B makes a change and updates the list.

All of the actual persistable data is fetched and set using copy value semantics.  ResourceList is a child of Project so that you can see the resources in a project.  It does not return a copy for the reasons previously mentioned.  It does not expose its member field because that would open it up to abuse.

Of course _resource should be private.  It could be set to a different list otherwise.  Giving out a reference to a privately held object is perfectly acceptable in this scenario.  You want the contents of the resources list to be modified directly, otherwise the project will not know about the modifications.

I believe Rocky is referring to resources being public because it is meant to be modified by the world.  However, the internal list is not to be set to another list.  Hence, the one piece of data that does need to be private remains so.

What would you gain from returning a copy of _resources?

J Phillips replied on Friday, February 16, 2007

Your comment from above is inserted below:

<start excerpt>

The instance of the ResourceList object is held as a private field for the following reasons (maybe more)

<end excerpt>

In response to your first bullet point:  Making a member variable private actually prevents navigation to it - it is the public getter of the property that makes it visible outside the class.  That is not a "reason to make it private".  That doesn't make sense.

In response to the second bullet:  making a member variable private does not make it readonly.  It is not marked readonly (or const) anyway. 

I am at a loss as to why no one (yet) seems to acknowledge that when you return a reference to a mutable reference-type object in the .NET framework (as is happening in this code), you are giving the caller a reference to the very same object that you are saying is private in the _resource declaration.  This allows the holder of that reference to invoke all the public methods on the "private (encapsulated)" instance variable.  You have essentially made a private instance variable public. 

Race conditions (that you mention above) is a threading issue.  Race conditions are avoided with proper thread synchronization techniques.  It has nothing to do with the class design and definitely not related to encapsulation or copying.

 

Brian Criswell replied on Friday, February 16, 2007

You cannot have navigation of an object graph without storing a reference to the child objects.  However, we do not want anyone changing the object graph on us (swapping one child object for another), so the field is private.  Access to the child object is then provided through the getter, making it read-only to the outside world.

I think both Rocky and myself fully acknowledge that returning a reference allows anyone to access the public methods, but that is exactly what we want and is how CSLA has been designed to be used.  A business object developer would have to expose state, and not another CSLA object to really have a good chance of exposing themselves to problems of this nature.

The race condition example was merely to highlight a problem with  running around with copies of lists instead of references.  An example of a "slow" race condition would be a UI that made two copies of a child list, A and B.  Button A makes adds an item to list A and puts the copy back in the parent object.  Button B does the same with copy B.  The application does not know which order the buttons are going to be pushed in and this leaves you with a slow race condition where the application will never know whether the A list or the B list will be the resulting list.  Working with references to a single object would leave you with the changes from both button A and button B, i.e. two new items in the list.  This is a contrived example to be sure, but it highlights a problem with returning copies of child objects.

So how would returning a copy of _resources be of benefit?  I think an example would help us understand your viewpoint a bit better.

tetranz replied on Friday, February 16, 2007

J Phillips:

Just do a Google Search on "do not return" private data.

I tried that, and some variations. Everything I found talks about how you should not expose a reference to a mutable private object when it is not your intention to allow properties of that object to be changed from outside the parent object. Most of the examples I found were in Java. It seems that Java has a Date object that has a SetTime() method which changes the time inside the Date object. A common error is to expose a Date object as a public readonly property while forgetting that someone could call SetTime() on the object therefore changing its internal data. That's a classic example of violating the rule you are talking about. (btw, the .NET DateTime struct does not have that problem. All its Add() methods return a new DateTime.)

A common .NET example is exposing an array as a readonly property and forgetting that anyone can change an array element.

The situation with Rocky's CSLA examples is different. We actually want the outside world to be able to modify the collection. That's kind of the whole purpose of exposing it as public. If you don't want the outside world to be able to modify the collection then derive the collection from CSLA's ReadOnlyListBase.

J Phillips:
In response to the second bullet:  making a member variable private does not make it readonly.  It is not marked readonly (or const) anyway.

You seem to be confusing the reference to the collection with the contents of the collection. Brian is exactly right. You don't want someone to be able to replace the collection with a new collection. That would definitely break encapsulation and it is the reason why the public property only has a getter. You would not have that protection if the private variable was exposed.

From your first post

J Phillips:
Any suggestions on how to do this better - avoid violating encapsulation and yet provide the desired behavior?

Please give a real world example of what you are trying to avoid?  I'm no OO expert but, to me, in simple terms, encapsulation means locking up the internals of my objects so nobody can fiddle with the way I choose to implement a method or property. That means I can safely change it later without breaking other code.

Cheers
Ross

jkellywilkerson replied on Friday, February 16, 2007

I was under the impression that encapsulation was to prevent direct access to class member variables.  But, in order to interact with the class, one would need to use a specific set of methods, i.e. property getters and setters.  I don't see how the above use can possibly break encapsulation.

UI developers can interact with the business objects, but only through very specific channels.  Hence, encapsulation is upheld.

Also, the exact details of how the resource list is generated is concealed.  The UI developer only knows to get the list, add to the list, or delete from the list.  Again, encapsulation is upheld.

xal replied on Saturday, February 17, 2007

Resources (or any child collection that's supposed to work that way) is meant to be exposed. You don't want to expose all it's logic by duplicating it in the root object. You want to expose it, so that the ui can interact with it, but in a way were it' can't replace it.
So, the whole reason for having a private field that contains the collection and a public getter to allow visibility is that you can alter that object, but not change / replace it.
It is _NOT_ the same to just have the field be public, and the reason is that you could do this:


mProject._Resources = New ResourcesCollection()

Having a public getter (and not a setter) prevents you from doing that.
Resources are meant to be exposed. They are meant to be used.

Think about a very common case, like SqlCommand. It has a collection of parameters, and you can add or remove parameters, but not replace the collection. Just like the "Controls" collection in a form / control.
You wouldn't want to replicate the logic, properties and methods of the collection in the root object. What you propose is like "flattening" the hierarchy, and that doesn't make much sense, because you end up "loosing" the hierarchy and having something unmanegable and unmaintainable.

Andrés

RockfordLhotka replied on Saturday, February 17, 2007

J Phillips:

The way it is coded, you might as well make "_resources" public - which is maybe what you wanted to do so it could be a "navigable object graph".  But the way it is currently implemented gives rise to the question of whether you mean it to be private (as the declaration is coded) or public (as the behavior is coded .. by giving out a reference to the private memeber variable).

I think you misunderstand how a .NET property works.

Exposing _resources directly would allow a caller to change the field to point to some other object - and that would be bad.

Exposing a reference to the ProjectResources object through a property set is merely an implementation of an accessor method. The caller can not change where _resources points, all they can do is interact with the object provided. That object, of type ProjectResources, is designed to be used by consumers of the business layer. It has a safe public interface that is designed for consumption.

Copyright (c) Marimer LLC