Design Direction Opinions

DeHaynes posted on Wednesday, September 13, 2006

I am designing this system from database to webpage. Subequently I have made decisions to make the database as open ended as possible. For instance I have a Contact Table to track people. There are seperate tables for NamePrefix, NameSuffix. I also have a seperate table for Addresses, EmailAddresses, PhoneNumbers and Relationships with a Many-to-many table for each. This allows me to have as many address, email address, phone numbers and relationships as I want.

The result of this is when I go to code objects in CSLA, I don't have a one-to-one relationship between records and objects. For instance, the tblAddresses table only contains addresses. The tblContactsM2MAddress table contains a foriegn key to point to the contact, a foriegn key to point to the address and a foriegn key that points to another table called tblAddressTypes. So for me a complete address object is actually based off of one record in the tblContactsM2MAddress combined with the associated record out of tblAddresses.

What I want to know is if this is common or am I heading down a road on my own?

skagen00 replied on Wednesday, September 13, 2006

First, your problem is definitely one a lot of us have gone through. In fact, I'm working in the "contact" portion of our system at this time as well.

People can have addresses. Certain people may end up sharing the same address instance or record in the database. But when looking at a person class, what do you have? Addresses (at least for me) don't stand on their own - they're "child" objects for me.

My approach is a little bit more involved but taking the concepts above I would have a person class along with a collection of addresses.

When it comes time for concurrency checking, I manage a modified datetime on each address as well on each person - so if I have two people that happen to share the same address and one modifies it and saves, the other cannot do the same without reretrieving the object. Every one of my objects (parent and child) do a concurrency check (which may not be the typical approach for everyone here).

In my system, a Profile (Individual or Organization) have a collection of Relationships, Names (primary and alternatives), etc. Some of these are many to many and some of these are one to many, but the schema in the database shouldn't dictate your object model - which may say a person has relationships and addresses, etc.

DeHaynes replied on Wednesday, September 13, 2006

I agree. My CSLA implementation is a Contact object with an Addresses Collection object that then contains Address objects.

1. When someone updates an address, my stored procedure checks to see if the old address was used by someone else. If it was, then I cannot change it and I create a new address record. If it was only used once, then I contine on.

2. Next the stored procedure checks if the new address already exists. If it does, then the many-to-many record is made to point at the already existing address. If it doesn't exist then it modifies the existing one or creates a new one depending what was decided in the above step.

3. If the new address already existed and the old address was only used by the current contact, then the old address record is removed from the table.

skagen00 replied on Wednesday, September 13, 2006

It seems to me like your approach might be a little challenging - and I'd like to understand how you're thinking because these are issues I have had to make decisions on as well.

In your cases:

1) So if two people are sharing an address (like husband and wife) aren't you effectively disconnecting the linkage and establishing two unrelated addresses that could be changed independently at this point (when you'd ideally like to keep them in synch?). If this is a concurrency issue where the husband changed the address while the wife had it open and then the wife was saved I'd simply not allow the wife save to be successful and require the user to reload the wife.

2) How are you determining that an address already exists? By comparing two string literals? Unless there is a very rigorous means that enforces entering of addresses in an extremely consistent manner I wonder if trying to compare string literals might not be a little tricky.

3) I'm not quite sure what you mean here.

Do address changes affect everyone using that address? In my system, they do - when looking at an address I give an indicator and a read-only list as to who is using the address - one changes the address with the knowledge that changing the address changes it for all persons using it.

In my system, iIf someone is truly moving, a person should delete that address (their hookup to that address essentially) and create a new address in the system. If everyone deletes their connection to an address, the address in the database should go away.

malloc1024 replied on Wednesday, September 13, 2006

What is the relationship between the contacts that have the same address? Are they part of the same family, business, customer (that can contain multiple contacts) etc? In this case the address does not belong to a specific individual. It belongs to the family, business or customer. This eliminates the need to share addresses or keep things in sync.

skagen00 replied on Wednesday, September 13, 2006

In our system we are shying away with a concept from a full-fledged family record. A family is really a result of relationships between individuals. I don't want to go into great depth over it but there are real reasons why "families" get to be a problem when one starts attributing transactions to families and the families may act sometimes in a husband/wife way or a husband/wife/child/child way or a child/wife way, etc (or families break up, etc).

Any individuals and organizations can share an address, but typically this occurs as a result of a work relationship or a family relationship, and that's pretty much it.

In our system we will be providing means to help intuitively establish address sharing (likely as prompts as certain relationships are added/removed).

DeHaynes replied on Wednesday, September 13, 2006

skagen00:

It seems to me like your approach might be a little challenging - and I'd like to understand how you're thinking because these are issues I have had to make decisions on as well.

In your cases:

1) So if two people are sharing an address (like husband and wife) aren't you effectively disconnecting the linkage and establishing two unrelated addresses that could be changed independently at this point (when you'd ideally like to keep them in synch?). If this is a concurrency issue where the husband changed the address while the wife had it open and then the wife was saved I'd simply not allow the wife save to be successful and require the user to reload the wife.

2) How are you determining that an address already exists? By comparing two string literals? Unless there is a very rigorous means that enforces entering of addresses in an extremely consistent manner I wonder if trying to compare string literals might not be a little tricky.

3) I'm not quite sure what you mean here.

Do address changes affect everyone using that address? In my system, they do - when looking at an address I give an indicator and a read-only list as to who is using the address - one changes the address with the knowledge that changing the address changes it for all persons using it.

In my system, iIf someone is truly moving, a person should delete that address (their hookup to that address essentially) and create a new address in the system. If everyone deletes their connection to an address, the address in the database should go away.

1. Yes, the linkage would be broken temporarily. So if a user (non-technical) wants to change a husband & wife address, they would first go and change the husband address. My stored procedure would see that A) The old address is used by someone else and B) the New Address doesn't already exist. So it would create a new record for the new address.

Then then the (non-technical) user would go to update the wife's address. My stored procedure would see that A) this is the only user using the old address and B) the new address already exists. (because the husband change created it). So it would point the wife at the existing address and delete the old address record since it is no longer needed by anyone. So now the linkage of a single address is complete again.

Part of the reason I did things this way is becuase non-technical uses see these addresses as individual entities. So they will think they have to go change both entries. Yes I could make it so that when you change one record, you change it for everyone but this is supposed to OOP, so in the real world there would be two different index cards. So the user expects to make two changes. So why spend time trying to explain things to them? Instead, I use their energy to help keep orphaned records from happening.

Now the downfall is that they could fat-finger one of the addresses and end up with a different address for each spouse. As soon as they realized it and corrected it, you would be back down to 1 address record for them and anyone else who uses that address.

2. I compare every field in the address. In my address table there is only one field that is not required. It is Address Line 2. In my object I fill that with an empty string if it is going to be NULL. It makes comparing simplier

3. In the example I gave above, when the wife changed addresses, it left an address that was unused by anyone in the tblAddress table. My stored procedure recognizes this and deletes these addresses so as not to have orphaned records. (In the future I plan on figuring out a way to inactivate them without deleting so I can keep history, but right now they just delete.)

I reserved the situation where an address change, changes it for everyone that uses that address, to special situations. My thinking is that some people who have no idea how relational databases work, might try to use it. I would rather code in a special case into a BO and UI or have them come to me rather than to misuse it.

DansDreams replied on Wednesday, September 13, 2006

Yeah that's a pretty common scenario.

It's really helpful in CSLA development to divorce yourself from the thinking of business objects as 1-to-1 representatives of database tables.

I like the thinking behind your additional concurrency functionality on the address, but I'm not sure that would be helpful to the user. If I have two people living at "2342 Walnit Drive", and I understand the concept of two contacts sharing the same address (as I would have to), then I would think if I'm correcting that shared address to "2342 Walnut Drive" I'm changing it for both contacts.

skagen00 replied on Wednesday, September 13, 2006

Because multiple person can utilize a child object (Address), it isn't entirely true that a person fully contains an address and that the person concurrency checking protects the address information from simultaneous change. Adding the concurrency on the child level adds that protection.

Not that it's a huge deal but that's why I do it.

JZuerlein replied on Wednesday, September 13, 2006

I've been down this road too.

Having two people share the same address is not a great idea unless you can provide a way for the user to understand what is at stake when they make the change. The best way I've seen this done, is to have business addresses shared by all the people who work at a business. When the business moves, that change can be applied to all the employees. The key is to have a UI prompt asking the user if the address change should apply to the person, or the business. If the change is for just the person, then create a new address, and delete the reference to the old business address.

The problem is that most users don't understand why the question is being asked. Which in my experience leads to some sort of administrator monitoring and "undo" functionality when the user makes the wrong choice.

DeHaynes replied on Wednesday, September 13, 2006

Wow this is a popular question. I like it. :)

The reason I started this thread was because during the design of my database, I started running into resistance. You know the "I have been doing system design for 20 years and I have never seen anything that required THAT level of complexity!" type of thing. For the most part I have held it off because I can stand up and explain myself. But it is good to get objective opinions. In that vein, I will expand on how I implemented my Address BO (child) and see what you all think. My basic tables are.

tblAddresses             This contains the addresses
AddressPK                 int, identity
AddressLine1             varchar(50)
AddressLine2             varchar(50), nullable
City                            varchar(50)
StateFK                     char(2)                <------- Federal 2 letter state abbreviation
Zip                             varchar(11)
CountryFK                char(3)                <------- International ISO 3 letter country abbreviation
LastChange                timestamp

tblContacts              This contains my contact info
ContactPK                int, identity
NameTitleFK            int
...etc
LastChange                timestamp

tblAddressTypes     This stores the various types of addresses (Business, Home, Vacation, etc)
AddressTypePK        int, identity
AddressType             varchar(150)
LastChange                timestamp

tblContactsM2MAddresses      This is links my my contacts to my addresses and allows May-to-Many
ContactM2MAddressPK              int, identity
ContactFK                                    int
AddressTypeFK                           int
AddrssFK                                    int
LastChange                                  timestamp

When looking at it as an object, I transfer the fields like this.

Object       =    ContactAddress
Properties  =    Id     <- ContactM2MAddressPK from tblContactsM2MAddresses
                        AddressType       <- from tblAddressTypes
                        AddressLine1      <- from tblAddresses
                        AddressLine2      <- from tblAddresses
                        City                     <- from tblAddresses
                        State                   <- from tblAddresses
                        Zip                      <- from tblAddresses
                        Country               <- from tblAddresses
                        _M2MLastChanged       < - from tblContactsM2MAddresses
                        _AddressLastChanged   < - from tblAddresses

It is my thought that the UNIQUE combination of the AddressType with the Address makes a complete address. The address record inside the address table is useless alone so the index off that address record in tblAddresses is not very useful. So I use the Primary Key field in tblContactsM2MAddresses as my Id for my address objects.

I use the same M2M layout for my Email Addresses and Phone Numbers. I plan on using a very similiar layout for Relationships.

The reason I chose this layout is because my company is notorious for coming to the IT department and saying, "Yea, we know we told you we only needed to do this, but now we gotta use your system to do this additional thing." So I cannot use a solution like they do in Ms Exchange 2003, which we use for e-mail. It only stores 3 addresses and 8 phone numbers in a contact. With this setup, I can store as many addresses of any address type they want to create.

SonOfPirate replied on Tuesday, September 19, 2006

I agree that this is a great discussion! With that in mind, here's my two cents with regards to the concept of linking and breaking addresses.

First, I share the same design with many-to-many relationships between addresses, phone numbers, etc. but that is mostly inherited from my previous work. My thoughts on this have changed as I have evolved to SOA and a behavior-based approach rather than busting my hump trying to optimize and normalize the database schema. Afterall, it is normalization that leads to this design. The goal being to eliminate redundancy from the address table as much as possible. And when you have multiple people sharing the same address, whether it is their business or home address, normalization rules dictate that you should establish a many-to-many relationship.

BUT...

When you start looking at this from a behavioral perspective and, more importantly, a real world point-of-view, things don't always agree with normalization standards. That is why you find most production databases aren't normalized to the extreme.

Fact of the matter is that an address and its owner do have a relationship. Where most developers get off-track, I believe, is in their interpretation that an address is just some attribute of the parent entity. This is not the case. An address is an entity in and of itself. It represents a physical location and has certain behaviors and attributes of its own and depending on your application, you may or may not tap into these but they do exist in the real-world model. These attributes, say the tax rate for instance, are independant (for the most part) of the parent entity. The same applies to phone numbers.

Instead, in the real-world, we establish relationships between people and these entities. When someone purchases a house, they establish their relationship as the new owner of that address. When the are assigned a phone number from the phone company, they establish their relationship with that number. When they move, does the address cease existance? No. When they change phone numbers, does the original disappear? No.

So, even though this may sound out in left field or like it makes for a complicated design, it really doesn't and all of the above discussion is pretty much on-track. The key is the symantics and getting everyone on the same page conceptually.

Following along with a couple of the use cases described previously, when a husband and a wife share a residence, both entities have a relationship to the same record in the Addresses table. When they move (change addresses), they don't actually change the address. Instead, they change their relationship from the previous address to the new one. The original address is still there. Presumably another person or family has established their own relationship with that address. Otherwise, is sits there unassigned (unoccupied).

Should the couple's new address not be in the system, then a new record is created for the new location and the relationship established. This allows for urban-sprawl.

Should the couple get divorced, (let's assume) only one of their addresses changes. You must break the pseudo-relationship between the "people" by establishing a new relationship between the spouse moving out and their new address. The old address remains intact and still has a relationship with the other spouse.

This change in direction was helped by a recent effort into a solution for a medical office. The system tracks each patient as a separate entity, each with their own billing address, phone numbers, insurance information, etc. But, a huge short-coming in their previous system was the inability to link information causing their office staff to re-enter information for each member of a family when changes were required. To accomodate this, we had to establish relationships between the family members - not the addresses! Fact it, nothing besides their particular business rules, or more appropriately, the insurance carrier's rules, dictated that each member of the family covered by the same policy had to reside at the same address - so the system didn't either. If the insurance company said they did, then that was their problem. For us, we didn't care if all or none of the family members shared a residence. But, if they did, a change of address could easily be carried over to the other family members through their relationship. And, yes, it required prompting to make sure that the change was for the whole family and not because lttle Bobby went off to college. In addition, because we allowed linkage between family members regardless of address, phone number and, yes, even insurance, our application became a great tool for the doctor's to track family history - a nice side affect that has since been exploited further.

The point is that the relationships you are talking about with regards to a group of people's business address, etc. are based on relationships between the people, not the addresses. If you want to be able to update all employee's addresses when their employer moves, you do so by relating the employee with the employer and changing the employer's address. The employee-employer relationship establishes your group and the change is carried through automatically.

In addition, this simplifies applications such as eCommerce packages where you need to have an accurate snapshot of the order information at the time the order was placed. Any of us who have developed one of these applications knows that you can't simply relate the customer to the order, you have to record the billing and shipping addresses at the time the order was placed. This is typically done by copying the information into text fields in the orders table because you can't trust that the original information in the Addresses table hasn't changed since the order was placed. However, by persisting the addresses as mentioned above, the relationship between the order and the address used at the time the order was placed won't change even though the customer's relationship to that address may have.

However, there is one short-coming to this approach which is why I still copy the address information into the Orders table (until I find a permanent fix for this). While you don't allow direct changes to the addresses as part of moves, you will still need to handle address corrections. For instance, were I am located we have had a number of zip code changes over the last 20 years as the population has expanded. The same applies to phone numbers where area codes change. As a result, there is a need to edit the address information. Having safe guards to ensure that the change in only a correction and not a full-fledged change is an issue. So, for instance, you want to allow the zip code to change from 12345 to 12354. You need a way to distinguish a correction from a change/relocation.

The way we've handled that is to allow only changes to specific fields and in a controlled manner. The postal code can be edited but only by selecting a value from a drop-down list of values held in the database relating the postal code to city and state. So as long as the address is for Springfield, only those postal codes are available. If a new postal code has been assigned to the area, than it needs to be added to the master list. Cities, States and Countries are all contained in look-up lists with relationships established. If Mississippi was misspelled, then the master entry is what gets changes, not the individual address (and only by an administrator). We even separate address numbers from street names for the same reason as the house number should not change unless the person or business has moved.

This model has proven to be a nice mix of the normalized approach and behavioral/real-world approach. While not fool-proof when it comes to corrections, it does model the true behavior that takes place in the real world where it is the relationship between the person, couple, family or business and the address that changes, not the address itself.

Hope that helps.

JHurrell replied on Tuesday, September 19, 2006

Arrrr... Pirate. You know what today is don't you? http://www.talklikeapirate.com/

I like your approach and think it's well thought out, but wanted to get a little more information. You indicated that Cities, States and Countries are all contained in lookup lists. How do you contend with cases when a user lives in a city that does not exist in your system? Say for example there's a small town somewhere, and a user from that town is the first one to use your system. How do they provide their city?

Do you include some kind of "Not In List" item that would then allow the user to enter the city into a textbox?

This pattern is one that I've used in the past and it works well for systems where you do not expect many custom entries. There is a flag on the record that indicates whether or not the custom entry has been reviewed. Reviewing allows us to check the entry for correctness and make sure that it actually IS new.

- John

MrPogo replied on Tuesday, September 19, 2006

It really would be neat to see this design layout. It really sounds like a well though out approach to handle the most common scheme that almost everyone uses.

skagen00 replied on Tuesday, September 19, 2006

Excellent post SonOfPirate - your approach is very similar to ours but different in some aspects and gives us good stuff to chew on. Thanks for taking the time.

SonOfPirate replied on Tuesday, September 19, 2006

The "Not in list" case is something that has to be dealt with on an app by app basis and is dependant upon the interface and user-base. And, I've seen it handled as a wide-open field that allows users to freely enter any value and add it to the database if not already existing (which almost always leads to problems, but some people still insist on this way), allowing users to display the list in a popup window where they can add new items to the database or limiting access to only administrators so that users have to put that item to the side and request the item be added (believe it or not!).

If you are dealing with an internal application, either Win-based or on an intranet, you can usually rely on the users to do the job right or using role-based security to limit access to the master list. Public websites are the most troublesome because it is almost impossible to include every city that will possibly be used during the lifetime of the site. So, some measures need to be taken. Typically, in these cases, we allow the city to be entered manually and don't use a lookup list in that case unless we want to incorporate some kind of data management feature to reset references when invalid entries are found (which we've had to do more than once).

Where we use a greater number of master lists is in environments where the list changes infrequently or can be managed adequately by the customer and the user-base has a hard time entering a name the same way twice. We had a project where we had to migrate a customer's existing database over and found that less than 10% of the data was valid because it was not uncommon for the same city to be spelled as many as 3 or 4 different ways in the database - double that if the city has more than one word in its name!

So, I guess I should have preficed my comments with the fact that my example was an extreme case of fool-proofing and the average app probably won't go to that extreme. Although, I can tell you that we have been looking into address verification software and web services as well as various sources for address information (such as what is plugged into GPS systems) that we can use to build a "master" list that will reduce the likelihood that an address doesn't already exist. But, we are early in that investigation and have yet to decide if the benefits outweight the negatives such as performance hits, etc.

Hope that helps.