DB design question

ajj3085 posted on Wednesday, November 01, 2006

Hi,

Trying to figure this one out... but I'm stuck.

I have a document which has sold to, ship to and bill to information. Sold to is tied to a record in person. Person has a record in Address as well. Ship to and bill to may be tied either to a person or a department.

What I need to do is this. The document should keep information as it is when it was entered. If the document is not locked, it should be possible to refresh the sold to, bill to, and ship to. I'm trying to figure out how I can check to see if the data has changed so that I can display a warning icon on the form.

The problem i'm running into is this; because the data is split across a few tables, using a timestamp seems to be tricky (still in the thinking stage here..). I'm also not sure if I should store the time in point data in some of the same tables; that is, Address could hold the address for the time in point data.

Here's the table structure I have so far...

Person
--------
Name, email, etc.
AddressId
personid

Address
---------
AddressId
Address data

Phone
-------
PhoneId
Phone data

PersonPhone
--------------
PersonId
PhoneId
IsPrimary (a person can have only one primary number per PhoneType)
PhoneType

Document
-----------
DocumentId
Document stuff

DocumentContact (this table is very much in flux..)
-------------------
DocumentId
DocumentContactType (one of three values idenfitying shipto, soldto, billto)
AddressId
PhoneId
FaxId (FK to Phone.PhoneId)
CompanyName
ContactData (Fname, email, etc.)
PersonId (this is the contact the data was originally copied from, nullable)
DepartmentId( this is the department the data was orginally copied from, nullable)

Either personid or departmentid will be required (via the BO).

My main problem is this; person has an address, and so does document contact. The address for each will be a seperate row, so their timestamps will always differ. How would I be able to tell if the person's address doesn't match the documentcontact's address (ideally I wouldn't be comparing each field..). Same goes for phone data.

Any ideas? It seems like I would need to store in documentcontact a timestamp column which was the timestamp value for each piece of data (person, address, phone, fax, department).. but that seems kinda icky too... but it might be the best way.

Thanks
Andy

Bayu replied on Wednesday, November 01, 2006

Hey,

It took me some to get into your case. I hope I understand it correctly now.

It seems like your challenge is this:
- at the moment a document is created, it will refer to several addresses (via sold to/ship to)
- these addresses may change
- under normal circumstances these would have direct repercussions on your document as you have all your data neatly normalized ...
- and this is the challenge: you want the user of your app to consciously acknowledge a change in address on the document-level, am I right?

For starters, I think you should copy the address info into your document table at the time you create a new document. By copying the data you denormalize your relation model, I am aware of that, but then you ensure that your document has a valid address regardless of what happens to the address table. Then using a 'version' column or 'last-modified-date' column you can keep track of which addresses where changed since they were originally copied into your document table. It might be elegant to make this part of your 'update-address' stored procedure, so it sets a boolean flag on your document record when the address is modified.

I am not sure if I am actually addressing your question, perhaps I'm thinking all the wrong way.

Bayu

ajj3085 replied on Wednesday, November 01, 2006

Bayu,

You almost have it.. the address data is once removed..

The contact is not just address data, its actually a person or department. The address is attached to said person or department. So not only do I have to update address data, but also person or department data.

So if the contact's name changes, I need to alert the user to that as well. Otherwise you are correct. I don't think I need to denormalize, because I can create a unique Address record for the document contact. Here's how the data would look so far..

DocumentContact
-------------------
19 (DocumentId)
23 (PersonId)
15 (AddressId)
0x3435256 (AddressVersion)
0x3455661 (PersonVersion)

Person
--------
23 (PersonId)
"Bob" (FName)
"Dole" (LName)
0x234455 (Version)
65 (AddressId)

Address
---------
65 (AddressId)
123 Fake Street (AddressLine1)
0x455656F (Version)

15 (AddressId)
124 Real Street (AddressLine1)
0x3256235656D (Version)

So the challenge I need to detect
1. If the 'person' data has changed (or department, if the DocumentContact is tied to a department)
2. If the "person's address" data has changed (meaning I'll need to go through person to get the Version for the person's address, and compare it to AddressVersion on DocumentContact).

It sounds like you suggested what I was planning to do... except I have a few more pieces of data to keep track of, and thus a few more version numbers to keep track of.

Andy

david.wendelken replied on Wednesday, November 01, 2006

A few modeling questions for you:

You have Address in its own table. Why? Do you really want to track addresses in their own right? Or just as attributes of something else? (Example: Utility companies track addresses their equipment is attached to whether they have a customer there or not. Most other businesses do not track addresses unless they belong to a customer.)
If you choose to track Addresses in their own right (i.e., not as part of a customer or document), then you need to prevent duplicate records for the same address. This is a non-trivial exercise, I can assure you. 16 Oak Street and 16 Oak St. and 16 Oak St (assuming everything else agrees) are the same address. 16 Oak Street #1(in address line one) and 16 Oak Street (addr line one) and #1 (address line two) are also the same place. Have a swell time. There are vendors who sell packages just to deal with this one issue!
Your model allows a person to have only one address. In many business cases this is not true.
I routinely buy things that get shipped to an address that is not mine, particularly at Christmastime. My address is often irrelevant for shipping address purposes.
Let's suppose you go with the Address table with an Address Id and you successfully avoid duplicates. I order something from you from my address at 16 Oak Street, MyTown, USA. You deliver it and I like it. I move to a new address. (And of course I don't contact you to tell you.) The new resident of my old house calls you and orders something to be shipped to their house. It gets back-ordered for a bit. While it's on back order, I call you up and order something new. Your customer service rep says, "David, do you still live at 16 Oak Street, MyTown, USA?" I give them my new address and - here's where the fun might start - they change "MY" address record. My order shows up at my new address and, when the other person's item is no longer back-ordered, so does theirs. :)

Now, of course, your system doesn't have to allow changes to an existing Address record. It could have just detached my Personal record from the old Address record and created a new one. But then you end up with orphaned address records cluttering up the table. Or some complicated rules for allowing/disallowing changes/deletions to address records

Personally, unless I have a compelling reason to do otherwise, I would have a "CustomerAddress" table with a customer id in it. (I used Customer instead of Person because sometimes companies buy things instead of individuals.) For the document, if I was only ever going to have simple Billing and ShipTo addresses, I would just put the fields in the Document table. I might create a DocumentAddress table if I needed something fancier in the way of document address handling.

ajj3085 replied on Wednesday, November 01, 2006

david.wendelken:
You have Address in its own table. Why? Do you really want to track addresses in their own right? Or just as attributes of something else? (Example: Utility companies track addresses their equipment is attached to whether they have a customer there or not. Most other businesses do not track addresses unless they belong to a customer.)

Just following normalization rules. Does it make sense to have the same fields in multiple tables which have the same schema? It also makes it very easy for me to give any entity in the system an address, either directly or indirectly through a relationship table. As it stands now, people have addresses, departments can have multiple addresses, and now the document ship to, sold to and bill to each have a single address record each.

david.wendelken:
Your model allows a person to have only one address. In many business cases this is not true.

At the moment this is the case. A contact has a single address. Originally they did not; the address belonged to the department, and a contact belonged to a department, and that's how the address was stored. However, our sales people have told me its not uncommon for a person to have their own unique address (eg. a post office box at a university which is there's and theirs alone).

david.wendelken:
I routinely buy things that get shipped to an address that is not mine, particularly at Christmastime. My address is often irrelevant for shipping address purposes.

This is why each Document can have three distinct DocumentContact records. One for a billing address, one for a shipping address, and one for the sold to address. Additionally, the DocumentContact stores other information as it was at the time the document was locked (meaning it cannot be changed). So while a contacts address, department, email, etc. can change over time, at some point the document will not change with it. (Rather, the user no longer has the ability to refresh the document with the new data.)

david.wendelken:
Let's suppose you go with the Address table with an Address Id and you successfully avoid duplicates. I order something from you from my address at 16 Oak Street, MyTown, USA. You deliver it and I like it. I move to a new address. (And of course I don't contact you to tell you.) The new resident of my old house calls you and orders something to be shipped to their house. It gets back-ordered for a bit. While it's on back order, I call you up and order something new. Your customer service rep says, "David, do you still live at 16 Oak Street, MyTown, USA?" I give them my new address and - here's where the fun might start - they change "MY" address record. My order shows up at my new address and, when the other person's item is no longer back-ordered, so does theirs. :)

I don't care about duplicates in the address table. As I said, its purpose is to keep the same schema for address data no matter what kind of entity has address data. It also prevents 10 differently named tables which more or less identical schema.

FWIW, if you really wanted to normalize address data, you'd have Address line 1 number, Address Line 1 Street Id, Address Line 1 Road Type id, etc. I haven't really seen anyone do this to date. Big Smile [:D]

david.wendelken:
Now, of course, your system doesn't have to allow changes to an existing Address record. It could have just detached my Personal record from the old Address record and created a new one. But then you end up with orphaned address records cluttering up the table. Or some complicated rules for allowing/disallowing changes/deletions to address records.

Ya, I don't really want orphans. Its fine if the data changes. I'm just trying to set things up so that if a document still may be changed by a user, there is an option to 'refresh' the data from a source.

david.wendelken:
Personally, unless I have a compelling reason to do otherwise, I would have a "CustomerAddress" table with a customer id in it. (I used Customer instead of Person because sometimes companies buy things instead of individuals.) For the document, if I was only ever going to have simple Billing and ShipTo addresses, I would just put the fields in the Document table. I might create a DocumentAddress table if I needed something fancier in the way of document address handling.

Then I get into having a DepartmentAddress table, a CompanyAddress table, and DocumentAddress table... all with basically the same schema. In our world, a Person is always the one that buys something, even if they are buying it for their company. Our sales people always talk to a person, after all. Smile [:)]

I can still get company level purchase data (well, I can once I have this quoting / invoicing stuff done) because each person belongs to a department, which belongs to a company.

Hope that gives a better view into the business world I'm working within here.

Andy

david.wendelken replied on Wednesday, November 01, 2006

ajj3085:
david.wendelken:
You have Address in its own table. Why? Do you really want to track addresses in their own right? Or just as attributes of something else? (Example: Utility companies track addresses their equipment is attached to whether they have a customer there or not. Most other businesses do not track addresses unless they belong to a customer.)

Just following normalization rules.

Maybe not. :)

A person's address is actually a "current address", or more accurately, a "last known current address".

A document's address (once the document is "finalized") is an historical fact.

Mixing "current" and "historical" information into the same entity is not a requirement of normalization and can lead to all sorts of maintenance problems and unwanted reporting side effects.

ajj3085:

Does it make sense to have the same fields in multiple tables which have the same schema? It also makes it very easy for me to give any entity in the system an address, either directly or indirectly through a relationship table. As it stands now, people have addresses, departments can have multiple addresses, and now the document ship to, sold to and bill to each have a single address record each.

Rocky would tell us when doing object design to normalize the behavior and not to worry about redundant code like lists of properties.

I don't worry about redundant lists of attributes if the fundamental life cycle of the entities (current vs. historical data) is radically different.

That's not to say what you are doing is wrong! :) Just that I question it until I learn more. :)

ajj3085:

FWIW, if you really wanted to normalize address data, you'd have Address line 1 number, Address Line 1 Street Id, Address Line 1 Road Type id, etc. I haven't really seen anyone do this to date.

I have, and it was an awful mess.

ajj3085:

david.wendelken:
Personally, unless I have a compelling reason to do otherwise, I would have a "CustomerAddress" table with a customer id in it. (I used Customer instead of Person because sometimes companies buy things instead of individuals.) For the document, if I was only ever going to have simple Billing and ShipTo addresses, I would just put the fields in the Document table. I might create a DocumentAddress table if I needed something fancier in the way of document address handling.

Then I get into having a DepartmentAddress table, a CompanyAddress table, and DocumentAddress table... all with basically the same schema. In our world, a Person is always the one that buys something, even if they are buying it for their company. Our sales people always talk to a person, after all.

So, you have removed duplicate table schemas but now require much more complex code in order to make sure that someone modifying a person or department address doesn't also modify a document address that is frozen - unless you are creating a brand new copy of the address record when you assign it from a person to a document. Is that what you intend to do?

ajj3085 replied on Thursday, November 02, 2006

david.wendelken:
Maybe not. :)
A person's address is actually a "current address", or more accurately, a "last known current address".

A document's address (once the document is "finalized") is an historical fact.

Mixing "current" and "historical" information into the same entity is not a requirement of normalization and can lead to all sorts of maintenance problems and unwanted reporting side effects.

Technically once a document is finialized, its just there for historical reasons as well. By that logic, I should move it to another table. Of course the users do quite often look at older invoices, and they want them in the same list as the current invoices. So to do that, I need a union query, which would be used pretty frequently.

You can also argue that the documents address is the 'last known current address for the document.

david.wendelken:
Rocky would tell us when doing object design to normalize the behavior and not to worry about redundant code like lists of properties.

I don't worry about redundant lists of attributes if the fundamental life cycle of the entities (current vs. historical data) is radically different.

That's not to say what you are doing is wrong! :) Just that I question it until I learn more. :)

But normalizing data is about keeping data in one place and one place only. Most places I've been have had this similar design. The address is just there and its maintaned by whatever entity to which it belongs, like document or contact or department.

david.wendelken:
So, you have removed duplicate table schemas but now require much more complex code in order to make sure that someone modifying a person or department address doesn't also modify a document address that is frozen - unless you are creating a brand new copy of the address record when you assign it from a person to a document. Is that what you intend to do?

Um, not sure I follow. The documentcontact gets its own address record, it doesn't link to an existing address record. The data for that record can be copied (by selected a contact or department), but no address record is ever shared by entities.

The address id on DocumentContact is its own, no one else ever messes with it. However, I need a way to look at the address record from which the data was originally copied. So to do that, I need a pointer back to the original address. This pointer is gotten indirectly, through Person or Department (although thinking about it, a Department can have multiple addresses... so maybe I need to record it directly).

So, two address records. The way I was thinking is that Address has a timestamp column. In documentContact, I record the value in that column at the time I copy the data from the original address row to the document's address row.

Hopefully that clears things up.. the thing I'm not sure about though is not only do I need to track the timestamp from the copied address record, I need to do so for the person or department as well as two phone numbers. So now I have 4 columns which store the timestamp at the time the data is copied into the document contact record. Unless someone has another way.

Just so we can leave the dicussion about having more than one address table behind... even if I were to do that, I still have these same issues about comparing the data. I still need to know if the data from ContactAddress has changed from when DocumentContact originally copied that address into its own address record.

Thanks for the comments so far!
Andy

jlazanowski replied on Thursday, November 02, 2006

Andy,

Not sure that this is what you're looking for but....

We have similar requirement for production data, in a system that we use. Once a "ticket" is locked, it should retain contact information based on the state of the contact at a given point in time. We also had a another requirement that we should be able to update a contact's information to change at a specific point in the future, our producers change names quite a bit.

So we came up with an effective date scenario. When a user enters new contact information, we include an effective date of change (defaulting to the current date). Effective date in our case is a char field that would look like 200604 (in our case we don't care about the day of change). So when a ticket is created we know the month and year, in this way we can pull the "current" producer record and reference it in the ticket. Users aren't allowed to go back and modify addresses, only create ones with a greater effective date that the current effective date. In this way, if the "ticket" isn't locked and I pull it back up, I can easily run a query to see if a new contact record/effDate combo exists, and flag the user to update.

It's a little different since I'm not talking about address information, however I think the general idea could still apply. You would obviously have to tweak such a solution to fit your specific case.

A side note about your current idea and I qualify this statement with this may not fit your case, but, if your selling something, and you have 10,000 customers, and each of them generate 5 or 6 invoices per month, you'll see your address table get very large, very quickly if you create a new address copy for every sales document.

For what it's worth.

Justin

ajj3085 replied on Thursday, November 02, 2006

Sounds similar to the solutions suggested so far. Your effective date is similar to the timestamp or bit. Your ticket sounds similar to our documents too as far as changes go.

I know the address table can get very large, and I expect that. Fortunately we have LOTS of disk space and only my database on the production server. Smile [:)]

It gets even a little worse than you suggest as well; a quote, once locked, cannot be changed ever again. But there are times an edit is required (because of a typo or billing address was entered wrong). So what we will do is create a revision, which is really a whole new document, just with a pointer back to the obsoleted document, the same document number, and an incremented revision number. The system seems them as different documents, except that it will know its a revision and can display all previous revisions.

So I expect the document table to grow quickly too, although not nearly at the same pace as the address or phone tables.

jlazanowski replied on Thursday, November 02, 2006

ajj3085:
Sounds similar to the solutions suggested so far. Your effective date is similar to the timestamp or bit. Your ticket sounds similar to our documents too as far as changes go.

Well sort of, the difference is that I don't create a copy of the address for every "ticket" we'll use the same pointer (ContactID/EffDate) until it's changed. In this way we don't have all of these redundant contacts saved the in the database, for a sample month I just ran a count query and returned 10,370 tickets. That's a lot of redundant data, and it may very well be necessary for your project Wink [;)]

ajj3085:

It gets even a little worse than you suggest as well; a quote, once locked, cannot be changed ever again. But there are times an edit is required (because of a typo or billing address was entered wrong). So what we will do is create a revision, which is really a whole new document, just with a pointer back to the obsoleted document, the same document number, and an incremented revision number. The system seems them as different documents, except that it will know its a revision and can display all previous revisions.

So I expect the document table to grow quickly too, although not nearly at the same pace as the address or phone tables.

Actually, we do the same thing with our tickets, because of regulation, we have to be able to show our ticket revisions, so yes I know what you mean about the tables growing quickly. We have LOTS of disk space too. Cool [H]

ajj3085 replied on Thursday, November 02, 2006

jlazanowski:
Well sort of, the difference is that I don't create a copy of the address for every "ticket" we'll use the same pointer (ContactID/EffDate) until it's changed. In this way we don't have all of these redundant contacts saved the in the database, for a sample month I just ran a count query and returned 10,370 tickets. That's a lot of redundant data, and it may very well be necessary for your project

Ahh, so your ticket addresses are shared. Given that you can't change your ticket, does the pointer ever change? I'm not clear on that..

One side note is that at the moment there's no requirement for the user to base the ship to / bill to information off of a department or contact.. they can enter 'freelance' data as well. Hopefully they will see the usefulness in linking though because it means they will have less problems with outdated information.

jlazanowski:
Actually, we do the same thing with our tickets, because of regulation, we have to be able to show our ticket revisions, so yes I know what you mean about the tables growing quickly. We have LOTS of disk space too.

Ours isn't so much of regulations, its making sure that they know the information was sent to the customer, and thus are changing the quote as it was sent to the customer. They can't create an order until the document is locked, and they can't create an order on a quote that was obsoleted.

Thanks for the feedback.
Andy

jlazanowski replied on Thursday, November 02, 2006

ajj3085:

Ahh, so your ticket addresses are shared. Given that you can't change your ticket, does the pointer ever change? I'm not clear on that..

I think my ticket explanation was clear as mud. Our tickets, can and do change, but once their "locked" they no longer change. So for an example, we'll say that Producer Bob changes his name to Bob, LLC. and now has a different tax number. So we pull up a ticket that had producer Bob on it, that has the old contact information on it. When the ticket is pulled up a check is performed to see if updated contact information is available. If it is, it flags the user and the user decides if the new contact information is supposed to apply, and may update the "pointer" if applicable. Once the ticket is locked (this happens on a certain date every month) then no more updates may occur. The ticket is considered finalized.

ajj3085:

One side note is that at the moment there's no requirement for the user to base the ship to / bill to information off of a department or contact.. they can enter 'freelance' data as well. Hopefully they will see the usefulness in linking though because it means they will have less problems with outdated information.

If this is the case that ends up being the rule, then there really isn't away around creating an address record for every single document, you would have to assume that they could "freelance" the contact information once its been pulled into the document object.

Bayu replied on Thursday, November 02, 2006

david.wendelken:
So, you have removed duplicate table schemas but now require much more complex code in order to make sure that someone modifying a person or department address doesn't also modify a document address that is frozen - unless you are creating a brand new copy of the address record when you assign it from a person to a document. Is that what you intend to do?

Hey david,

It sounds like you have some experience here, certainly more than me. However, in all respect, I think I disagree with several of your points:

- you seem to assume that a more complex DB model will naturally result in more complex BOs. This is an invalid assumption I fear. Aren't you perhaps overlooking your stored procs/sql? They mediate between your DB and BO. However normalized your DB: there is nothing SQL can't handle. Factoring out addresses in a separate table means an extra join, update or where clause in your SQL, true, but you BO does not need know ...!?! SQL can hide most (if not all) of the complexities that come with normalization from your BO.

- you also mentioned something like an 'address table in its own right':
I agree with Andy that this is just good old normalization. What do you imply with 'in its own right'? surely Address is not to become an editable root object ... I guess that this boils down to the previous bullet: nothing SQL can't handle. Your BOs will naturally be highly de-normalized, they have to in order to be able to serve their purpose (you would probably put address stuff in a base-class to promote reuse, but that's another discussion).

- to resolve Andy's issue with changes in associated tables that may or may not have to be copied to the Document I don't think the answer lies in a particular DB model. I believe that any reasonable model could be made to work (a dangerous claim, I know, but none-the-less ....). It think the issue merely concentrates on where we should resolve the issue ....

I think that the stored procs are key to a neat resolution. In your update-address stored proc (and in similar vein also update-contact and all other update-procs that deal with associates of the documents) you could put some logic that updates all associated documents (if any) and sets a boolean (bit) column to true. This would mark the document as having a 'stale' reference, which the user might wish to refresh.

This way you basically implement it as a 'push' model. Your associated objects (address/contact/department) 'push' warnings to your document. The opposite is a 'pull' model where you execute a massive comparison query that tries to analyze which documents may need a refresh based on date comparisons.

I think the push model is more elegant and surely it will perform better.

Bayu

ajj3085 replied on Thursday, November 02, 2006

Bayu:
I think that the stored procs are key to a neat resolution. In your update-address stored proc (and in similar vein also update-contact and all other update-procs that deal with associates of the documents) you could put some logic that updates all associated documents (if any) and sets a boolean (bit) column to true. This would mark the document as having a 'stale' reference, which the user might wish to refresh.

This way you basically implement it as a 'push' model. Your associated objects (address/contact/department) 'push' warnings to your document. The opposite is a 'pull' model where you execute a massive comparison query that tries to analyze which documents may need a refresh based on date comparisons.

I think the push model is more elegant and surely it will perform better.

This is a good idea, and is similar to what I had come up with. Instead of a bit though, I had planned to store the Version from the address record at the time it was copied. A view could have a column indicating if the SourceAddressVersion in DocumentContact does not match the Version from the Source Address's record. The advantage here is that the procs don't need to know about all the other tables that are tracking changes (right now, only DocumentContact cares if the source address record is different from its own address record), so I can keep them fairly 'stupid' still... which is one of my goals for the db design. Keep it as 'stupid' as possible, only giving it smarts if there's a big performance benefit, for example.

Given that your solution is very similar to the one I was thinking, I feel more comfortable.. although I hope others chime in as well, because that would give me more confidence before I decide to move forward with a particular implementation.

Andy