Using Surrogate Key in CSLA and in General !

tarekahf posted on Tuesday, May 20, 2008

I have noticed that in CSLA Sample Applications, and in several other places in other forums, they depend on the concept of Surrogate Key (Internally generated Dummy Primary Key).... as a mater of fact, I am a bit unhappy about this.

As per what I remember since I graduated form university more than 14 years ago, that the design of Table Primary Key/Foreign should match that in the real world.

If the Staff Table will contain the data about the Staff and each Staff has an ID Assigned to him by HR Dept., following a Business Logic of some kind, then why would I bother and create an Internal Dummy Key ? Why not use this same Staff ID as the Primary Key of the Staff Table ?

Also, the Staff Family Table must have a Foreign Key (Staff ID) and a Seq. Number for the Family Member to identify the Family Member Record... and so on.

And the creation of the Staff Record and His Family Member Records must follow the same steps applied in the real world and it must be implemented in the same way in the Staff Business Object, which is a perfect match with the objectives of OO Design and Programing.

At the end of the day, you have to make sure the Key Value of the Real Word is not duplicated, so why create 2 fields to have the same effect, ie, Unique Dummy Key and Unique Staff ID Key ??? Isn't this unnecessary overhead for the work and Database ?

I need to know if in CSLA, is it a must to use Surrogate Key ?

What is wrong is I depend on the Key Values used in the Real Word, and control the Generation of the Key Values in the Program instead of Using Identity Key or GUID ?

Thank you for your feedback.

Tarek.

Lalit replied on Tuesday, May 20, 2008

I agree with your point straightaway while talking about OOP concept. But in some situations when such identities do not exists in real world or may be for any other reason if user is bound to generate identity in DB automatically, you can not identify object specifically without having such dummy keys prior to saving your object in database.

Think of a situation where you are having a collection of editable child objects. Now while editing that collection you do not save it each time you edit its child. If you adds a new child to the collection and later decides to edit it further prior to saving the whole collection, you do not have any other mean to seperate it out from your collection if identity of this object is autogenerated in DB. By adding a GUID to your object, you are assigning it a temporary unique key that will exist during the life of your object.

Apart from that its not necessary to use such dummy keys in CSLA. It will work just fine without this if you are not going to face above said issue.

tmg4340 replied on Tuesday, May 20, 2008

There's nothing wrong with using natural keys instead of surrogate ones. CSLA puts no requirements on the way you design your database. If you want to do that, go right ahead.

As for the debate of natural vs. surrogate keys... well, folks can have some strong feelings on that one. The basic problem with natural keys is that they tend to be values that are entered by the user. Thus, they need to be editable, and it's not always A Good Thing to have your table's primary key editable. Once you have child records established, it can become a relatively expensive operation to cascade your key changes - even if the database does it for you. Plus, with an editable primary key, you can't tell whether the value is unique until after the user has done all the data entry and you try to save the record. It's not a huge deal for the UI, but it's extra coding you have to write that you don't have to deal with using surrogate keys. Lastly, surrogate keys can be smaller than a natural key (depending on what type of field you use), which makes for faster querying.

Having said that, there are a number of well-respected DBA-type folks out there who rank surrogate keys just below the bubonic plague. They site your argument about having to create the unique constraint anyway, as well as the fact that the key generated is arbitrary and thus has no bearing at all on the data it represents. From their point of view, if you have to make up this key field, you haven't done proper data analysis to find the real key field(s). In addition, especially with things like identity fields, the key value assigned is completely based on the order in which the data is saved to the table, which doesn't follow from a set point of view - by definition, sets have no order. This also makes the data more complicated to copy over, since order is now important. There are lots of things database vendors have done over the years to mitigate some of the issues with surrogate keys, but in the end it's all extra SQL that has to be fiddled with.

If we go with your example, an employee may have an Employee ID assigned to them by HR, and the HR database probably does use that as the primary key. But you're also assuming that I (or my users) have access to the HR employee information, which is not as common as you might think. If I do have access, then I might use their EmployeeID as the key value. If I don't - well, I need to use something, and it may not be possible/practical to get the company's EmployeeID. Either way, I'd also probably construct the UI to use a dropdown list of employees. This shields my users from whatever ID scheme I choose.

Having said all of this, I tend to fall into the surrogate-key family more often than not. Most of my co-workers over the years have been in the surrogate-key family as well. I will say that I'm not a big fan of GUID's as my key values, even though they do help to solve the "I need the key before the database generates it" issues. Regardless of which type you use, though, they usually are simpler - it's only one field to deal with; the database usually takes care of it for you; and all tables work the same, so you can encapsulate the code in a base class and forget about it. But I have found myself using something of a hybrid approach lately - using surrogate keys for the main tables (Customers, Orders, etc.), but creating a sort of natural key in the child tables (OrderID + Line Item Sequence, for example). It's still all handled by the system, and while it does imply an order to the data, I find there's almost always some kind of order to the data anyway.

HTH

- Scott

RockfordLhotka replied on Tuesday, May 20, 2008

It is very true that CSLA doesn't care how you construct your database keys - or how you construct your database for that matter. This is a database design question, not a CSLA question.

I tend to use surrogate keys because I spent the first third of my career working with ISAM files - before RDBMS software was fast and stable enough to be useful in the real world.

In the ISAM world, using real data as a key value is very problematic if you ever allow that data to change. And even the most "unchanging" data eventually has to be changed. We had DocumentNumber as a key, for example. They'd be values like "SOP001". In theory they'd never change. Hah! They did too change - eventually someone decided to reorg the document numbers...

So we had to write and test a big one-off app to go through all the ISAM files in the entire system to find/fix all the (what would now be called) foreign key values. That's a chunk of my life I'll never get back...

In the RDBMS world you can set up referential integrity rules, at least in theory, to make editing a primary key value a non-issue. But I say "in theory" because that isn't always true outside the world of academia. It isn't always practical to have those rules set up, nor is it always practical to touch a large percentage of your database to update a widely-used key value just because SOP001 was changed to UOP001. The locking alone is a killer in a scenario like that...

All that said, I am not a database expert. Don't pretend to be one, don't want to be one. That is the realm of people like Kimberly Tripp, and I'm happy to leave it to them Smile [:)]

This is why I'm a supporter of stored procedures. They provide a logical view of the database so someone who does choose to focus on the database can more easily come by later and make things sing.

tarekahf replied on Wednesday, May 21, 2008

Thank you all for the great feedback.

I can understand the problem that I will face when the PK Value or Structure (Type/Length) will change in the real world, if I am using the Real World Key Values to propagate to related Tables. I can also understand the problem I will face when I have composit key structure, and they become part of the PK in many other related tables, and then I have to use them in every join.

This is the responsibility of a Good System/Database Desinger who should spend enough of time and effort to desing the Database Tables Key Structure/Relationship properly.

For example, in our case, the HR user told me that the Staff ID is 6 Digits Numeric. And this is the specs used in Unix/Adabas Database. But, I decided to make it Text and 7 characters long in SQL Server for related tables. You cannot imagine how this helped me later on when interfacing with different platforms. Also, the users in the real world they always use the format of "000000" when they refer to the Staff ID anytime, anywhere. So, I decided to follow the same logic and use the same format by forcing the Leading Zeros to the Staff ID in the Table Key Field. So, if the Staff ID is 23, then it will be "000023".

I will use Surrogate Keys when I cannot find a real-world key value that does make sense. For example, Process Log Table. In such table, there is no Key Value that maters to the business logic, so I always use Identity Key (Auto Increment) key value.

Also, when the Composit Key Structure becomes so complex, I will use Surrogate Key, but I will write a small program to generate it, and will never ask the Database to generate the Key for me. Why ? becuase if I ever want to enable replication on the Database, or need to perform some table migration/copy ...etc.., I think it will cause problems and create unecessary difficulties.

But I am now happy that CSLA does not force your to use Surrogate Keys.

Tarek.