Primary Keys in CSLA 2.1.1

Primary Keys in CSLA 2.1.1

Old forum URL: forums.lhotka.net/forums/t/1825.aspx


gdk9am posted on Thursday, November 23, 2006

Hi All
 
When using defining primary keys in a new database using CSLA 2.1.1, which is better identity fields or unique identifiers (GUID) ?
 
regards
Graeme

guyroch replied on Thursday, November 23, 2006

Graeme,

This issue has been reashed time and time again.  I started the following thread last year.  Read it and judge for yourself based on your assessment.

http://groups.msn.com/CSLANET/general.msnw?action=get_message&mview=0&ID_Message=22852&LastModified=4675552724867439699

Hope this helps

 

Bayu replied on Friday, November 24, 2006

For my current app I use Guids.

Even though the DB is MySQL which does not inherently support Guids or uniqueidentifiers like SqlServer.

It is simply that one of the future requirements will be that the app (winforms) should also be able to run offline. If you require the DB for getting unique keys, then this would turn in hugely complex requirement to address, but with Guids it is fairly simple.

Bayu

Gareth replied on Friday, November 24, 2006

Be aware though of the downsides to using GUIDs. If you use a GUID as your primary key and that key is also the clustered index you will get a lot of fragmentation in the database as there is no predictive pattern to them.

Also be wary of using a GUID as an index in general as 16-byte index key comparisons are also not very efficient.

Both will affect performance of the database, and if your database is large and heavily dependant on those indexes, the affect could be quite dramatic.

In addition if you are debugging stored procedures, working with GUIDs can be a bit tedious if you are having to enter them manually.

Gareth.

Bayu replied on Friday, November 24, 2006

Gareth:
Be aware though of the downsides to using GUIDs. If you use a GUID as your primary key and that key is also the clustered index you will get a lot of fragmentation in the database as there is no predictive pattern to them.

Also be wary of using a GUID as an index in general as 16-byte index key comparisons are also not very efficient.

Both will affect performance of the database, and if your database is large and heavily dependant on those indexes, the affect could be quite dramatic.


Theoretically both these statements are true, although I never conducted or read about benchmarks that actually detail the performance differences. Did you?

Gareth:

In addition if you are debugging stored procedures, working with GUIDs can be a bit tedious if you are having to enter them manually.


This is a non-argument. Whether debugging a particular choice for PK is tedious or not should never be an aspect that is included in your considerations. Especially since these procedures are very well debuggable like any other if you would use something like NUnit.

Bayu

Gareth replied on Friday, November 24, 2006

Bayu:
Gareth:
Be aware though of the downsides to using GUIDs. If you use a GUID as your primary key and that key is also the clustered index you will get a lot of fragmentation in the database as there is no predictive pattern to them.

Also be wary of using a GUID as an index in general as 16-byte index key comparisons are also not very efficient.

Both will affect performance of the database, and if your database is large and heavily dependant on those indexes, the affect could be quite dramatic.


Theoretically both these statements are true, although I never conducted or read about benchmarks that actually detail the performance differences. Did you?


I work with two Microsoft SQL Server consultants and I was discussing the use of GUIDs as primary keys with them a few days ago after reading chapter 6 of Rocky's book. They both agreed that there is a definate use for them in specific business cases and where the unique identifier is useful up front such as working with parent/child relationships. However they did emphasise to me if the database requirements were highly performant then they would never use GUIDs as indexes or primary keys due to the reasons I mentioned above. We work a lot with such databases processing several million rows every day in specific timescales and so it wouldn't be appropriate to use GUIDs in that situation. However, I will be using CSLA.NET for other projects where such database performance is not as high, although it does need to be timely. I was simply pointing out that there are downsides to using GUIDs, but not that you shouldn't use them.

Bayu:
Gareth:

In addition if you are debugging stored procedures, working with GUIDs can be a bit tedious if you are having to enter them manually.


This is a non-argument. Whether debugging a particular choice for PK is tedious or not should never be an aspect that is included in your considerations. Especially since these procedures are very well debuggable like any other if you would use something like NUnit.

Bayu


I am not saying that this is a reason not to use them, or that it is a point to argue about. I am just saying that in some situations they can be awkward to work with. For example, in the environment I work in at the moment we are debugging quite complex, large stored procedures that contain a lot of logic and business rules. If we were using GUIDs in these stored procedures it would make our job a lot more difficult.
Not everyone has access to NUnit. Some controlled working environments only allow the use of approved software (in my current case Sql Management Studio and Sql Profiler only).


If the business case justifies using them... use them :)

Gareth


Bayu replied on Friday, November 24, 2006

Hey Gareth,

I am very much aware of the limitations of Guids when performance is an issue. I was just hoping you might have some resource where the performance difference is benchmarked so that we could get an idea of what amounts of losses we are actually talking about. I would be curious, what kind of queries suffer most? How much do they suffer? Twice as slow? Ten times as slow? It would also be interesting to compare SqlServer who has native support for uniqueidentifiers with e.g. MySQL which doesn't so would use a character or binary field.

But I guess I should perhaps do such a benchmark myself, eh? ;-)


Regarding your second note: I had never thought developers would be restricted in which tools they were allowed to use. Really surprises me. Of course I understand that there are rules in place that tell you what framework and UI toolkit to use, how to tackle common scenarios and stuff like that, but that someone wouldn't be allowed to use NUnit had never occured to me.


Regards,
Bayu

Gareth replied on Friday, November 24, 2006

Bayu, I'll try and chat with one of my colleagues and see if he has any benchmark or performance statistics regarding this, or some kind of reference. If he has, and I am allowed to post it up here, I will.

Yes in a lot of large organisations I've worked for (mainly large corporates) its like that. Most software has to go through an evaluation and testing period before it can be approved. It can be frustrating at time because you don't always have access to the latest stuff, especially when I have my own MSDN Subscription at home with all the latest tools and software.

Going off topic slightly...  luckily where I am now they've just approved .NET 2.0 and VS 2005, and even .NET 3.0 is already under evaluation due to some of the departments wanting to use WCF. So that's all good. On the flipside, the OS's here are all Windows 2000 and only recently have they started rolling out Windows XP to the staff!!

Gareth.


Vinodonly replied on Friday, November 24, 2006

Bayu, regarding performance issue and fragmentation realted with GUID, there is a detailed demonstration in webcast series by kimberly tripp (10 Part series), she has also mentioned about new Sequential guid's...

A Primer to Proper SQL Server Development

pls chk on her site or on msdn for exact link..

 

Bayu replied on Friday, November 24, 2006

Vinodonly:

Bayu, regarding performance issue and fragmentation realted with GUID, there is a detailed demonstration in webcast series by kimberly tripp (10 Part series), she has also mentioned about new Sequential guid's...

A Primer to Proper SQL Server Development

pls chk on her site or on msdn for exact link..



My god!

Now that's a vast resource!

Thanks for the tip, I will certainly have a look later, perhaps this weekend. It's not like you can skim through it in a glance. ;-)

Bayu

Gareth replied on Friday, November 24, 2006

Nice reference Vinodonly!

I managed to skim through a fair bit of that and found some references later on to UuidCreate and then also UuidCreateSequential.

UuidCreate is already implemented in .NET as Guid.NewGuid() from what I can see. But UuidCreateSequential is not.

UuidCreateSequential would appear to create new sequential GUIDs which would go some way to solving one of the above downsides of not having sequential GUIDs and causing lots of page fragmentation in Sql when inserting.

C# code to call UuidCreateSequential...


using System;
using System.Runtime.InteropServices;

class Program
{
    [DllImport("Rpcrt4.dll")]
    public static extern int UuidCreateSequential(out Guid guid);

    static Guid NewSequentialGuid()
    {
        Guid newGuid;
        int i = UuidCreateSequential(out newGuid);
        if (i != 0)
          throw new ApplicationException("UuidCreateSequential failed");
        return newGuid;
    }

    static void Main(string[] args)
    {
        for(int i = 0; i < 20; i++)
            Console.WriteLine(NewSequentialGuid());
        Console.ReadLine();
    }
}

Important Note: The GUIDs generated by UuidCreateSequential() are unique only within a particular computer if the computer does not have a network card.

If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID, and therefore access data associated with that GUID.


I think I'll now be happier to use GUIDs in my CSLA.NET code with this.

Rocky, is this worthy of inclusion in CSLA.NET ?? ;-)

Gareth.

xal replied on Friday, November 24, 2006

Or read the thread guyroch pointed you at in the second message of the thread and avoid all this discussion... :)

Anyway, if you read it, you'll find a link to an article that:

Andrés

Lakusha replied on Friday, November 24, 2006

I have seen and done benchmarks with guigs, but have none on hand. It makes a difference when you have hundreds of concurrent users or if you have really huge tables. Basically, it is a problem related to memory pressure and increased IO (disk and network). 64bits helps a lots when working with guids.

Consider using guids if:

  1. you want to use replication (it helps quite a bit)
  2. you have offline or near-offline applications that sync with the main db

Outside of those 2 puposes I see them more like a royal pain. When you create an Order or a Client, everybody undestand Client #114. Useful in reports, Invoices and even on screen. have you ever tried printing an Invoice with Client #5752ECA5-3427-48E4-8595-C56D488CCEF3 ? Try asking a client to spell it on the phone when you have a support call.

When debugging a report with a 20 tables join, try following those guids around to understand the data on table with millions of rows. I work with and on machines, but I am still human (whatever others say :-) )

guyroch replied on Friday, November 24, 2006

Performance, performance, performance... the never ending story.

The truth is Guid and Idendity PK's are both excellent solutions and there's no real argument to choose one over the other as they both have pros and cons.

Below you will find a link to one of the best review I came accross in the past regarding this issue.

http://www.informit.com/articles/article.asp?p=25862&seqNum=1&rl=1

Enjoy...

Gareth replied on Sunday, November 26, 2006

A good article but it doesn't address the main issue of SQL Server page fragmentation when using GUIDs which is really the primary draw back of using a GUID as a primary key that is also clustered.



Skeeboe replied on Sunday, November 26, 2006

The second version listed on this page is the code we use to generate COMBs.

http://www.jnsk.se/informit/pkguid/followup.htm

7425e88d-58bc-4a0a-976d-98850082040c
57648e9f-bd9d-4bfc-8818-98850082040c
1cc75d80-7e3e-4a69-b954-98850082040c
c94eaec8-62d0-404f-95a7-98850082040c
9ba6eae8-149c-4b66-83c2-98850082040c
33676a3e-40ea-40ab-9630-98850082040c
c88ca613-f097-4bf3-af57-98850082040c
efb29e06-d318-4dab-8a80-98850082040c
1339d4b0-7075-44ad-9e77-98850082040c
869f7416-a161-4c2e-ae4c-988500820411
61ae8149-515f-4a60-8499-988500820411

The date/time part is divided by 3.33333 to match SQL server's time format. Sql server tracks down to 1/300th of a second where as .NET tracks down to 1/10000th of a second!

Visually they don't look like they would index correctly, but sql server loves them and they are fast!
http://www.informit.com/articles/article.asp?p=25862&seqNum=7&rl=1

JonM replied on Sunday, November 26, 2006

I've been using the COMB ID, a sequential guid.  I took the code and implemented it as a helper function in my business object library.  Then I also used the code to create a SQL Function written in VB.net via the .NET 2.0 CLR in SQL Server 2005.  It seems to work great!

The Cost of GUIDs as Primary Keys
http://www.informit.com/articles/article.asp?p=25862&rl=1

DansDreams replied on Monday, November 27, 2006

"I work with two Microsoft SQL Server consultants and I was discussing the use of GUIDs... They both agreed that there is a definate use for them in specific business cases and where the unique identifier is useful up front such as working with parent/child relationships. However they did emphasise to me if the database requirements were highly performant then they would never use GUIDs as indexes or primary keys due to the reasons I mentioned above. "

And that, my friends, is why you should never take somebody's word for it because they hang "consultant" after their name.  That statement is about as useless as when somebody tries to tell you stored procedures are more performant because they're precompiled... ah, but that's another discussion.

I've been one of the champions of GUID PKs here and (especially) on the old forum. 

Since it seems we'll keep having these same discussion over and over, even when there's a link to the previous ones provided... here's my two cents worth (again).

As was mentioned, there are fairly simple ways around the non-sequential issue.  Like Jon, I personally create a sequential GUID in my business layer code so I can create the PK when the object is created.  A simple piece of code and I have the sequential but globally unique PK before saving.  The famous article by Jimmy Whatshisname about COMB Guids had some benchmarks and demonstrated that sequential GUIDs suffered no significant performance issue.

Also, SQL 2000 and later has features to get around the replication issue for standard integer IDs.  There's two different mechanisms as I recall, but they basically guarantee that any write partners will have separate ranges of integer IDs to assign to new rows.

I personally still like GUIDs because I think it's a simpler solution to that problem and could provide some potential benefits somewhere down the road, but it's really probably just personal taste for 95% of the applications we're writing.

xAvailx replied on Monday, November 27, 2006

>> However, I will be using CSLA.NET for other projects where such database performance is not as high, although it does need to be timely. I was simply pointing out that there are downsides to using GUIDs, but not that you shouldn't use them. <<

Just to clarify, CSLA does NOT require GUIDs, it just happens the author used GUIDs in the example project.

HTH

pelinville replied on Monday, November 27, 2006

 

Nobody talks about the benefit of GUIDs with regard to identifying the object in the object model.

 

Ignoring the database (how dare I!) and performance, having each and every object have a ID that is globally unique makes some things wonderfully easy.

 

One possible thing is using something besides a DB for persistence. Theoretically you could do a shallow serialization of each object and save it using the GUIDs string representation.

 

I use a OO database and it's use is made much easier because each object has an ID that is a guid.

 

Setting up things like caching and object tracking becomes very easy (make shared hash table with the GUID as a key and you are set for simple things.)

 

Asynchronous calls, services and messaging are often easier and can be more efficient, I have found. (Many times you can just send the objectID and a couple of fields reporting the result.)

 

And as far as the DB does go.  Performance is only really affected by inserts.  If you don't do tons of inserts then no problem. 

Copyright (c) Marimer LLC