Database ID: Int vs. BigInt vs. GUID

I’ve been hearing a lot of talk about using a GUID as a database row ID, in recent months… last night I was talking with Jeffrey Palermo about this, specifically, and he brought up some interesting points about decoupling the ID of a record from the database implementation. We also talked briefly about scalability of Int vs. BigInt (Int64) vs. Guid, and how any type of replication for scalability would cause the need for a GUID identifier, anyways. Another good point he made was that getting away from a simple number for an ID often helps facilitate discussions of a “natural key” for database records – such as a customer account # that the business assigns – instead of relying 100% on a database generated key.

Then this morning, I round this article: http://nhforge.org/blogs/nhibernate/archive/2009/03/20/nhibernate-poid-generators-revealed.aspx, which basically says that we should avoid using “Native” (SQL Server) or “Sequence” (Oracle) ID generators with NHibernate, and goes into some good detail about why.

So, I wanted to toss this idea out to the world and ask, what are your thoughts? What Primary Key data type do you prefer, and why? What are the benefits and drawbacks of using a GUID, over your current favorite?

I’m beginning to lean toward using GUIDs, personally…


Post Footer automatically generated by Add Post Footer Plugin for wordpress.

About Derick Bailey

Derick Bailey is an entrepreneur, problem solver (and creator? :P ), software developer, screecaster, writer, blogger, speaker and technology leader in central Texas (north of Austin). He runs SignalLeaf.com - the amazingly awesome podcast audio hosting service that everyone should be using, and WatchMeCode.net where he throws down the JavaScript gauntlets to get you up to speed. He has been a professional software developer since the late 90's, and has been writing code since the late 80's. Find me on twitter: @derickbailey, @mutedsolutions, @backbonejsclass Find me on the web: SignalLeaf, WatchMeCode, Kendo UI blog, MarionetteJS, My Github profile, On Google+.
This entry was posted in Data Access, Principles and Patterns. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • http://mynerditorium.blogspot.com Daniel Auger

    I love guids for ID in a perfect world, and they are my favorite. Unfortunately fragmentation can occur in the database when using non-sequential guids. That’s what Guid.Comb is for right? Well what happens when your data can enter the database from several applications on different servers and from ETL? In this case Guid.Comb breaks down unfortunately. Sequential Guid in SQL server can be used to fix this problem, but then we are back to the same performance issues we have when using native ID in SQL Server with NHibernate.

    The next best is HiLo IMHO. However, it’s pretty difficult to convince a DBA that their SSIS package is going to have to call a HiLo ID generator function to get its ID.

    What I’d really like to see is some form of distributed Guid.Comb that doesn’t have any serious performance implications.

  • Eyston

    I am stuck in a database that uses natural id’s (and composite natural id’s) and I think it sucks hard.

    I write a few side projects and used guid.comb. The use case I was doing consisted of a huge batch of inserts (migrating data) which resulted in a ton of records having matching times in the guid.comb id, which is unfortunate.

    I was thinking to try hi lo next. I like the idea of having something that is human readable as the identifier. Like if I have an order, it would be nice to have an int32 number identifying the order that people can use in passing.

  • http://www.lostechies.com/members/jcteague/default.aspx jcteague

    I use the HiLo almost extensively. The only times I use guids are when I’m in the situation Jeffrey talks about. So I stick to using the simplest thing that works.

    I also think that it’s important to remember that not all entities in your domain are created equal. While it may seem nice and neat to use the same type of identifier for your system, it’s a fallacy that it will make your system more robust. Using the same identifer without regard to your systems needs can create unnecessary complexity elsewhere.

    I’ve also worked in natural key databases and I concur: it sucks.

  • Buchanan Dunn

    Using Guids as a pk for your db works fine if you need to get the id from your application which may or may not know the last record Id used and you need to know what it will be after it is created without having to wait for a response from the db.
    The one thing to be careful of is that if using guid’s as pk make sure you add a column that is sequential to index against to avoid fragmentation.

  • Eyston

    Was thinking about the Customer Number thing… how would you do this without replicating the badness ™ of database generated identities?

    When the end user is creating a new customer, they probably don’t want to specify the number. This means the business logic comes up with the number. You are going to want this number to be unique, and in order to guarantee uniqueness wouldn’t you have to get the database involved?

    Or is there a better way of generating a unique number for each customer that won’t collide and doesn’t require a database trip?

  • Eyston

    To clarify I mean in the example that you would have say a guid.comb id for the Customer record, but also want a second column, Customer Number, that is guaranteed unique with a human readable and convenient number (ours are an 8 digit number for example).

    Having a unique number like that is too useful to get rid of — you can’t recite a guid over the phone, and trying to pass multiple criteria just to narrow down to a single result seems ugly — but having an ORM generated id is also very useful.

    HiLo seems to reconcile these pretty well, but I am curious if there is a way to do guid.comb and fulfill this use case as well (or is trying to make guid fit all cases just as bad).

  • http://blogs.imeta.co.uk/sstrong Steve Strong

    If you’re doing any sort of replication / merging, then GUIDs are the way to go. If you’ve just got a single database, then you will probably get better performance and user friendliness from a hi-lo approach. If you do go down the GUID route, I’d definitely aim for some form of sequential algorithm.

  • http://tunatoksoz.devlicio.us Tuna Toksoz

    There is a Guid Comb alternative proposed in NHibernate development group, which also addresses uniqueness of Guid across multiservers

  • http://kevhunter.wordpress.com Kev Hunter

    Is this not a case of YAGNI? If you will be doing replication then your obviously going to need GUID’s but if not they can be a pain to work with, not just from an indexing point of view, but having a surrogate or natural key which can be easily read and remembered is often useful too.

  • http://mynerditorium.blogspot.com Daniel Auger

    @Kev – There are many patterns where it’s useful to assign an identity to an object long before it gets persisted to the database (such as with unit of work). Guids and HiLo make this a possibility.

  • http://www.lostechies.com/members/bogardj/default.aspx bogardj

    Guid.Comb is the way to go. That article only says “dont rely on the DB to generate IDs”, not “don’t use GUIDs”.

  • Jim Raden

    Can someone define HiLo or point to a reference? Wikipedia’s got nothing. (Though I have learned a lot about Hawaii!)

    Thanks!

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey
  • http://www.lostechies.com/members/jcteague/default.aspx jcteague

    @bogardj IF you need a guid, which is not always the case.

  • Alex

    I actually come from a DBA background prior to extensive nHibernate work and can provide a slightly different perspective. As a dba, I ALWAYS quake whenever the database is not controlling important datapoints such as unique IDs, creationdates, etc. The more successful a project is, the larger it will become and the more likely that some other project will arise that will not take advantage of nHibernate. At that point, mayhem often breaks out and collisions and dirty data are the result as each project is then forced to design their own ways to populate these important datapoints. I would much rather force the nHibernate developers to use database generated GUIDs (any decent dba can properly manage indexes to lessen the effect of index fragmentation) and have them include fetches after the save.

  • http://www.lostechies.com/members/chadmyers/default.aspx chadmyers

    I don’t see any reason to not use GUIDs. It’s a GLOBALLY, UNIQUE, Identifier. Why use anything less and deal with all the potential problems?

    Indexing is an ancillary concern and there are ways to alleviate that.

    @Alex: There may be more than one database involved with different schemas representing the same entity (i.e. OLTP, various denormalized reporting schemas, ready archives, offline archives, etc).

    You absolutely need something that is *NOT* database generated because then you’re assigning too much power to one particular database.

    GUID is easy, as soon as an entity is conceived (regardless of its current persistence status), it has an ID. That ID (GUID) remains with it forever and wherever the data that goes along with that entity may reside and in whatever format that data may need to take (database, XML, JSON msg, etc).

    With GUID there is no collision and never will be. With anything else, there is a good if not guaranteed chance of collision (i.e. Int32/Int64). Also sequences tied to a particular database server instance while be a huge problem for any medium or larger project, unless you plan on making the mistake of trying to do everything on the OLTP database.

    Indexing, clustering, etc are all separate problems that can be resolved in other ways. Completely compromising and hobbling your entity ID generation mechanism because of some implementation details in the RDBMS seems silly to me.

    IMHO, using Guids is far less complicated than using identities/sequences. NHibernate takes away a lot of that complexity, but it’s still there if you have to do anything more than CRUD against that database.

    GUID’s all the way (using GUID.Comb) and using alternative methods for dealing with clustering and fragmentation is the way to go.

  • http://thinkbeforecoding.com Think before coding

    I usually use Guid and generate them in the code.
    This way, objects (entities ?) get their unique identity in the system when instanciated. It helps in terms of scalability because it removes a strong bottleneck from the system.

    @alex, your point of view is the one of a reference database where the database is the source of integrity, it is rapidly a great cost for scalability. Lots of people are moving to a schema where the database is only a smart storage used by the code.

  • http://tunatoksoz.devlicio.us Tuna Toksoz

    Alex, if you were also talking about HiLo,
    it can be implemented on db as a SP and you can make use of it on NH side, too.

  • Eyston

    I still am wondering if there is a good solution of how to use guid.comb while keeping a human readable reference number (or something similar).

    To answer the question over the phone: “which order?”

    I would want something easy to say, like “order 1097820″, not a guid. I want cake and icing :)

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    @Eyston,

    The concept of a natural key (customer #, order #, etc) is orthogonal to the primary key, in my mind. I’m not a fan of using natural keys as primary keys, for various reasons (which is another discussion entirely). I like to create a table with 2 columns, in this situation: one column is the primary key (guid, int, whatever) and the other column is the natural key (customer #, order #, or whatever)

    you should continue creating and using the natural keys as you do, today. let the business decide what the key values are, and let the business plug them in how they want to. if this is generated by code, great! if it’s generated by humans, great! you only need to ensure that you have an unique constraint / index on your natural key, if you are going to be searching by that key.

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    All,

    a coworker responded via company email. i thought his response was worth sharing. you can read it here:

    http://www.lostechies.com/blogs/derickbailey/archive/2009/07/15/storage-size-and-performance-implications-of-a-guid-pk.aspx

  • Eyston

    @Derick

    I totally agree with the first half — I’m not saying Order 1097820 should be the PK. I want a guid PK, but I want to keep a unique number to be able to reference it by for humans. Like you say a unique constraint on that column in db.

    What sucks is that the benefits of a GUID (or any generated id) is that the ORM can generate the field and save round trips and allow batching. BUT when I now want a seperate unique key as an integer for humans, I have no idea how to generate that WITHOUT somehow working with the database (I’m not saying its impossible, just I have no idea how to do it).

    I am wondering if any tips/tricks exist for this: how to generate a unique (non-PK) key for an entity w/out requiring that extra database trip to either (a) generate the key or (b) verify that it truly is unique.

  • Eyston

    Also, I’m not 100% sure I would qualify order # as natural key: it has absolutely no business meaning other than as a unique identifier and no one ever keys it in. Maybe that still makes it a natural key, I dunno.

    We have some tables at work that deal with parts. A part number is the key of that table. This has caused a ton of problems. The thing that makes it suck is (a) users actually key this in and (b) a part number has a ton of meaning outside of the database. The part number, once entered and used, no matter how wrong or how many good reasons there are to change it, can no longer be changed without breaking all FK relationships. It sucks.

    What sucks even more is that natural key such as this (part number) just lead to composite key (serial number? oh that is unique per part …. so serial table is PK of serial # and part #… ugh).

    Sorry for just thinking out loud here — it is just that at first glance it is absolutely clear to me that guid.comb is wonderful… decision done, one less thing to worry about, check mark! But now I am doubting or questioning at least.

  • Dan Malcolm

    @Eyston

    Assigning a human-friendly identifier to entities (“order number”, “customer number” etc) is a separate business requirement and doesn’t have to inform or conflict with your choice of PK data type. Bear in mind the requirement could easily extend to something other than a sequential integer, e.g. some kind of “meaningful” prefix.

    This requirement isn’t that awkward to implement. You have some kind of strategy – IOrderCodeGenerator.Assign(Order order) or something – to generate the codes. A database will have to come into play to generate the sequence, but the hi-lo algorithm “reserves” a number of keys in a sequence within a single database call. This approach won’t interfere with batching at the ORM level.

  • Eyston

    @Dan

    Thanks for reply.

    That makes a lot of sense (the OrderCodeGenerator). I guess I was blinded to the idea that the hi-lo is an algorithm, not just an option in NHibernate :)

    So that OrderCodeGenerator would not only abstract away the process but if it implemented the hi-lo algorithm would remove the database lookup per order creation (although it definitely could have a db request if done lazily on the first one, or if it ran out of lo’s, etc).

    I dream about one day not having my crappy db, and that was the latest thing on my mind of “how would I do that” :)

  • http://www.lostechies.com/members/derick.bailey/default.aspx derick.bailey

    (NOTE: adding this comment on behalf of Bernhard Kircher. He couldn’t get the comment to post, for some reason, so he emailed it to me)

    I just wanted to add some information on Guid generation within Code. As mentioned in the article on NHibernate, Guid.NewGuid() is not the best idea because of fragmentation, and therefore causing more IO and so on. The Guid.Comb seems to be a good idea (depending on it’s implementation). I just wanted to add, that if you are using MS SQL server, you can generate the Guids in your code the same way the server does when calling newsequentialid(). It uses the rpcrt4.dll. The generated guid has to be edited, because of the way sql server is sorting them. I added more information here (with more links to my resources)
    http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid/1042719#1042719

    best regards,
    Bernhard Kircher

  • http://geekswithblogs.net/LessonsLearned Randolpho

    I think there are valid reasons *not* to use GUIDs… the biggest being the loss of a clustered index on your primary key, which can be a significant performance hit unless you spend a *lot* of time fine-tuning your DB.

    That said, from an architectural point of view, I vastly prefer GUIDs to IDENTs for my primary keys. I discount the most common argument against GUIDs, which is the whole “what if I’m trying to troubleshoot the DB and I have to type in this whole big nasty GUID rather than a simple WHERE ID=12345?” argument. COPY AND PASTE, people!

    The people who say “well, what about the ops folks being able to say “that’s an issue 144′?” I just want to smack upside the head. USE NATURAL KEYS, people!

  • http://www.indiecodelabs.com Enrique Ramirez

    Ok so if I understand correctly, I’d use a GUID to ensure that my object/entity is unique. But, as far as joining data together between tables, I’d keep on using other keys/columns, etc?

    I know seems like I’m looking at the problem from the DB’s perspective, and that if I’m using an ORM I probably shouldn’t even care about this. I just want to make sure I actually get all of this.

  • Bernhard Kircher

    @Randolpho: Can you explain why you loose the clustered index on the primary key, when using a Uniqueidentifier? Does this apply only to a specific Database engine or in general?

    Thanks,
    Bernhard Kircher

  • Robinson

    Joining this discussion a little late I know. I thought about using an integer primary key and then also having a unique GUID (generated by the DB) for the purposes of replication, in each table too. I’m not sure how this would play out. Any thoughts?