SQL Server MVP Deep Dives 2, Chapter 1
The first Deep Dive is a no holds barred write-up by Ami Levin on the controversial subject of Natural keys vs Surrogate keys. Which was a great read for me because the first thing I learned was that there is a controversy in the database world over the use of natural keys vs. surrogate keys. The chapter starts out explaining what the difference is between the two and then outlines the benefits (and limitations) of each one.
The difference between the two types of keys is pretty self-evident. An artificial, or surrogate, key is one that is created to uniquely define a record (like employeeID or productID) and a natural key is something that defines a record using the data that is already there (Levin uses a VIN number or a country name as examples). My ignorance of this issue as a controversy can possibly be overlooked because, as Levin points out, almost everyone uses surrogate keys. I just thought that was how it is done.
The overwhelming majority of database designs that I’ve encountered use artificial sequential enumerator keys such as IDENTITY for just about every table in the database. College classes about databases, books about database development, web articles, and most other materials include examples that use artificial keys almost exclusively. I believe that this practice is so widespread that new students, learning to become database developers, take it as a fact of life that this is how it should be done. – SQL Server MVP Deep Dives 2, Chapter 1
It’s clear by the end of the essay that Levin is in favor of natural keys. I was pleased that I understood all the points Mr. Levin made and he almost convinced me that natural keys are the best thing since sliced bread. He sees the advantages of using an artificial key as a necessary evil in some situations, but describes the elegance gained by using natural keys as something to be striven for.
He gives equal voice to both sides and lists the advantages of artificial keys as never-changing, performance-improving, simpler to work with, and he admits that there are times when an artificial key is useful. The two points he argues against are that sometimes there is no other key to be found and that artificial keys take on meaning as real data after being used. He claims that if you cannot uniquely define something with its own data then maybe you need to look at the design. Having worked on a lot of databases where the main object is a person (employee, contact, etc) I believe you cannot define a person without assigning an artificial key. Sure, you could use their fingerprint or SSN as a unique identifier, but privacy issues aside that is just not practical. I would like to have seen more discussion on this point and how he has gotten around it. Indeed perhaps he hasn’t and, much like he advises the readers, he merely starts his design using natural keys and then expands where needed to include artificial keys.
He lists some of the advantages of natural keys as having business meaning, requiring fewer joins when querying (I liked this one), and better data consistency. The last point was the weakest for me and he admitted that even if you are using an artificial key a good design will still have unique constraints conforming to real world logic. For example you may define each country with a number (1-USA,2-Canada,3-Australia,etc), but you would still add a constraint saying you can’t have two Canada’s. His point is that using just the values (USA, Canada, Australia) as keys enforces this automatically. I thought the most interesting point in favor of natural keys is that they assist the query optimizer. Explaining what that means is beyond the scope of this review, but maybe I will write a blog post on it sometime if I can do it without stealing too much from Levin’s write up. [Looking ahead I’ll probably try to talk about it when I review Grant Fritchey‘s chapter on parameter sniffing.]
I really enjoyed this first chapter in the Deep Dives book. The level of explanation made it easy to read and understand, but it wasn’t stuff that I already knew. I don’t do a lot of database modeling in my current role, but when the author starts out by referencing Edgar F. Codd’s A Relational Model of Data for Large Shared Data Banks it speaks to my inner math geek. The next time I am in a position to model a database I know that natural keys will be more in my mind as I go through the process.
I think I am going to learn a lot this year.
Chapter One SQL Server MVP:
Ami Levin (B|T), CTO and co-founder of DBSophic, has been a Microsoft SQL Server MVP since 2006, and has over 20 years of experience in the IT industry. For the past 14 years Ami has been consulting, teaching, and speaking on SQL Server worldwide. Ami leads the Israeli SQL Server user group, moderates the local Microsoft support forum, and is a regular speaker at SQL Server conferences worldwide.
If I have interested you in SQL Server MVP Deep Dives Volume 2 both the Natural and Artificial versions can be ordered at Manning Books. I don’t get anything out of it if you buy a copy, but you might.