SQL Server MVP Deep Dives 2, Chapter 4
I believe chapter four of Deep Dives 2 should be required reading for all non-database developers. It is not only a great overview of what makes up a good database design, but it speaks to why it is best to design things in a relational way. Data architects don’t sit around plotting how to make life difficult for developers (that is just a side benefit); they design a SQL Server database to make the best use of the relational database engine. Davidson lists out seven criteria for good database design with the added requirement that “it works.” “It works” is sort of the bare minimum for a database, and it needs to be kept in mind or all of the effort of designing the database is meaningless. When I read chapter four I was reminded of the Boy Scout Law listing the characteristics scouts strive for. I took Davidson’s criteria and wrote them out as Louis’s Law.
A Relational Database is Coherent, Normal, Fundamentally Sound, Documented, Secure, Encapsulated, Well Performing, and it works.
A Database is Coherent. This means exactly what it says: the database as a whole is easy to understand. Davidson breaks this down further and says that for a database design to be coherent it must be standards based, have reasonable names and data types, be cohesive, and, most importantly, need little documentation. I’ve worked in environments with standards that are written down and standards that are just “based on common sense.” The written-down standards are more annoying to follow, but they help create those reasonable names. Davidson’s point on this is that bad standards that are followed by everyone are better than great standards that no one adheres to. The importance of correct data types is a theme I have come across in each of the design chapters so far. He describes databases where all of the data types are set up as varchar(200) or varchar(max). This will work to store data, but everything will be stored as character data and you will lose many of the advantages of using the correct data type. Even simple arithmetic will have to convert data to a numeric type before it can be performed, which makes for slower queries. I think Davidson’s point about needing little documentation resonated with me more than anything else in the chapter.
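As a sketch of the data type problem Davidson describes (the table and column names here are my own invention, not from the book), storing numeric values as character data forces a conversion before any arithmetic can happen:

```sql
-- Hypothetical example: numeric columns declared as varchar.
CREATE TABLE dbo.OrderLine
(
    OrderLineId int IDENTITY(1,1) PRIMARY KEY,
    Quantity    varchar(200) NOT NULL,  -- should have been int
    UnitPrice   varchar(200) NOT NULL   -- should have been decimal(10,2)
);

-- Every calculation now needs a conversion, which costs CPU
-- and can prevent index seeks on these columns.
SELECT CAST(Quantity AS int) * CAST(UnitPrice AS decimal(10,2)) AS LineTotal
FROM dbo.OrderLine;

-- With correct types the same query would simply be:
--   SELECT Quantity * UnitPrice AS LineTotal FROM dbo.OrderLine;
```

Even when you skip the explicit CAST, SQL Server performs an implicit conversion behind the scenes, so the cost is paid either way.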
Toasters toast the same way; even my Hamilton Beach combination toaster/toaster oven didn’t take more than 15 seconds to figure out. … Give me too much information and I’ll ignore it. Why don’t I read the toaster instructions? Because it’s 20 pages that mostly tell me stuff that a 4-year-old Siamese cat could figure out. Sure, there’s probably a feature that I’m not immediately taking advantage of because I don’t read the docs, but I’ll risk that to avoid being told one more time not to use the toaster while taking a shower. – SQL Server MVP Deep Dives 2, Chapter 4
I found myself immediately repeating this explanation to colleagues and even to my family. It explains the concept of overdocumenting so simply. It’s not that documentation isn’t needed (it’s number four in Louis’s Law), but if the design is done well then there won’t be the need to document as much.
A Database is Normal. Davidson gives a quick definition of what Normal means and then rightfully admits a discussion of normalization would require its own chapter. His basic explanation of a normalized database is that there is “no duplicate data” and every piece of data has a “single purpose”. He stresses again that having data in a normalized fashion helps the relational database perform at its best. SQL Server expects the data to be in normalized relational tables and has less to calculate for query optimization when this is true.
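To make “no duplicate data” concrete, here is a minimal illustration of my own (not from the book): repeating a customer’s name and phone number on every order row duplicates data, while a normalized design stores each fact exactly once.

```sql
-- Denormalized: customer facts repeated on every order row.
-- CREATE TABLE dbo.Orders (OrderId int, CustomerName varchar(100),
--                          CustomerPhone varchar(20), OrderDate date);

-- Normalized: each piece of data lives in one place with a single purpose.
CREATE TABLE dbo.Customer
(
    CustomerId int IDENTITY(1,1) PRIMARY KEY,
    Name  varchar(100) NOT NULL,
    Phone varchar(20)  NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderId    int IDENTITY(1,1) PRIMARY KEY,
    CustomerId int NOT NULL REFERENCES dbo.Customer(CustomerId),
    OrderDate  date NOT NULL
);
```

In the normalized version a phone number changes in one row instead of in every historical order, and the optimizer can join on a narrow integer key.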
A Database is Fundamentally Sound. Another way of stating this might be to say that there are rules in place for data integrity. I suspect he may have avoided that phrase on purpose because there can be preconceived notions of what data integrity means. That phrase gets thrown around a lot at companies when there is bad data, but discussions on bad data can sometimes lead to the blame game. Davidson points out that it makes sense to do data validation in the database when possible. If you push data validation off to the application layer then it will have to be recreated whenever the data is accessed in a different manner. He asserts that uniqueness should be enforced using a UNIQUE constraint on the table. This ensures that no matter what application or process is adding data to the table it will have the correct level of uniqueness. I’ve worked with many third party databases that assume you will never access the data outside of their application. This can lead to some interesting data which can cause the application to throw an error. I’ve seen this exact uniqueness issue in one application I worked with. Using the application every employee record had a unique EmployeeID, but we discovered when loading data in bulk that you could create two employees with the same ID. We had to make sure to account for this in all of our processes.
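The EmployeeID story maps directly onto a UNIQUE constraint. In this hedged sketch (the names are hypothetical, not from the application I worked with), the bulk load that created duplicate IDs would have failed up front instead of silently corrupting the data:

```sql
CREATE TABLE dbo.Employee
(
    EmployeeKey int IDENTITY(1,1) PRIMARY KEY,
    EmployeeId  varchar(10)  NOT NULL,
    Name        varchar(100) NOT NULL,
    CONSTRAINT UQ_Employee_EmployeeId UNIQUE (EmployeeId)
);

INSERT INTO dbo.Employee (EmployeeId, Name) VALUES ('E001', 'Pat');

-- Any process that repeats an ID now fails with a UNIQUE constraint
-- violation, no matter which application or bulk load ran it.
INSERT INTO dbo.Employee (EmployeeId, Name) VALUES ('E001', 'Sam');
```

Because the rule lives in the table itself, it holds for the vendor’s application, your bulk loads, and any future process alike.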
A Database is Documented. Despite the previous admonition to create little documentation, Davidson makes the very valid argument that documentation is needed. He acknowledges that it can be “as boring as a whole oilfield of derricks,” but it needs to be done to explain items in the design that are not self-evident. His advice is to document first, before you start building things. His point is that documentation is about communication, and if you document everything before you build then everyone is on the same page – or possibly not, which is better to know at the start.
A Database is Secure. If normalization requires its own chapter then security would require its own book. His goal with this characteristic seemed to be to put a slight scare into the reader about what can happen if security is ignored. People are growing more and more concerned with data privacy and even if you are in an unregulated industry your customers will not be happy if data about them is shared inadvertently.
A Database is Encapsulated. The concept of encapsulation is not new to the programming world. In database terms Davidson defines encapsulation as, “Changes to the structures should cause minimal impact to the user of the data structures.” An example might be adding a new column to a table. The table structure has changed, but this should not break any code that is accessing the table. With the technologies available today this quickly becomes a discussion of using a layer of stored procedures to access the data or an Object-Relational Mapping (ORM) tool. Coming from the database side of things I’ve only used stored procedures for encapsulating the data access. While admitting there are times when an ORM makes sense, Louis tactfully outlines reasons why you might not want to use one. His biggest reason is that experience has taught him a database designed for an ORM tool is usually poorly normalized, leading to poor performance on large data sets.
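A minimal sketch of the stored-procedure layer (the procedure and table here are my own invention): callers depend on the procedure’s contract, so changing the table underneath doesn’t break them.

```sql
CREATE TABLE dbo.Product
(
    ProductId int IDENTITY(1,1) PRIMARY KEY,
    Name      varchar(100) NOT NULL
);
GO
CREATE PROCEDURE dbo.Product_Get
    @ProductId int
AS
BEGIN
    SET NOCOUNT ON;
    -- Explicit column list: this is the contract callers rely on.
    SELECT ProductId, Name
    FROM dbo.Product
    WHERE ProductId = @ProductId;
END;
GO
-- Later: ALTER TABLE dbo.Product ADD DiscontinuedDate date NULL;
-- Existing callers of dbo.Product_Get are unaffected because the
-- procedure's result set has not changed.
```

The same change would also be safe for code doing ad hoc SELECTs with explicit column lists, but the procedure makes the contract deliberate rather than accidental.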
A Database is Well Performing. Louis boldly claims that this is the least important characteristic – “just don’t let the users know.” He then lists items to consider when looking at performance. The first item circles back to the point of the chapter in general: if the initial design is done correctly, the database should perform well. The other items, concurrency, indexing, hardware, and good query code, are all pretty straightforward. Adding the correct indexes and increasing hardware are hopefully the easiest fixes to understand, while creating the correct locking for multiple users (concurrency) and writing good queries are more subtle. I have been writing T-SQL queries for over a decade and I am still learning new ways to make a query run faster. Many of the chapters in Deep Dives 2 talk about ways to increase performance, so this is a topic I’ll be revisiting.
A Database Works. In the summary Davidson comes back to the point that all of this is done to create a database that works. But that isn’t the end of it. After the database is created people are going to start using it and your real job begins. The bonus is that if you have designed the database well the task of administering the database should be easier. Louis refers to “future you” several times in his chapter and that creating a good database design from the beginning will help whoever works on it next. It could be future you, so don’t let him or her down.
Chapter Four SQL Server MVP:
Louis Davidson (B|T) has been in the IT Industry for 17 years as a corporate database developer and architect. He has been a Microsoft MVP for seven years and has written four books on database design. Currently he serves as the data architect for the Christian Broadcasting Network, supporting offices in Virginia Beach, VA and Nashville, TN. Louis has a bachelor’s degree from the University of Tennessee at Chattanooga in computer science with a minor in mathematics.
If you are interested in reading SQL Server MVP Deep Dives Volume 2 you can acquire a copy at Manning Books. I promise it is less boring than an oil derrick. I don’t get anything out of it if you buy the book, but you might.