Data Modeling Tip for DBAs - by John Sterrett

Posted by on in Data Modeling & Architecture

Database Administrators (DBAs) are given so many different tasks from creating maintenance plans, upgrading databases, fighting performance problems, and more. With so many tasks to be completed, some tasks tend to not get the attention to detail they deserve.  Commonly, I see DBAs making multiple mistakes with table design and data modeling.  Today, I am going to quickly show you an example where poor data modeling can directly impact your application’s performance.

We are going to take a look at primary keys, foreign keys and their relationship to performance with SQL Server.  In this blog post we are going to utilize the AdventureWorks sample databases. This is a free product sample database provided by Microsoft that can be downloaded here.

Many database administrators do not realize that an index is not required for columns used in a child table for a foreign key relationship. Any column with the same datatype could be used to match to the primary key in the parent table. Therefore, indexing the columns referenced is commonly skipped and will impact performance.  We are going to walk through an example where surrogate keys are used for primary keys in a parent child relationship. This will cause the lookup column in the child table to be overlooked for indexing – this would have been caught with proper data modeling.

To set up the demo, we are going to create two new tables using the existing SalesOrderHeader and SaleOrderDetail tables in AdventureWorks. We are going to create two new tables for this demo named SalesOrderHeaderDemo and SalesOrderDetailDemo as done below.

 

SELECT *

 INTO  [dbo].[SalesOrderHeaderDemo]

  FROM [Sales].[SalesOrderHeader]

/* (31465 row(s) affected) */

 

  SELECT *

  INTO [dbo].[SalesOrderDetailDemo]

  FROM [Sales].[SalesOrderDetail]

 /* (121317 row(s) affected) */

 

Next, being rushed to release our changes, we are going to skip the process of doing any data modeling and just create our primary keys for each table using surrogate keys. We will create the foreign key between the two tables so we couldn’t have any sales order details without an existing sales order header record. In fact, let’s go ahead and make the foreign key delete on cascade so a delete on our parent table will also delete the child table’s matching records.Webinar-_Data_Modeling_for_the_DBA,_Sept_10_159x228_Watch_Now

 

ALTER TABLE [dbo].[SalesOrderHeaderDemo]

ADD CONSTRAINT PK_SalesOrderHeaderDemo_SalesID PRIMARY KEY CLUSTERED (SalesOrderID);

GO

 

ALTER TABLE [dbo].[SalesOrderDetailDemo]

ADD CONSTRAINT PK_SalesOrderDetailDemo_SalesOrderDetailID PRIMARY KEY CLUSTERED (SalesOrderDetailID);

 

ALTER TABLE [dbo].[SalesOrderDetailDemo] WITH CHECK

  ADD CONSTRAINT [FK_SalesOrderDetailDemo_SalesOrderHeaderDemo_SalesOrderID]

  FOREIGN KEY([SalesOrderID])

  REFERENCES [dbo].[SalesOrderHeaderDemo] ([SalesOrderID])

  ON DELETE CASCADE;

GO

 

Now that we have our primary keys and referential integrity based on our foreign key established with delete cascade, let’s go ahead and delete a single record. First, turn on the execution plan and also enable statistics IO, so we can see what happens behind the scenes of running this delete statement.

SET STATISTICS IO ON;

DELETE FROM dbo.SalesOrderHeaderDemo

WHERE SalesOrderID = 43659

 

Looking at the IO statistics you will see over a thousand reads against a table that wasn’t referenced inside the delete statement.  Remember adding cascade deletes on the foreign key to simplify the process of deleting child records?  At a first glance, this might make sense. We expect the statement to delete the child records in SalesOrderDetailDemo but how this is done is misleading mainly because we didn’t focus on data modeling as we built this solution.

 

Looking at the execution plan, you will notice a Clustered Index Scan operator.  This is causing us to scan across the whole child table (SalesOrderDetailDemo) to find child records to delete. If we did some data modeling to understand how these tables would be referenced and to understand the expectation for data growth we could have created indexes to cover how we would delete and select data properly between these two tables (SalesOrderHeaderDemo and SalesOrderDetailsDemo).

 

 

If we were to do some data modeling we would have seen that SalesOrderID column in the SalesOrderDetailDemo table would have been a great candidate for being indexed. In fact, being that this table is also accessed most frequently on joins to its parent table (SalesOrderHeaderDemo), it would be a much better candidate for your clustered index.

The following is how we should have created our indexes on the SalesOrderDetailDemo table after modeling how our data would be accessed through the parent-child relationship.

/* Your Primary Key doesn't have to be a clustered index but will by default */

ALTER TABLE [dbo].[SalesOrderDetailDemo]

ADD CONSTRAINT PK_SalesOrderDetailDemo_SalesOrderDetailID PRIMARY KEY NONCLUSTERED (SalesOrderDetailID);

  

ALTER TABLE [dbo].[SalesOrderDetailDemo] WITH CHECK

  ADD CONSTRAINT [FK_SalesOrderDetailDemo_SalesOrderHeaderDemo_SalesOrderID]

  FOREIGN KEY([SalesOrderID])

  REFERENCES [dbo].[SalesOrderHeaderDemo] ([SalesOrderID])

  ON DELETE CASCADE;

 

CREATE CLUSTERED INDEX idx_SalesOrderDetailDemo_SalesOrderID ON dbo.SalesOrderDetailDemo (SalesOrderID)

 

The following are the results for running through the same demo, except we are now utilizing our index changes from doing some data modeling. You will see that we were able to convert our index scan into an index seek which reduced logical reads from 1,501 to 29. 

 

 

For more data modeling tips for DBAs, watch the on-demand webinar as Karen Lopez and John Sterrett discuss 7 Dangerous Myths DBAs Believe about Data Modeling 


John Sterrett is a Database Administration Senior Advisor at Linchpin People with over eight years of experience with relational databases. His primary focus includes performance tuning, proactive monitoring, high availability and disaster recovery. He has presented at many community events, including PASS Member Summit, SQLRally, 24 Hours of PASS, SQLSaturday, and local PASS Chapter and Virtual Chapter meetings.



About
Gold User, Rank: 3, Points: 616
IDERA Product Marketing Manager for ER/Studio and DB PowerStudio Read data and database blogs on: http://community.embarcadero.com/index.php/blogs/blog-menu/categories/listings/database

Comments

  • Karen @DataChick
    Karen @DataChick Wednesday, 16 December 2015

    Another option for doing this is to use a macro in ER/Studio to create all the FK indexes automatically. Then only remove ones that we know for certain are unneeded (extremely rare). So Reverse Engineer the database, run the macro, then compare to the database which will generate an alter script with all those missing indexes.

  • Please login first in order for you to submit comments
  • Page :
  • 1

Check out more tips and tricks in this development video: