Why UniqueIdentifier is a Bad Choice for a Clustered Index in SQL Server

I had a question today about why it was “bad” to use a UniqueIdentifier as the data type for a clustered index in SQL Server. After all, on the surface that sounds like a good choice, since a UniqueIdentifier is guaranteed to be unique, right?  I have run into many developers over the years that are seduced by this idea…

Unfortunately, using a UniqueIdentifier is not a good choice from SQL Server’s perspective. UniqueIdentifier is a relatively large data type, so your table will be larger than if you can use a numeric data type such as Integer. Far worse is how quickly your clustered index will become fragmented as you INSERT even a relatively small number of rows into the table. This can hurt query performance for certain types of queries, and it makes your clustered index much larger.

In the example below, simply INSERTING 2000 rows of data into an empty table gives you over 99% fragmentation and nearly doubles the size of the table (compared to using an Integer for the clustered index).  Using a default value of NEWSEQUENTIALID() for the myID column alleviates the fragmentation issue, but you still have a clustered index that is about 30% larger than if you use an integer data type for the clustered index.

Note: The script below only works for SQL Server 2008 and above. There are only a couple of minor changes needed to make it work on SQL Server 2005.

-- Why UNIQUEIDENTIFIER is a bad choice for a clustered index data type
-- Glenn Berry 
-- March 2010
-- http://glennberrysqlperformance.spaces.live.com/
-- Twitter: GlennAlanBerry


-- Switch to Master
USE [master];
GO

-- Drop database if it exists
IF  EXISTS (SELECT name FROM sys.databases WHERE name = N'TestDB')
    BEGIN
        ALTER DATABASE [TestDB] SET  SINGLE_USER WITH ROLLBACK IMMEDIATE;
        DROP DATABASE [TestDB];
    END
GO


-- Create new database (change file paths if needed
CREATE DATABASE [TestDB] ON  PRIMARY 
( NAME = N'TestDB', FILENAME = N'C:\SQLData\TestDB.mdf' , 
  SIZE = 5MB , FILEGROWTH = 50MB )
 LOG ON 
( NAME = N'TestDB_log', FILENAME = N'C:\SQLLogs\TestDB.ldf' , 
  SIZE = 2MB , FILEGROWTH = 50MB)
GO


-- Switch to new database
USE TestDB;
GO


-- Do some things that will grow the database and fragment indexes

-- Create a "bad" table 
-- (never use a UNIQUEIDENTIFIER for your PK, clustered index!)
IF  EXISTS (SELECT * FROM sys.objects 
            WHERE object_id = OBJECT_ID(N'[dbo].[TestBadTable]') 
            AND type in (N'U'))
DROP TABLE [dbo].[TestBadTable];
GO
CREATE TABLE [dbo].[TestBadTable](
    [myID] [uniqueidentifier] NOT NULL,
    [myColumn] [varchar](2000) NULL,
 CONSTRAINT [PK_TestBadTable] PRIMARY KEY CLUSTERED 
(
    [myID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
 ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY];
GO



-- Insert some data to fragment the clustered table
SET NOCOUNT ON;
DECLARE @iteration INT = 0;

WHILE @iteration < 2000
BEGIN
    INSERT INTO dbo.TestBadTable(myID, myColumn)
    SELECT NEWID(), REPLICATE('A', 2000);
    
    SET @iteration = @iteration + 1;
END
GO
-- End of insert loop

-- Check Table size (7.9MB)
EXEC sp_spaceused TestBadTable, True;

-- Check how badly the clustered index is fragmented (99.3%)
SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'TestBadTable'), NULL , NULL, N'Limited') 
OPTION (MAXDOP 1);


-- Reorganize the clustered index (this is always an online operation)
ALTER INDEX [PK_TestBadTable] 
ON [dbo].[TestBadTable] REORGANIZE WITH ( LOB_COMPACTION = ON )
GO

-- Check how badly the clustered index is fragmented (2.4%)
SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'TestBadTable'), NULL , NULL, N'Limited') 
OPTION (MAXDOP 1);

-- Rebuild the clustered index (this can be done online with Enterprise Edition)
ALTER INDEX [PK_TestBadTable] ON [dbo].[TestBadTable] REBUILD PARTITION = ALL 
WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, ALLOW_ROW_LOCKS  = ON, 
ALLOW_PAGE_LOCKS  = ON, ONLINE = ON, SORT_IN_TEMPDB = OFF )
GO



-- Create another table using NEWSEQUENTIALID as a default
IF  EXISTS (SELECT * FROM sys.objects 
            WHERE object_id = OBJECT_ID(N'[dbo].[TestNotAsBadTable]') 
            AND type in (N'U'))
DROP TABLE [dbo].[TestNotAsBadTable];
GO
CREATE TABLE [dbo].[TestNotAsBadTable](
    [myID] [uniqueidentifier]  NOT NULL DEFAULT NEWSEQUENTIALID(),
    [myColumn] [varchar](2000) NULL,
 CONSTRAINT [PK_TestNotAsBadTable] PRIMARY KEY CLUSTERED 
(
    [myID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
 ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY];
GO


-- Insert some data to fragment the clustered table
SET NOCOUNT ON;
DECLARE @iteration INT = 0;

WHILE @iteration < 2000
BEGIN

    INSERT INTO dbo.TestNotAsBadTable(myColumn)
    SELECT REPLICATE('A', 2000);
    
    SET @iteration = @iteration + 1;
END
GO
-- End of insert loop

-- Check Table size (5.3MB)
EXEC sp_spaceused TestNotAsBadTable, True;

-- Check how badly the clustered index is fragmented (1%)
SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'TestNotAsBadTable'), NULL , NULL, N'Limited') 
OPTION (MAXDOP 1);


-- Reorganize the clustered index (this is always an online operation)
ALTER INDEX [PK_TestNotAsBadTable] 
ON [dbo].[TestNotAsBadTable] REORGANIZE WITH ( LOB_COMPACTION = ON )
GO

-- Check how badly the clustered index is fragmented (2.4%)
SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'TestNotAsBadTable'), NULL , NULL, N'Limited') 
OPTION (MAXDOP 1);

-- Rebuild the clustered index (this can be done online with Enterprise Edition)
ALTER INDEX [PK_TestNotAsBadTable] ON [dbo].[TestNotAsBadTable] REBUILD PARTITION = ALL 
WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, ALLOW_ROW_LOCKS  = ON, 
ALLOW_PAGE_LOCKS  = ON, ONLINE = ON, SORT_IN_TEMPDB = OFF )
GO


-- Create a much better table
IF  EXISTS (SELECT * FROM sys.objects 
            WHERE object_id = OBJECT_ID(N'[dbo].[TestGoodTable]') 
            AND type in (N'U'))
DROP TABLE [dbo].[TestGoodTable];
GO
CREATE TABLE dbo.TestGoodTable
(MyID INT IDENTITY(1,1) NOT NULL,
 MyColumn VARCHAR(2000) NULL,
 CONSTRAINT PK_TestGoodTable PRIMARY KEY CLUSTERED 
(MyID ASC)
WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY];
GO


-- Insert some data to fragment the clustered table
SET NOCOUNT ON;
DECLARE @iteration INT = 0;

WHILE @iteration < 2000
BEGIN
    INSERT INTO dbo.TestGoodTable(MyColumn)
    SELECT REPLICATE('A', 2000);
    
    SET @iteration = @iteration + 1;
END
GO
-- End of insert loop

-- Check Table size (4MB)
EXEC sp_spaceused TestGoodTable, True;

-- Check how badly the clustered index is fragmented (less than 1%)
SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'TestGoodTable'), NULL , NULL, N'Limited') 
OPTION (MAXDOP 1);

-- Reorganize the clustered index (this is always an online operation)
ALTER INDEX [PK_TestGoodTable] 
ON [dbo].[TestGoodTable] REORGANIZE WITH ( LOB_COMPACTION = ON )
GO

-- Rebuild the clustered index (this can be done online with Enterprise Edition)
ALTER INDEX [PK_TestGoodTable] ON [dbo].[TestGoodTable] REBUILD PARTITION = ALL 
WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, ONLINE = ON, SORT_IN_TEMPDB = OFF )
GO

About these ads
This entry was posted in SQL Server 2008. Bookmark the permalink.

7 Responses to Why UniqueIdentifier is a Bad Choice for a Clustered Index in SQL Server

  1. Marc says:

    If the GUID / UNIQUEIDENTIFIER is supposed to be used in ANY index, you should really be using NEWSEQUENTIALID(), not NEWID(). That\’s T-SQL 102 level.

  2. Marc says:

    Your example of using NEWID() should really be using it as a DEFAULT clause to be comparable, also. I\’m getting 98.5% => 2.24% for NEWID(), 0.75% => 0.75% for NEWSEQUENTIALID() [e.g. no reorg gain] and 1.2% => 0% for IDENTITY [e.g. perfect reorg gain]. Now if you change the IDENTITY MyID column to a BIGINT to get a slightly more comparable keyspace (not really, but lets be generous), then the page_counts only change for the NEWID() case (989=>668), The NEWSEQUENTIALID() is unchanged at 667 and the IDENTITY is unchanged at 500 pages (with 32KB unused after reorg).

  3. Charles says:

    Agreed. Yet UniqueIdentifier is a wonderful choice for a primary key in so many ways. The fact that brain dead user interfaces used to create tables always make the primary key a clustered index leads to bad problems. In some instances there are not many GOOD choices for a clustered index in supporting data. This is especially true in Mobile development.

  4. Bart says:

    IMO it\’s really overstating the risks to say that uniqueidentifier is a "bad choice" for a clustered index. A couple of other comments to add to those of the other posters: (a) Clustering keys and primary keys are separate — just because you don\’t want a column to be the clustering key doesn\’t mean that it makes a poor choice for your PK. (b) Fragmentation is typically not the cause of the most important perf issues on most systems. If you have indexes that support seek-based plans for your perf-critical queries, fragmentation is more or less irrelevant from a perf perspective. In most cases it only has a noticeable effect when a plan must scan a large index, which is generally somthing you want to avoid even if the index isn\’t fragmented.

  5. Nick says:

    Thanks for those last two comments "No name" – not much use though, best of luck

  6. Jon says:

    It not bad as long as you have maintenance plans on that index.

    For zero maintenance and the smallest use of space, use an INTor BIG INT along with IDENTITY and the clustered index will be nice and sequential. If you want to use a GUID type of indentifier, just make sure there are maintenance plans on those indexes to keep the fragmentation at a minimum.

    We use tracking tables with INT and BIG INTs as our clustered PK. For meta data tables we use GUIDs, so no absolute answer here.

    Just keep in mind the issues with using guids (space and fragmentation) and you should be OK.

  7. Tarique says:

    nice post … thanks!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s