Fun With Data Compression in SQL Server 2008

One of the new features in SQL Server 2008 that you can take advantage of (if you have Enterprise Edition) is data compression. Data compression allows you to save storage space and I/O at the cost of some extra CPU to compress and decompress the data. Depending on your workload and potential space savings, this may or may not be a good idea. If a table or index is accessed very frequently and if you are not saving that much space with compression, then you probably should not use data compression. If you gave a table that is used for auditing or logging, that would probably be a much better candidate for compression.

In SQL Server 2008, you can choose between PAGE compression and ROW compression.  ROW compression removes unneeded bytes from the column values by storing them in variable length format, while PAGE compression stores the repeating value only once per page and sets the pointer from the respective columns within the page. In most cases that I have seen, you will save quite a bit more space with PAGE compression than you will with ROW compression, but this is dependent on your data. Before you consider enabling compression, you should always run the sp_estimate_data_compression_savings system stored procedure to see how much space you may save using both PAGE and ROW compression. This stored procedure may take a few minutes to run, depending on your table size and hardware, since it samples the data in the table to come up with its estimate.

Here is how you use this stored procedure:

-- Check how much space you might save with PAGE data compression
EXEC sp_estimate_data_compression_savings 'schemaname', 'TableName', NULL, NULL, 'PAGE';

-- Check how much space you might save with ROW data compression
EXEC sp_estimate_data_compression_savings 'schemaname', 'TableName', NULL, NULL, 'ROW';
 

These queries let you focus on your user tables and indexes to look for likely candidates for data compression:

-- List all user tables and indexes
SELECT OBJECT_NAME(object_id) AS [ObjectName], [rows], data_compression_desc, index_id
FROM sys.partitions
WHERE LEFT(OBJECT_NAME(object_id),3) <> 'sys'    -- try to eliminate nonuser objects
AND LEFT(OBJECT_NAME(object_id),5) <> 'queue'
AND LEFT (OBJECT_NAME(object_id),10) <> 'filestream'
ORDER BY ObjectName;

-- List compressed tables and indexes
SELECT OBJECT_NAME(object_id) AS [ObjectName], [rows], data_compression_desc, index_id
FROM sys.partitions
WHERE data_compression > 0
ORDER BY ObjectName;

 

You can run this query to find which indexes have the most write activity. Tables or indexes that have lots of read or write activity are not as well suited for compression, especially PAGE compression.

-- Most active indexes and tables for writes
SELECT objectname = OBJECT_NAME(s.object_id), indexname = i.name, i.index_id,
        reads=range_scan_count + singleton_lookup_count,
        'leaf_writes' = leaf_insert_count+leaf_update_count+ leaf_delete_count, 
        'leaf_page_splits' = leaf_allocation_count,
        'nonleaf_writes' = nonleaf_insert_count + nonleaf_update_count + nonleaf_delete_count,
        'nonleaf_page_splits' = nonleaf_allocation_count
FROM sys.dm_db_index_operational_stats (db_id(),NULL,NULL,NULL) AS s
INNER JOIN sys.indexes AS i
ON i.object_id = s.object_id
WHERE OBJECTPROPERTY(s.object_id,'IsUserTable') = 1
AND i.index_id = s.index_id
ORDER BY leaf_writes DESC, nonleaf_writes DESC

 

Once you decide to compress the table, you can run the command below. This may take a long time depending on the size of the table and your I/O capacity. It is roughly equivalent to rebuilding the clustered index.

-- Compress the clustered index or heap
USE [yourdatabasename];
GO
ALTER TABLE [dbo].[PostLinks] REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE)

Non-clustered indexes have to be compressed separately. They do not inherit the clustered index’s compression setting. The command below will compress a non-clustered index:


-- Compress a non-clustered index with PAGE compression
USE [yourdatabasename];
GO

ALTER INDEX [IX_FeedLanguage_FeedID] ON [dbo].[FeedLanguage] 
REBUILD PARTITION = ALL WITH ( FILLFACTOR = 90, PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, IGNORE_DUP_KEY  = OFF, ONLINE = ON, SORT_IN_TEMPDB = OFF, DATA_COMPRESSION = PAGE )

 

This entry was posted in SQL Server 2008. Bookmark the permalink.

2 Responses to Fun With Data Compression in SQL Server 2008

  1. Mark says:

    Glenn,

    Doesn’t the data or page compression of a table have to be a scheduled job to ensure that additional rows of data to a table are compressed? It isn’t a set it and forget it option right?

    Thanks

    • Glenn Berry says:

      Nope. Once you compress an index in a table, all new or changed data will automatically use the compression setting for that index. All of the existing data is either compressed or decompressed when you run the ALTER INDEX REBUILD command to set compression.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s